graspologic-org / graspologic

Python package for graph statistics
https://graspologic-org.github.io/graspologic/
MIT License
615 stars 132 forks source link

Node2Vec Enhancements #566

Open daxpryce opened 3 years ago

daxpryce commented 3 years ago

Is your feature request related to a problem? Please describe.

n2v currently behaves very differently from the rest of the embed module

Describe the solution you'd like

While it is true that discrete fit and transform methods don't really make much sense for node2vec, it can at least implement fit_transform and warn or throw an error if fit or transform are called separately.

It also requires a networkx Graph, whereas we should be able to handle adjacency matrices in numpy and scipy sparse csr formats as well.

Lastly, none of the other embed "functions" (fit_transform results) return the labels array. It is going to be a requirement that our labels and our graphs match up at all times in all scenarios, up to and including nodes without any edges - that way we can rely on the natural ordering of graph to provide this label lookup vs. an explicitly returned tuple. The alternative makes for an even worse API from the adjacency matrix perspective, so this route is being chosen instead.

Additional context

The rest of the embed module is going to have corresponding functions that encapsulate the create AdjacencySpectralEmbed object with appropriate parameters, then call fit_transform on this graph steps into a single function, which will be more similar to the current node2vec_embed function we have.

dokato commented 2 years ago

hey @daxpryce, is this up for grabs? I've been working on various embedding methods recetnly, i.e. comparing ASE vs node2vec, so the difference in syntax is pretty annoying, happy to help with PR if needed.

daxpryce commented 2 years ago

I think you absolutely can take this; there's some work coming down the pipeline that'll be impacted by whatever you do, but I don't know how far along it is or when we might expect it and I don't want to slow you down on this front. I definitely understand the ergonomics are jarring.

If you could move the current n2v function signature into the graspologic.pipeline.embed module and then work on a sklearn style impl in graspologic.embed, that would be great.