Open daxpryce opened 4 years ago
hey @daxpryce, is this up for grabs? I've been working on various embedding methods recetnly, i.e. comparing ASE vs node2vec, so the difference in syntax is pretty annoying, happy to help with PR if needed.
I think you absolutely can take this; there's some work coming down the pipeline that'll be impacted by whatever you do, but I don't know how far along it is or when we might expect it and I don't want to slow you down on this front. I definitely understand the ergonomics are jarring.
If you could move the current n2v function signature into the graspologic.pipeline.embed
module and then work on a sklearn style impl in graspologic.embed
, that would be great.
Is your feature request related to a problem? Please describe.
n2v currently behaves very differently from the rest of the embed module
Describe the solution you'd like
While it is true that discrete
fit
andtransform
methods don't really make much sense for node2vec, it can at least implementfit_transform
and warn or throw an error iffit
ortransform
are called separately.It also requires a networkx Graph, whereas we should be able to handle adjacency matrices in numpy and scipy sparse csr formats as well.
Lastly, none of the other embed "functions" (fit_transform results) return the labels array. It is going to be a requirement that our labels and our graphs match up at all times in all scenarios, up to and including nodes without any edges - that way we can rely on the natural ordering of graph to provide this label lookup vs. an explicitly returned tuple. The alternative makes for an even worse API from the adjacency matrix perspective, so this route is being chosen instead.
Additional context
The rest of the embed module is going to have corresponding functions that encapsulate the
create AdjacencySpectralEmbed object with appropriate parameters, then call fit_transform on this graph
steps into a single function, which will be more similar to the currentnode2vec_embed
function we have.