Closed jovo closed 3 years ago
@hhelm10 can you add a link to the associated paper for this?
Id like to work on this!
Also interested!
Also interested!
@hhelm10 think you could give us a quick rundown of spectral (single graph) VN? My original thought was a simple interface that eats a graph or embedding and does nearest neighbor queries... but I see now that there are more complicated schemes even before the ILP stuff.
sure --
will give a short and sweet description of a set of methods..
i) start with an n by n graph G and a vertex of interest v;. ii) pick your favorite embedding method to get n points in R^{d} (each d-dimensional vector is to be interpreted as a representation of a vertex in G); iii) then, depending on the embedding method you use, pick a distance / dissimilarity on R^{d} (for instance, if i suspect my graph is generated from an SBM and i embed the graph using ASE / LSE i may choose either Euclidean distance for simplicity or Mahalanobis distance to exploit the assumed structure of the point in R^{d}); iv) calculate the distances between the vector corresponding to v and vectors corresponding to the other n - 1 vertices; v) return the closest k vertices to v* says your selected distance;
i can see an interface where you initialize the object with an embedding method and a distance, the "fit" method taking G, and the "predict" method taking v* and (optionally) k.
does that make sense?
Thanks @hhelm10 , that's very helpful. I've been reading some papers on this, and have been a bit confused by the wide variety of similar, but not identical, methodologies.
yea -- a lot of the methodologies that people come up w are pretty application / domain specific and so there are small tweaks here and there that try to optimize performance for the particular class of graph that they have access to (i.e. use LSE if graph is particularly sparse, somehow incorporate node / edge attributes, etc.). but for graphs larger than 50 nodes the typical procedure is to embed and apply / learn a distance to then rank the rest of the nodes.
a small aside: some of our recent work tries to avoid having to select an embedding method / distance by combining a bunch of them using supervisory information.
Here would be my first draft of a VertexRanker class. Note that embedding_method is an instantiated embedding method and that pairwise_distance is not defined:
class VertexRanker:
def __init__(embedding_method=ASE,
distance='Euclidean',
distance_kwargs = None):
self.embedding_method = embedding_method
self.distance = distance
self.distance_kwargs = distance_kwargs
def fit(G):
X_hat = self.embedding_method.fit_transform(G)
self.dist_matrix = pairwise_distance(X_hat, distance=self.distance, distance_kwargs=self.distance_kwargs)
self.fitted = True
def predict(index, k=1):
return np.argsort(self.dist_matrix[index])[1:k+1]
@bdpedigo then to include something like the ILP you could let embedding_method, distance, distance_kwargs
be a dictionaries and just make the fit function a little more complicated
DoD for this issue:
Minimum Features
Checks
Ideal Features Planning to do pretty much all of these, but goes beyond the basic functionality
can take an adj. matrix
maybe also an embedding
for each class of interest ordered by probability.
You may have already read up on VN more than me, which is great - I get the "class nomination" idea but I believe there should also be a completely unsupervised notion (see @hhelm10's example implementation, which has no notion of class)
should run very quickly, even for graphs with millions of vertices.
Would be awesome, but first goal is to just get something that works. also "very quickly" is vague
Should have the option to choose different distance metrics to be used in clustering
For something like GMM and maybe even k-means just note that this is a non-trivial problem.
Should be able to optionally infer K (number of blocks)
also non-trivial problem
other notes:
I think this is the oldest/simplest paper/alg: https://arxiv.org/pdf/1201.4118.pdf
https://github.com/neurodata/primitives-interfaces/tree/master/jhu_primitives/sgvn