sknetwork-team / scikit-network


Feature request: GNN for KNN #549

Closed ChrisDelClea closed 1 year ago

ChrisDelClea commented 1 year ago

Description

Hi guys, thanks for this nice package. I am currently working on embedding customers and need to find the most similar ones. For that, I need an inductive setup (actually mixed: known relations and entities, plus new entities that might attach to the graph). I found your GraphSAGE GNNClassifier, which does similar work, though for classification. My question is: could you abstract that classifier so I could call gnn.predict(unseen_node) and receive the most similar nodes? Similar to scikit-learn's k-nearest neighbors, but without classification or regression.
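To make the ask concrete, I mean something analogous to the following scikit-learn pattern, but where the embeddings of unseen nodes come from the trained GNN (just a sketch; embedding and embedding_new are placeholders, not existing scikit-network attributes):

from sklearn.neighbors import NearestNeighbors

# embedding: array of shape (n_nodes, dim) for the known nodes
# embedding_new: array of shape (1, dim) for an unseen node attached to the graph
knn = NearestNeighbors(n_neighbors=5, metric="cosine")
knn.fit(embedding)
distances, indices = knn.kneighbors(embedding_new)  # indices of the most similar known nodes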

Best regards, Chris

tbonald commented 1 year ago

Good suggestion, thanks! We'll work on that and come back to you. Best, Thomas

lucasmccabe commented 1 year ago

Hello - is anyone tackling this? If not, I am open to taking a stab at it.

tbonald commented 1 year ago

Not yet. You're welcome to take it, thanks!

ChrisDelClea commented 1 year ago

Any updates on it?

lucasmccabe commented 1 year ago

My apologies - I ended up not having time to get to this yet. Feel free to take over the issue if you have the availability!

tbonald commented 1 year ago

I had a look. Here is a solution, using cosine similarity in the embedding space.

import numpy as np
from scipy import sparse

from sknetwork.data import art_philo_science
from sknetwork.gnn import GNNClassifier
from sknetwork.linalg import normalize

graph = art_philo_science(metadata=True)
adjacency = graph.adjacency
features = graph.biadjacency
labels = graph.labels
names = graph.names

gnn = GNNClassifier([5, 3])

# training
gnn.fit(adjacency, features, labels)

# simulate a new node by copying node 0 and stacking it as row 0
adjacency_new = sparse.vstack((adjacency[0], adjacency))
features_new = sparse.vstack((features[0], features))

# make the adjacency square again by prepending a zero column (no edges point to the new node)
n = len(labels)
column_zeros = sparse.csr_matrix((n + 1, 1))
adjacency_new = sparse.hstack((column_zeros, adjacency_new))

# forward pass on the extended graph (the class prediction itself is not needed here)
_ = gnn.predict(adjacency_new, features_new)

# embedding
embedding = gnn.layers[-1].embedding

# cosine similarity between the new node (row 0) and the original nodes
X = normalize(embedding, p=2)
sim = X[1:].dot(X[0])

# nearest neighbors
nearest_neighbors = np.argsort(-sim)
print(names[0])
print(names[nearest_neighbors[:5]])

> Isaac Newton
['Isaac Newton' 'Aristotle' 'John von Neumann' 'Albert Einstein'
 'David Hume']

I hope it helps.
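For repeated queries, the steps above could be wrapped in a small helper (a sketch building on the code above; nearest_nodes and top_k are names I introduce here, not part of scikit-network):

def nearest_nodes(gnn, adjacency_new, features_new, names, top_k=5):
    # forward pass on the extended graph; the class prediction itself is not used
    _ = gnn.predict(adjacency_new, features_new)
    # embedding of all nodes from the last layer, row-normalized for cosine similarity
    X = normalize(gnn.layers[-1].embedding, p=2)
    # similarity between the new node (row 0) and the original nodes
    sim = X[1:].dot(X[0])
    return names[np.argsort(-sim)[:top_k]]

print(nearest_nodes(gnn, adjacency_new, features_new, names))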

tbonald commented 1 year ago

I'm closing this as it is not an issue per se. If you have any further comments or questions, please open a "Discussion".