williamleif / GraphSAGE

Representation learning on large graphs using stochastic graph convolutions.

[GraphSAGE] Positive and negative node pairs #165

Open jiruhameru opened 3 years ago

jiruhameru commented 3 years ago

Forgive my lack of knowledge, but I have a simple question regarding the provided demo for node representation learning with GraphSAGE and the use of an unsupervised sampler:

"Given a large set of positive (+) node pairs (generated from random walks performed on the graph), and an equally large set of negative (-) pairs that are randomly selected from the graph"

What do you mean by positive and negative node pairs? Could you point me to some useful resources to better understand this concept?

a-little-srdjan commented 3 years ago

Hello, I haven't used GraphSAGE myself, but I do follow the project. My understanding, based on the published papers, is as follows. Consider a node A for which you are preparing training samples. A positive pair (A, B) is one where B is actually similar to A in some semantic sense: for example, B lies within n steps of A, or B shares structural similarities with A (as in struc2vec). A negative pair (A, C) is one where C is not similar to A: for example, C is picked at random from the part of the graph more than n steps removed from A.

The above approach was first used to train word2vec models (building embeddings for words).
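
To make this concrete, here is a minimal sketch in plain Python of the general idea, not GraphSAGE's actual sampling code; the toy graph and all helper names are made up for illustration. Positive pairs are nodes that co-occur within a small window on a random walk, and negative pairs reuse the same anchor with a node drawn at random from the whole graph:

```python
import random

# Toy undirected graph as an adjacency list (hypothetical example data).
graph = {
    0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3, 5], 5: [4],
}

def random_walk(graph, start, length, rng):
    """Uniform random walk of `length` steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def sample_pairs(graph, walks_per_node=10, walk_length=5, context_size=2, seed=0):
    """Positive pairs: nodes co-occurring within `context_size` steps on a walk.
    Negative pairs: the same anchors paired with uniformly sampled nodes."""
    rng = random.Random(seed)
    nodes = list(graph)
    positive, negative = [], []
    for node in nodes:
        for _ in range(walks_per_node):
            walk = random_walk(graph, node, walk_length, rng)
            for i, anchor in enumerate(walk):
                for ctx in walk[i + 1 : i + 1 + context_size]:
                    if ctx != anchor:
                        # (A, B): B appears near A on a walk, so treat it as similar.
                        positive.append((anchor, ctx))
                        # (A, C): C is drawn at random, so it is most likely dissimilar.
                        negative.append((anchor, rng.choice(nodes)))
    return positive, negative

pos, neg = sample_pairs(graph)
print(pos[:3], neg[:3])
```

Note that this uniform negative sampling is a simplification: word2vec-style implementations usually draw negatives from a degree-biased (unigram^0.75) distribution and may filter out accidental positives.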

Hope the above helps.

jiruhameru commented 3 years ago


Thanks a lot! Much appreciated. It's clear now.