ncsoft / argew

Implementation for "Node Embedding for Homophilous Graphs with ARGEW: Augmentation of Random walks by Graph Edge Weights"

context size in sampling #1

Open deweihu96 opened 1 year ago

deweihu96 commented 1 year ago

Hi, thanks for your great work. I've been trying to incorporate the augmentation code into my own workflow.

I'm a little bit confused by the `context_size` variable. Is it the same as the `window_size` variable in gensim word2vec?

In gensim word2vec, the actual sequence used for training spans up to 2*window_size + 1 words (the target word plus up to window_size words on each side).

But the code in `sampler.argew` suggests that the length of the sequences used for training is `context_size`: `sequences = rw[:, j:j + self.context_size]`.
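To make the question concrete, here is a minimal sketch of what that slicing does, using plain Python lists instead of a torch tensor; `split_walk` is a hypothetical helper mirroring `rw[:, j:j + self.context_size]` applied for every valid offset `j`:

```python
def split_walk(walk, context_size):
    """Return all contiguous subsequences of length `context_size`."""
    return [walk[j:j + context_size]
            for j in range(len(walk) - context_size + 1)]

walk = [0, 4, 7, 2, 9]          # one random walk of length 5
print(split_walk(walk, 3))
# → [[0, 4, 7], [4, 7, 2], [7, 2, 9]]
```

So a walk of length L yields L - context_size + 1 training subsequences, each of length exactly `context_size`.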

danieljunhee commented 1 year ago

@deweihu96 Thank you for your interest in our work.

It seems the two variables you mentioned are conceptually the same (i.e., how many surrounding words/nodes to use) but differ in how each package implements them.

In gensim, the factor of 2 probably arises because the implementation directly extracts words both before and after the target word.

On the other hand, our work follows the pytorch-geometric package's node2vec implementation: each random walk is split into subsequences of length `context_size`, and within each subsequence the initial node pairs up with each of the remaining nodes to form a positive example.
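The pairing scheme described above can be sketched as follows; this is an illustrative reconstruction in plain Python (the hypothetical `positive_pairs` helper is not the package's actual code), not the torch-based implementation itself:

```python
def positive_pairs(walk, context_size):
    """For every length-`context_size` window of the walk, pair the
    window's first node with each remaining node in that window."""
    pairs = []
    for j in range(len(walk) - context_size + 1):
        window = walk[j:j + context_size]
        start = window[0]
        # the initial node forms a positive example with every other
        # node in the subsequence
        pairs.extend((start, other) for other in window[1:])
    return pairs

walk = [0, 4, 7, 2]
print(positive_pairs(walk, 3))
# → [(0, 4), (0, 7), (4, 7), (4, 2)]
```

So, unlike gensim's symmetric window around a center word, here each window contributes context_size - 1 positive pairs, all anchored on the window's first node.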