deweihu96 opened this issue 1 year ago
Hi, thanks for your great work. I've been trying to incorporate the augmentation code into my own workflow.

I'm a little confused by the `context_size` variable. Is it the same as the `window_size` variable in gensim word2vec? The actual sequences used for training in gensim word2vec have length `2*window_size - 1`, but the code in `sampler.argew` suggests that the sequences used for training have length `context_size`: `sequences = rw[:, j:j + self.context_size]`.
@deweihu96 Thank you for your interest in our work.

The two variables you mention are conceptually the same (i.e., how much of the surrounding words/nodes to use); they just differ in package implementation. In gensim, there is a multiplication by 2, probably because the implementation directly extracts words both before and after the target word. In our work, which follows the pytorch-geometric package's node2vec implementation, each random walk is instead split into subsequences of length `context_size`, and within each subsequence the initial node is paired with every remaining node to form the positive examples.
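To make the two conventions concrete, here is a minimal sketch of the gensim-style symmetric window. This illustrates the semantics only (it is not gensim's internal code, and it ignores gensim's random per-target window shrinking):

```python
def gensim_style_contexts(sentence, window):
    """For each target position, take up to `window` words before and
    after the target -- the symmetric window behind the factor of 2."""
    for i, target in enumerate(sentence):
        context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
        yield target, context
```

And here is a sketch of the pytorch-geometric-style windowing described above, assuming `rw` holds random walks of shape `(num_walks, walk_length)` as in the quoted snippet. The function name and return layout are mine, not the repository's:

```python
import torch

def positive_pairs(rw: torch.Tensor, context_size: int) -> torch.Tensor:
    """Split each walk into overlapping windows of length context_size,
    then pair the first node of every window with each remaining node.

    rw: LongTensor of node ids, shape (num_walks, walk_length).
    Returns a (num_pairs, 2) tensor of (start, context) positive pairs.
    """
    windows = []
    # A walk of length L yields L - context_size + 1 windows.
    for j in range(rw.size(1) - context_size + 1):
        windows.append(rw[:, j:j + context_size])
    sequences = torch.cat(windows, dim=0)                   # (N, context_size)
    start = sequences[:, :1].expand(-1, context_size - 1)   # first node, repeated
    context = sequences[:, 1:]                               # remaining nodes
    return torch.stack([start.reshape(-1), context.reshape(-1)], dim=1)

# Example: one walk 0-1-2-3-4 with context_size=3 yields windows
# [0,1,2], [1,2,3], [2,3,4] and pairs (0,1),(0,2),(1,2),(1,3),(2,3),(2,4).
pairs = positive_pairs(torch.tensor([[0, 1, 2, 3, 4]]), context_size=3)
```

So a gensim window spans words on both sides of the target, while a `context_size` window pairs only the leading node with the nodes that follow it, which is why the two numbers are not directly interchangeable.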