bwdeng20 opened this issue 3 years ago
That's an interesting issue. We mostly follow the official paper, which introduces the approach using a single embedding layer f. I personally think two embedding layers do not make sense for Node2Vec, because I do not see a reason why center and context nodes should be treated differently on graphs.
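The one-table vs. two-table distinction can be illustrated with a toy skip-gram scorer in plain Python (the embeddings and function names below are hypothetical, not PyG's or gensim's actual code): with a single shared table the center/context score is symmetric, while word2vec's separate input and output tables generally make it asymmetric.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy 2-dimensional embeddings for 3 nodes (made-up values).
shared = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # single table (PyG-style)
emb_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # word2vec: center table
emb_out = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]  # word2vec: context table

def score_one_table(center, context):
    # One table embeds both roles, so the score is symmetric:
    # swapping center and context gives the same value.
    return dot(shared[center], shared[context])

def score_two_tables(center, context):
    # Distinct tables per role: the score is generally
    # asymmetric in (center, context).
    return dot(emb_in[center], emb_out[context])
```

With these toy values, `score_one_table(0, 1) == score_one_table(1, 0)`, while `score_two_tables(0, 1)` and `score_two_tables(1, 0)` differ.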
I absolutely agree with your comments about the layer number. I found this confusing difference just by chance, and later I found that a lot of Python implementations of node2vec and deepwalk directly invoke gensim.models.word2vec, which, as far as I know, has two embedding layers. The C++ implementation of node2vec there probably uses just one embedding layer, following the paper.
It's indeed really confusing. I think we can make this behaviour optional. WDYT?
That sounds good to me. I'm going to reimplement some relevant models in the near future. Perhaps I can send a pull request once I finish it.
Sounds perfect :)
Do you know how to check whether a value is in a C++ tensor? For instance, checking whether 1 is in torch.LongTensor([3, 1, 2]) in Python can be as easy as follows.

a = torch.LongTensor([3, 1, 2])
flag = 1 in a

Is there any elegant way to do this with the C++ APIs of torch?
The workaround would be as follows:
(1 == a).sum() > 0
Thanks for the quick reply 😂
Sorry for the late response. I was busy with other things during the last month.
I've implemented a C++ alias sampling suite for CPU torch.Tensor, which includes:

- aliastable.cpu: produces the probability table and alias table from the input probabilities.
- aliasdraw.cpu: samples a node in $O(1)$ time according to the probability table and alias table.
- n2vwgt.cpp: computes the edge weights modified by p and q. In particular, given one source node, one of its neighbors is referred to as the current node, and we compute the modified weights for the neighbors of the current node.

The present implementation of node2vec generates training samples on the fly with uniform random walks, which is not compatible with alias sampling. In my opinion, there are three choices:
1. Keep the completely on-the-fly style of generating samples, and add a p, q random walk to torch_cluster such that it can compute the edge weights with p, q and walk accordingly. This sounds very expensive and may not be easy to implement in CUDA.
2. Precompute the modified weights and add a random walk method based on them. This requires a nested data structure like this:
   {current1: {source1: ModifiedWeights1, source2: ModifiedWeights2}, current2: {...}}
   However, deciding the next step is much slower than alias sampling. For instance, given modified weights (i.e., probabilities) [0.2, 0.4, 0.1, 0.3], we have to generate a random number from U[0,1] and compare it with the accumulated probabilities.
3. Precompute the modified weights and the corresponding alias tables, and add a random walk method utilizing alias sampling for choosing the next node.
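To make the trade-off between options 2 and 3 concrete, here is a minimal pure-Python sketch (an illustration under my own assumptions, not the C++ suite described above) of Vose's alias-table construction, the $O(1)$ alias draw, and the $O(n)$ cumulative scan it replaces:

```python
import random

def build_alias(probs):
    """Vose's alias method: O(n) construction of (prob, alias) tables
    from a normalized probability vector."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] += scaled[s] - 1.0          # donate mass to fill slot s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                   # leftover slots hold full columns
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """O(1) per sample: pick a slot uniformly, then a biased coin flip."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def cdf_draw(probs, rng=random):
    """The 'choice 2' alternative: linear scan of accumulated
    probabilities, O(n) per sample."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1

probs = [0.2, 0.4, 0.1, 0.3]
prob, alias = build_alias(probs)
sample = alias_draw(prob, alias, random.Random(0))  # some index in 0..3
```

Both draws return the same distribution; the alias tables just pay an O(n) preprocessing cost once so every subsequent sample is constant time, which is exactly what makes precomputation (option 3) attractive.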
This is also correlated with this issue #957
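For context, the p, q reweighting that n2vwgt.cpp is described as computing presumably follows the node2vec paper's second-order bias: given the previous node t and current node v, a neighbor x of v gets weight w(v,x)/p if x == t, w(v,x) if x is also a neighbor of t, and w(v,x)/q otherwise. A pure-Python sketch (the graph layout and function name here are hypothetical, not from the actual code):

```python
def n2v_weights(graph, t, v, p, q):
    """Second-order node2vec bias for the neighbors of current node v,
    given previous node t (graph: node -> {neighbor: edge weight})."""
    out = {}
    for x, w in graph[v].items():
        if x == t:              # returning to the previous node
            out[x] = w / p
        elif x in graph[t]:     # x is at distance 1 from t
            out[x] = w
        else:                   # moving away from t
            out[x] = w / q
    return out

# Toy weighted, undirected graph.
graph = {
    "t": {"v": 1.0, "a": 1.0},
    "v": {"t": 1.0, "a": 2.0, "b": 1.0},
    "a": {"t": 1.0, "v": 2.0},
    "b": {"v": 1.0},
}
weights = n2v_weights(graph, "t", "v", p=4.0, q=0.5)
# -> {'t': 0.25, 'a': 2.0, 'b': 2.0}
```

Normalizing these weights per (t, v) pair and feeding them into the alias-table construction is what makes option 3's precomputation possible.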
❓ Questions & Help

I've found only one torch.nn.Embedding layer in torch_geometric.nn.models.node2vec, which is not consistent with the official previous implementation based on the word2vec model, which consists of two embedding layers. Is the model provided by pyg just a simplified version? Or has someone found that the pyg version of node2vec can achieve similar performance to the official one?