snap-stanford / snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.

node2vec performance is much worse than python reference #107

Open xgfs opened 7 years ago

xgfs commented 7 years ago

Hello there,

I have found that with the same default parameters on the BlogCatalog graph (d=128, len=80, n_walks=10, p=0.25, q=0.25), the C++ version produces much worse results than the Python reference:

C++ 50% micro: 37.85 macro: 20.72
Python 50% micro: 40.11 macro: 26.90

Is there a parameter I am missing?

roks commented 7 years ago

We also noticed that the C++ version of word2vec can produce different results in some cases. Our tests show that the difference is normally small and not as large as in your example. We are planning to investigate this issue further.

A quick solution is to print the generated random walks and use some other implementation of word2vec to train the embeddings.
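As a rough sketch of that workaround: assuming the walks have been dumped to a plain text file with one walk per line and node IDs separated by whitespace (the file layout is an assumption, and `read_walks` and `skipgram_pairs` are hypothetical helper names, not part of SNAP), the walks can be loaded into token lists and sanity-checked before being handed to another word2vec trainer.

```python
import io
from collections import Counter


def read_walks(handle):
    """Parse walks from a text stream: one walk per line, node IDs split on whitespace.

    The file layout is an assumption for this sketch, not a documented SNAP format.
    """
    walks = []
    for line in handle:
        tokens = line.split()
        if tokens:  # skip blank lines
            walks.append(tokens)
    return walks


def skipgram_pairs(walks, window):
    """Yield the (center, context) pairs a skip-gram trainer would draw from the walks."""
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, walk[j]


# Tiny made-up input standing in for a real walk dump.
sample = io.StringIO("1 2 3 2\n4 1 2\n\n")
walks = read_walks(sample)

# Quick sanity checks before training: node frequencies and training pairs.
node_counts = Counter(node for walk in walks for node in walk)
pairs = list(skipgram_pairs(walks, window=1))
```

The resulting `walks` value is a list of lists of string tokens, which is the usual "sentences" shape that third-party word2vec trainers accept, so it should be possible to pass it on largely unchanged. Comparing `node_counts` between the C++ and Python walk dumps is one cheap way to check whether the walks themselves differ before suspecting the training step.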

Zhang-THU commented 7 years ago

A similar issue is observed for the link prediction task on BlogCatalog. The problem seems to be more severe with more threads: when I run the C++ version on a 24-core machine, it produces much worse embeddings than on an 8-core machine. Maybe check the parallel part?