Open xgfs opened 7 years ago
We also noticed that the C++ version of word2vec can produce different results in some cases. Our tests show that the difference is normally small and not as large as in your example. We are planning to investigate this issue further.
A quick solution is to print the generated random walks and use some other implementation of word2vec to train the embeddings.
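As a rough sketch of that workaround, the snippet below parses dumped walks and trains on them with gensim. Everything here is an assumption rather than part of this project: the walk format (one walk per line, node ids separated by whitespace), the helper names `load_walks` and `train_embeddings`, the `window` value, and the choice of gensim 4.x as the alternative word2vec implementation.

```python
from typing import Iterable, List


def load_walks(lines: Iterable[str]) -> List[List[str]]:
    """Parse walks from text lines.

    Assumed format: one walk per line, node ids separated by whitespace.
    """
    walks = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            walks.append(tokens)
    return walks


def train_embeddings(walks, dimensions=128, window=10, seed=1):
    """Train skip-gram embeddings on the walks (assumes gensim 4.x is installed)."""
    # Imported lazily so that load_walks works without gensim.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=walks,
        vector_size=dimensions,  # d=128, as in the original report
        window=window,           # hypothetical context size, not from the thread
        min_count=0,             # keep every node, even rarely visited ones
        sg=1,                    # skip-gram
        workers=1,               # single thread, to rule out threading effects
        seed=seed,
    )
```

Usage would look something like `model = train_embeddings(load_walks(open("walks.txt")))`, where `walks.txt` is a hypothetical file holding the printed walks. Training with `workers=1` is slower, but it removes multithreading as a variable when comparing against the C++ output.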
A similar issue is observed for the link prediction task on BlogCatalog. The problem seems to be more severe with more threads (e.g. when I run the C++ version on a 24-core machine, it produces much worse embeddings than on an 8-core machine). Maybe check the parallel part?
Hello there,
I have discovered that with the same default parameters on the BlogCatalog graph (d=128, len=80, n_walks=10, p=0.25, q=0.25), the C++ version produces much worse results:
Is there a parameter I am missing?