xgfs / node2vec-c

node2vec implementation in C++
MIT License

Early Termination of training: Progress 19.42% #2

Open qxde01 opened 5 years ago

qxde01 commented 5 years ago

nv: 1032440, ne: 987454
Need 0.257797 Mb for storing second-order degrees
Generating a corpus for negative samples..
Using vectorized operations
lr 0.020144, Progress 19.42%
Calculations took 12.13 s to run

xgfs commented 5 years ago

There are definitely a couple of weird things here: the number of nodes is larger than the number of edges, and the graph seems almost disconnected. Are you sure the data preprocessing and data format are correct?
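The "almost disconnected" observation follows from a counting argument: a connected graph on n nodes needs at least n - 1 edges, so a graph with nv nodes and ne edges has at least nv - ne connected components. For the numbers in the log that is 1032440 - 987454 = 44986, assuming ne counts each undirected edge once (if the file stores both directions, the bound is larger). A minimal sketch of that bound, plus a union-find component counter for checking an actual edge list; the function names are hypothetical and not part of node2vec-c:

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Find the representative of x, shortening the path as we go (path halving).
int find_root(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Count connected components of an undirected graph with nodes 0..n-1.
long long count_components(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);  // every node starts alone
    long long components = n;
    for (const auto& e : edges) {
        assert(e.first >= 0 && e.first < n && e.second >= 0 && e.second < n);
        int a = find_root(parent, e.first);
        int b = find_root(parent, e.second);
        if (a != b) {        // the edge joins two components
            parent[a] = b;
            --components;
        }
    }
    return components;
}

// Lower bound on the component count from the node and edge counts alone
// (assumes nv >= 1 and that each undirected edge is counted once).
long long min_components(long long nv, long long ne) {
    return nv > ne ? nv - ne : 1;
}
```

Calling `min_components(1032440, 987454)` gives the bound quoted above; `count_components` gives the exact figure once the edge list is loaded.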

qxde01 commented 5 years ago

There are thousands of subgraphs with no connections between them. The data was preprocessed with https://github.com/xgfs/verse/tree/master/python, and verse trains correctly on it. My computer has 24 CPUs and 256 GB of memory. Thanks.

xgfs commented 5 years ago

If most of the nodes lie in small disconnected subgraphs, node2vec will run much faster (with unknown quality). From the output I would assume that the process ran correctly and finished; most likely the final progress update was just never printed (progress is only printed from thread id=0).
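To illustrate how a run can finish while the last progress line stays well below 100%, here is a deterministic, single-threaded simulation of reporting that is gated on worker 0. It is only a sketch of the hypothesis above, not the node2vec-c training loop: the round-robin scheduling, the `report_every` interval, and all names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdio>

struct RunResult {
    long long steps_done;   // units of work actually completed
    double last_reported;   // last progress fraction printed by worker 0
};

// Workers take turns doing one unit of work each (round-robin stands in for
// real threads). Only worker 0 prints, and only every `report_every` of its
// own steps, so the final printed value can lag behind the true progress.
RunResult simulate(int n_workers, long long total_steps, long long report_every) {
    assert(n_workers > 0 && total_steps > 0 && report_every > 0);
    long long done = 0;     // shared step counter
    long long local0 = 0;   // steps performed by worker 0 only
    double last_reported = 0.0;
    while (done < total_steps) {
        for (int tid = 0; tid < n_workers && done < total_steps; ++tid) {
            ++done;  // one unit of training work
            if (tid == 0 && ++local0 % report_every == 0) {
                last_reported = static_cast<double>(done) / total_steps;
                std::printf("Progress %.2f%%\n", 100.0 * last_reported);
            }
        }
    }
    return {done, last_reported};
}
```

With `simulate(4, 1000, 100)`, all 1000 steps complete but the last line printed is `Progress 79.70%`. This shows that a stale final progress line is compatible with a completed run; it does not by itself explain the specific 19.42% value in the log.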

qxde01 commented 5 years ago

I found a difference. With the parameter -nwalks 80, training is interrupted:
./node2vec -input rela.bscr -output embedding.bin -dim 256 -nwalks 80
With the default nwalks, training completes correctly:
./node2vec -input rela.bscr -output embedding.bin -dim 256

xgfs commented 5 years ago

Interesting. Could you run the default a couple of times and see whether it ever crashes?

qxde01 commented 5 years ago

If nwalks < 40, training is not interrupted.

xgfs commented 5 years ago

If you post the graph file, I will try to look at the issue some time in the future. If you find the bug yourself, please let me know or submit a PR.