xiangyue9607 / BioNEV

Graph Embedding Evaluation / Code and Datasets for "Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations" (Bioinformatics 2020)
MIT License
node2vec is killed in evaluation phase #9

Closed akastrin closed 5 years ago

akastrin commented 5 years ago

Hi,

First, thanks for a great paper. However, when I try to reproduce your results, the node2vec method is suddenly killed during the evaluation procedure.

I used the following command to run the node2vec embedding:

bionev --input ./data/Clin_Term_COOC/Clin_Term_COOC.edgelist --output ./embeddings/node2vec.txt --method node2vec --task link-prediction --eval-result-file eval_results2.txt --weighted True

The output lines:

######################################################################
Embedding Method: node2vec, Evaluation Task: link-prediction
######################################################################
Original Graph: nodes: 48651 edges: 1659249
Training Graph: nodes: 48651 edges: 1328307
Loading training graph for learning embedding...
Graph Loaded...
Preprocess transition probs...
Begin random walk...
Walk finished...
Learning representation...
Saving embeddings...
Embedding Learning Time: 10034.56 s
Nodes with embedding: 48651
Begin evaluation...
Killed

I have 256 GB of memory on my server, so I suspect that RAM is not the issue. When I try a smaller dataset, the evaluation phase finishes successfully.

Any idea what to do?
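One way to test the assumption about RAM is to record the peak memory of a run on the smaller dataset and extrapolate from there. The snippet below is only a minimal sketch using Python's standard `resource` module (Unix only). The helper name `peak_rss_gib` is hypothetical and is not part of BioNEV.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Return the peak resident set size of the current process in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1024 ** 3


# Allocate something so the measurement reflects real usage.
payload = [0] * 1_000_000
print(f"Peak RSS so far: {peak_rss_gib():.3f} GiB")
```

A process that has already been killed cannot report its own peak, so a helper like this would have to be called at checkpoints before the evaluation step, or on a run that completes.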

Best, Andrej

xiangyue9607 commented 5 years ago

Hi Andrej,

Thanks for your interest! I'm not sure of the reason, but my guess is that it is still a memory problem. In our experimental setting, we didn't run link prediction on the Clin_Term_COOC dataset. As you can see, this dataset is very large: 1,659,249 edges in total and 1,328,307 in training. So there would be roughly (1,659,249 - 1,328,307) * 2 = 661,884 edges for testing. That is very likely to consume more than 256 GB of memory. You could also check your system log to see what happened, but I suspect it was caused by an out-of-memory issue.
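The arithmetic above, and the suggestion to look at the system log, can be sketched roughly as follows. This is an illustration, not BioNEV code: both function names are hypothetical, the factor of 2 is taken from the estimate as written (presumably one sampled non-edge per held-out edge, which is an assumption), and the log patterns are wording commonly seen in Linux kernel logs that may differ on other systems.

```python
import re


def estimated_test_pairs(total_edges: int, training_edges: int) -> int:
    """Held-out edges times 2, following the estimate in the reply."""
    held_out = total_edges - training_edges
    return held_out * 2


# Assumed phrases for Linux OOM-killer messages; adjust for your system.
OOM_PATTERN = re.compile(
    r"out of memory|oom-killer|killed process", re.IGNORECASE
)


def find_oom_lines(log_lines):
    """Return the log lines that look like OOM-killer messages."""
    return [line for line in log_lines if OOM_PATTERN.search(line)]


print(estimated_test_pairs(1_659_249, 1_328_307))  # 661884
```

To use `find_oom_lines`, pass it the lines of whatever kernel or system log your distribution keeps, for example the text produced by `dmesg`.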

Thanks:)

akastrin commented 5 years ago

You are right. The node classification task works like a charm.