snap-stanford / snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
Other
2.17k stars 797 forks source link

High memory consumption when using 'node2vec'. #140

Closed wangjwchn closed 5 years ago

wangjwchn commented 6 years ago

When I run node2vec on a directed graph with 1 million edges, it used about 40GB memory. I want to run it on a larger graph, but i'm out of memory. Is there any way to reduce memory consumption (such as batch processing)?

minhlab commented 6 years ago

I'm thinking we could try to use an unweighted graph similar to the random walk example. It takes substantially less time to load and consumes 1/10 of RAM.

RemyLau commented 6 years ago

@wangjwchn It is (I think) because it stores not only the transitional probabilities for every single edges (which they called "alias_nodes"), but also transitional probabilities for for every "next" edges for every single edge (the so called "alias_edges"). So if you have 1M of edges, you will end up with roughly (1M)^2 transition probabilities. I'm running into the same problem here. I was thinking of calculating transition probabilities on the fly as it simulating walks. But it probably to generate a lot of unnecessary calculations. Or maybe with some kind of caching to store to most frequently used alias_edges?

celia01 commented 5 years ago

I meet the same problem with you. I also have two graph. The smaller graph is: 142,000 Nodes 27,000,000 Edges It's also used more than 100G memory. This node2vec tools can’t support my lager graph, too.

RemyLau commented 5 years ago

@celia01 Hi, so it turns out that most of the overheads are from networkx, and node2vec trying to cache all the transition probabilities. I customized the implementation a little bit so that it computes the trans prob On The Fly. If you're interested in it, feel free to check it out! https://github.com/RemyLau/GraphUtil I've done a couple benchmarking and the performance is pretty consistent with the original implementation. Just a side note, I had implemented multiple customization mode: FAST, OTF, NPOTF. OTF is the one you can try first, just specify --custom OTF when you call the n2v.py script

roks commented 5 years ago

node2vec requires significant amount of memory for graph of your size. Check out http://snap.stanford.edu/graphsage/ for a less memory demanding solution.

wangjwchn commented 5 years ago

@roks Cool, I'll try this one. 👍