Closed: wangjwchn closed this issue 5 years ago
I'm thinking we could try using an unweighted graph, similar to the random walk example. It takes substantially less time to load and consumes about 1/10 of the RAM.
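A minimal sketch of what an unweighted first-order random walk looks like (this is illustrative, not the repo's code): the graph is just adjacency lists, and each step picks a uniform random neighbor, so there are no precomputed per-edge probability tables to hold in memory.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Store the graph as plain adjacency lists (one list per node)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    return adj

def random_walk(adj, start, length, rng=random):
    """Unweighted walk: each step is a uniform choice among neighbors."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:
            break  # dead end in a directed graph
        walk.append(rng.choice(nbrs))
    return walk
```

The memory cost here is just the adjacency lists themselves, which is why the unweighted variant loads so much faster.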
@wangjwchn It is (I think) because it stores not only the transition probabilities for every single node (which they call "alias_nodes"), but also the transition probabilities for every "next" edge of every single edge (the so-called "alias_edges"). So if you have 1M edges, you end up with far more than 1M stored transition probabilities, in the worst case on the order of (1M)^2. I'm running into the same problem here. I was thinking of calculating the transition probabilities on the fly as it simulates walks, but that would probably generate a lot of redundant calculations. Or maybe some kind of caching to store the most frequently used alias_edges?
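The on-the-fly idea can be sketched like this (a hedged illustration, not the repo's implementation; `adj`, `next_node`, and the neighbor classification follow the node2vec paper's p/q biasing, but all names here are made up): instead of caching an alias table per edge, compute the unnormalized weights for the current (prev, cur) pair at each step.

```python
import random

def next_node(adj, prev, cur, p, q, rng=random):
    """One step of a second-order (p, q)-biased walk, computed on the fly.

    adj maps node -> set of neighbors; p is the return parameter and
    q the in-out parameter. Cost is O(degree(cur)) per step instead of
    O(total alias_edges) memory up front.
    """
    nbrs = sorted(adj[cur])
    if not nbrs:
        return None
    weights = []
    for nxt in nbrs:
        if nxt == prev:           # returning to the previous node
            weights.append(1.0 / p)
        elif nxt in adj[prev]:    # distance 1 from prev
            weights.append(1.0)
        else:                     # distance 2 from prev
            weights.append(1.0 / q)
    return rng.choices(nbrs, weights=weights, k=1)[0]
```

This trades the quadratic-ish memory of cached alias_edges for a small recomputation at every step, which is the same trade-off described above.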
I'm running into the same problem. I also have two graphs. The smaller one has 142,000 nodes and 27,000,000 edges, and it already uses more than 100 GB of memory. This node2vec tool can't handle my larger graph either.
@celia01 Hi, it turns out that most of the overhead comes from networkx, and from node2vec trying to cache all the transition probabilities. I customized the implementation a bit so that it computes the transition probabilities on the fly. If you're interested, feel free to check it out! https://github.com/RemyLau/GraphUtil I've run a couple of benchmarks and the performance is pretty consistent with the original implementation. Just a side note: I implemented multiple customization modes: FAST, OTF, NPOTF. OTF is the one to try first; just specify --custom OTF when you call the n2v.py script.
node2vec requires a significant amount of memory for a graph of your size. Check out http://snap.stanford.edu/graphsage/ for a less memory-demanding alternative.
@roks Cool, I'll try this one. 👍
When I run node2vec on a directed graph with 1 million edges, it uses about 40 GB of memory. I want to run it on a larger graph, but I'm out of memory. Is there any way to reduce memory consumption (such as batch processing)?