phanein / deepwalk

DeepWalk - Deep Learning for Graphs
http://www.perozzi.net/projects/deepwalk/

Make process pool optional for loading adjacency lists #69

Closed: cthoyt closed this pull request 6 years ago

cthoyt commented 6 years ago

References #68
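The change this PR describes is a flag that lets callers skip the process pool when parsing an adjacency list. Below is a minimal sketch of that pattern. It assumes a whitespace-separated file where the first token on each line is a node and the remaining tokens are its neighbors, with blank lines and lines starting with `#` skipped. `load_adjacency_list`, `parse_chunk` and `chunked` are hypothetical names, not deepwalk's actual code in `graph.py`. The `use_multiprocessing=True` default is chosen for this sketch only, because the thread does not state the real default.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from os import cpu_count
from typing import Dict, Iterable, Iterator, List


def chunked(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_chunk(lines: List[str]) -> List[List[int]]:
    """Parse lines of the form 'node nbr1 nbr2 ...' into lists of ints.

    Assumption for this sketch: blank lines and '#' comment lines are skipped.
    """
    rows: List[List[int]] = []
    for line in lines:
        tokens = line.split()
        if tokens and not tokens[0].startswith("#"):
            rows.append([int(t) for t in tokens])
    return rows


def load_adjacency_list(path: str, use_multiprocessing: bool = True,
                        chunksize: int = 10000) -> Dict[int, List[int]]:
    """Build {node: neighbors}, parsing chunks in a process pool only if asked."""
    with open(path) as f:
        chunks = chunked(f, chunksize)
        if use_multiprocessing:
            # Starting workers and pickling chunks is a fixed cost per call;
            # it can outweigh the parallel speedup on very small files.
            with ProcessPoolExecutor(max_workers=cpu_count()) as executor:
                parsed = list(executor.map(parse_chunk, chunks))
        else:
            # Serial path: same parser, no worker processes are started.
            parsed = list(map(parse_chunk, chunks))

    graph: Dict[int, List[int]] = {}
    for rows in parsed:
        for node, *neighbors in rows:
            graph.setdefault(node, []).extend(neighbors)
    return graph
```

Creating the executor only inside the `if` branch is what makes the pool optional: the serial path never pays process start-up cost. When the pool path is used on platforms that start workers with `spawn` (the default on Windows and macOS), the calling script should sit under an `if __name__ == "__main__":` guard.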

ozlemmuslu commented 6 years ago

Turning multiprocessing off is faster for smaller graphs:

>>> start = time.time(); G1=graph.load_adjacencylist("example_graphs/karate.adjlist", use_multiprocessing=True); print(time.time() - start)
0.05406975746154785
>>> start = time.time(); G=graph.load_adjacencylist("example_graphs/karate.adjlist", use_multiprocessing=False); print(time.time() - start)
0.025623321533203125

For larger graphs, using multiprocessing is faster:

>>> start = time.time(); G=graph.load_adjacencylist("../GAT2VEC/data/blogcatalog/blogcatalog_graph.adjlist", use_multiprocessing=True); print(time.time() - start)
1.0107381343841553
>>> start = time.time(); G=graph.load_adjacencylist("../GAT2VEC/data/blogcatalog/blogcatalog_graph.adjlist"); print(time.time() - start)
1.0568852424621582

In both cases, the difference is only a fraction of a second (about 0.03 s for karate and 0.05 s for BlogCatalog).
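The measurements above are single `time.time()` runs, and gaps of a few hundredths of a second are small enough that one run can be swayed by file caching or scheduling noise. A hedged sketch of a repeatable comparison follows. `time_calls` and `summarize` are hypothetical helpers, not part of deepwalk. They use `time.perf_counter`, the standard library's high-resolution clock for measuring short durations, and report the minimum and median over several runs.

```python
import time
from statistics import median
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call fn() `repeats` times and return each wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """The minimum is usually the least noise-affected estimate;
    the median shows what a typical run looks like."""
    return (f"min {min(durations):.4f}s, median {median(durations):.4f}s "
            f"over {len(durations)} runs")
```

For example, `summarize(time_calls(lambda: graph.load_adjacencylist("example_graphs/karate.adjlist", use_multiprocessing=False)))` would repeat the first karate measurement, assuming `graph` is imported as in the comments above.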

GTmac commented 6 years ago

Thanks for the pull request! This looks good, and I will take a closer look over this weekend :) I have one question: have you tried even larger graphs, such as the YouTube graph (http://socialcomputing.asu.edu/datasets/YouTube2), and how large is the difference in running time?

ozlemmuslu commented 6 years ago

I ran it on the YouTube graph too:

>>> start = time.time(); G1=graph.load_adjacencylist("example_graphs/youtube.adjlist", use_multiprocessing=True); print(time.time() - start)
3.9949445724487305
>>> start = time.time(); G1=graph.load_adjacencylist("example_graphs/youtube.adjlist", use_multiprocessing=False); print(time.time() - start)
4.566006898880005

So multiprocessing is faster by about 0.6 seconds (3.99 s vs. 4.57 s).
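To put the three sets of reported timings side by side, the small sketch below recomputes the absolute and relative gaps from the numbers quoted in this thread; nothing is re-measured. `REPORTED` and `compare` are illustrative names. The second BlogCatalog run omitted `use_multiprocessing`, so it reflects the function's default setting, which the thread does not state.

```python
from typing import Dict, Tuple

# Single-run timings in seconds, copied from the comments in this thread,
# stored as (with multiprocessing, without multiprocessing).
REPORTED: Dict[str, Tuple[float, float]] = {
    "karate": (0.05406975746154785, 0.025623321533203125),
    # The second BlogCatalog run used the default flag, not an explicit False.
    "blogcatalog": (1.0107381343841553, 1.0568852424621582),
    "youtube": (3.9949445724487305, 4.566006898880005),
}


def compare(with_pool: float, without_pool: float) -> Tuple[float, float]:
    """Return (seconds saved by the pool, saving as a fraction of the no-pool time).

    Negative values mean the pool was slower.
    """
    saved = without_pool - with_pool
    return saved, saved / without_pool


for name, (pool, serial) in REPORTED.items():
    saved, frac = compare(pool, serial)
    print(f"{name:12s} saved {saved:+.3f}s ({frac:+.1%})")
```

On these numbers, the pool roughly doubles the load time on karate, saves about 4% against the default-flag run on BlogCatalog, and saves about 12.5% on YouTube, so its relative benefit grows with graph size.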

cthoyt commented 6 years ago

@GTmac Unrelated, but do you know Tim Barron? He's also doing his computer science PhD at Stony Brook, in cybersecurity.

GTmac commented 6 years ago

@cthoyt Sorry but I don't know him :)

GTmac commented 6 years ago

Given that the overhead introduced is small, I am going to merge this. Thanks for the effort!