networkx / nx-parallel

A networkx backend that uses joblib to run graph algorithms in parallel.
BSD 3-Clause "New" or "Revised" License
23 stars 20 forks source link

ENH : adding parallel implementations of `all_pairs_` algos #33

Closed Schefflera-Arboricola closed 5 months ago

Schefflera-Arboricola commented 7 months ago

Please do "Squash and merge" while merging this PR

Added the following :



[1] here, default chunking means having chunks of at most 10 nodes, because otherwise, it was taking too long with bigger node_chunks; and I also reduced the number of nodes while timing, but the speedups are pretty consistent(i.e. around 3). This heatmap alone took 2.5 hrs, and ran successfully after several tries. Also no heatmap for approximate_all_pairs_node_connectivity because for a random graph, it will result in the same heatmap as all_pairs_node_connectivity

Schefflera-Arboricola commented 6 months ago

Tests / test (macos-latest, 3.11) is running for more than 4 hrs. It is stuck at:

algorithms/centrality/tests/test_group.py ........................       [ 50%]
dschult commented 6 months ago

That happened to me yesterday too. I don’ think it has anything to do with the backend, but I’m not sure.

dschult commented 6 months ago

Where does this PR stand? Is there more to do here? Do you feel the improvements here are what you expect -- or are there other questions/issues you want to explore with this PR?

Schefflera-Arboricola commented 5 months ago

I think we can merge this. The only algo not showing any speedups is all_pairs_shortest_path_length and its sequential implementation is already really fast. But, we should think about how to handle such functions in the whole of nx-parallel, i.e the functions that don’t show any speedups for the general random graphs we use in the current timing script and/or whose sequential networkx implementation is already really fast(like number_of_isolates, all_pairs_shortest_path_length, global_reaching_centrality and local_reaching_centrality(https://github.com/networkx/nx-parallel/pull/44), etc.). For some such algos setting the minimum chunk size to be really large might help(this didn't work for all_pairs_shortest_path_length), but then the input graph should also be really big. Just a thought, but having something like should_run = True only when the graph is really big might be helpful.