networkx / nx-parallel

A networkx backend that uses joblib to run graph algorithms in parallel.
BSD 3-Clause "New" or "Revised" License

ENH : Added `get_chunks` to `betweenness_centrality` and `create_iterables` in `utils` #29

Closed by Schefflera-Arboricola 5 months ago

Schefflera-Arboricola commented 8 months ago

Please use "Squash and merge" when merging this PR.

  1. Added the `get_chunks` kwarg to `betweenness_centrality`:
    • description: a user-defined function for creating node_chunks, instead of the current approach of partitioning `G.nodes`; if `get_chunks` is `"chunks"` (the default), the built-in partitioning is used. To understand the motivation for this, see the additional pytest added (a usage sketch follows this list).
    • also added extra tests (specific to nx-parallel) for the additional kwargs
  2. Added `create_iterables` (ref. PR #7)
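
For context, a minimal usage sketch (not code from this PR): a user-defined chunker passed via `get_chunks`. The assumed contract, a callable that takes an iterable of all the nodes and returns an iterable of node chunks, and the `four_chunks` helper are illustrative.

```python
# Minimal sketch, assuming `get_chunks` accepts a callable that receives an
# iterable of nodes and returns an iterable of node chunks.
import networkx as nx
import nx_parallel as nxp

G = nx.fast_gnp_random_graph(200, 0.1, seed=42)

def four_chunks(nodes):
    """Hypothetical chunker: split the nodes round-robin into 4 chunks."""
    nodes = list(nodes)
    return [nodes[i::4] for i in range(4)]

bc_default = nxp.betweenness_centrality(G)                         # built-in chunking ("chunks")
bc_custom = nxp.betweenness_centrality(G, get_chunks=four_chunks)  # user-defined chunking
```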
dschult commented 6 months ago

This PR has style issues. Next time you get to it, I think it is just one line with a trailing space.

dschult commented 6 months ago

It might be assuming too much to assert that the time should be less when custom chunking is used. Of course, asserting that could allow you to track down why it is slower, I suppose.

But should the default chunking always be slower? How might get_chunks slow things down?

Schefflera-Arboricola commented 6 months ago

> It might be assuming too much to assert that the time should be less when custom chunking is used. Of course, asserting that could allow you to track down why it is slower, I suppose.
>
> But should the default chunking always be slower? How might get_chunks slow things down?

Here I have compared the running time of the parallel implementation of betweenness_centrality with the default chunking and with a custom chunking (based on the edge_betweenness_centrality values). Locally I ran it on 8 cores, while in the GitHub workflow the test runs on 2 cores, and the test fails on some machines and passes on others.

[screenshot: timing comparison of default vs. custom chunking]

This could be a nice example of why get_chunks might be useful as an option: it would let a user do custom chunking when they have a really big graph. But I don't think we should keep it as a test, because it takes a lot of time and it doesn't pass on all machines.
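
For illustration, a rough sketch of the kind of comparison described above; the edge-betweenness-based heuristic, the chunk count, and the graph size are assumptions, not the actual benchmark from this PR.

```python
# Sketch: compare default chunking vs. a custom chunker that spreads "heavy"
# nodes (by incident edge betweenness) across chunks. Illustrative only.
import time
import networkx as nx
import nx_parallel as nxp

G = nx.fast_gnp_random_graph(300, 0.05, seed=1)

def chunks_by_edge_bc(nodes, n_chunks=8):
    """Hypothetical chunker: rank nodes by the total betweenness of their
    incident edges and assign them to chunks in round-robin order."""
    nodes = list(nodes)
    ebc = nx.edge_betweenness_centrality(G)
    score = {n: 0.0 for n in nodes}
    for (u, v), val in ebc.items():
        score[u] += val
        score[v] += val
    ordered = sorted(nodes, key=score.get, reverse=True)
    return [ordered[i::n_chunks] for i in range(n_chunks)]

t0 = time.perf_counter()
nxp.betweenness_centrality(G)
t_default = time.perf_counter() - t0

t0 = time.perf_counter()
nxp.betweenness_centrality(G, get_chunks=chunks_by_edge_bc)
t_custom = time.perf_counter() - t0

print(f"default chunking: {t_default:.3f}s  custom chunking: {t_custom:.3f}s")
```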

dschult commented 6 months ago

That makes good sense to me. Timing depends on so many things... it's hard to do well... It's nice to have parameters to tweak for each situation, and get_chunks is a good one to make available. :)

Schefflera-Arboricola commented 5 months ago

> Are there any more changes planned for this PR?

No, I think it's ready.

Thanks @dschult!