rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0

nx-cugraph: faster `shortest_path` #4739

Closed eriknw closed 1 week ago

eriknw commented 4 weeks ago

For larger graphs, nearly all the time was spent creating the dict of lists of paths. I couldn't find a better way to create these, nor could I find an approach to compute more in PLC or cupy. So, the solution in this PR is to avoid computing until needed!

This now returns a Mapping instead of a dict. Will anybody care or notice that the return type isn't strictly a dict?

~This is currently only for unweighted BFS. We should also do weighted SSSP paths.~ Updated to do SSSP too.

~Also, this currently recurses. If we like the approach in this PR, we should update computing the paths on demand to not recurse.~ Updated to not recurse, to avoid any chance of hitting the recursion limit, which would have been unlikely but possible.

If all paths are used--and hence computed in the PathMapping--then overall performance remains comparable. So this PR speeds up performance (sometimes by a lot!) or keeps it the same, and the cost is a non-dict return type, which I think is okay for backends to do, because duck typing is a thing in Python.
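
For illustration, here is a minimal sketch of the lazy idea (the names `LazyPathMapping` and `predecessors` are hypothetical, not the PR's actual code): a `collections.abc.Mapping` that rebuilds each path iteratively from the predecessor tree only when it is first requested.

```python
# Illustrative sketch only (not the PR's implementation): a Mapping that
# reconstructs each shortest path from a BFS/SSSP predecessor tree on
# first access, instead of eagerly building a dict of lists for every node.
from collections.abc import Mapping


class LazyPathMapping(Mapping):
    def __init__(self, source, predecessors):
        # predecessors: node -> its predecessor on the shortest path from
        # `source`; the source itself is not a key.
        self._source = source
        self._pred = predecessors
        self._cache = {source: [source]}

    def __getitem__(self, node):
        if node in self._cache:
            return self._cache[node]
        if node not in self._pred:
            raise KeyError(node)
        # Walk the predecessor tree iteratively (no recursion), then reverse.
        path = [node]
        while path[-1] != self._source:
            path.append(self._pred[path[-1]])
        path.reverse()
        self._cache[node] = path
        return path

    def __iter__(self):
        yield self._source
        yield from self._pred

    def __len__(self):
        return 1 + len(self._pred)


paths = LazyPathMapping("s", {"a": "s", "b": "a"})
assert paths["b"] == ["s", "a", "b"]  # computed only when first accessed
```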

ChuckHastings commented 4 weeks ago

If you can define the behavior you want, we could explore adding some PLC functionality to improve this.

eriknw commented 4 weeks ago

> If you can define the behavior you want, we could explore adding some PLC functionality to improve this.

Thanks @ChuckHastings, this is what I tried to do initially, but I couldn't determine anything that would actually help, because the slow part is creating Python lists for each node. Maybe we could get a factor of 2 or so if we moved the original code to Cython (but I'm skeptical), but this just feels awkward.
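
For context, the eager construction being replaced looks roughly like this (hypothetical names, not the actual nx-cugraph code); the cost that dominates is materializing one Python list per reachable node:

```python
# Rough illustration (not the actual nx-cugraph code) of the eager version:
# one Python list is built per reachable node, which is the part that
# dominates runtime on large graphs regardless of where the BFS/SSSP ran.
def all_paths_eager(source, predecessors):
    # predecessors: node -> its predecessor on the shortest path from `source`
    paths = {source: [source]}
    for node in predecessors:
        path = [node]
        while path[-1] != source:
            path.append(predecessors[path[-1]])
        path.reverse()
        paths[node] = path
    return paths


print(all_paths_eager("s", {"a": "s", "b": "a"}))
# {'s': ['s'], 'a': ['s', 'a'], 'b': ['s', 'a', 'b']}
```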

ChuckHastings commented 4 weeks ago

Did you look at https://github.com/rapidsai/cugraph/blob/f917ae4ad200258f1afb7d2f70ee200828b88479/cpp/include/cugraph_c/traversal_algorithms.h#L158?

This function should return a dense matrix where each row represents the path from a destination vertex back to the source of the BFS. So result[0][0] would be the first destination, result[0][1] would be its predecessor, and so on back to the source. Any entries beyond the source on a given path would be set to invalid_vertex_id.

Would that make things faster? You wouldn't have to do the predecessor lookups; you would already have contiguous values in GPU memory.
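
As a rough sketch of what consuming such a result might look like on the Python side (the layout, dtype, and sentinel value here are assumptions based on the description above, not the actual PLC API):

```python
import numpy as np

# Hypothetical example of the layout described above: each row is one path
# from a destination back toward the source, padded with an invalid vertex
# id. The sentinel value and dtype are assumptions for illustration.
INVALID_VERTEX_ID = np.iinfo(np.int64).max

def row_to_path(row):
    # Drop the padding, then reverse so the path reads source -> destination.
    trimmed = row[row != INVALID_VERTEX_ID]
    return trimmed[::-1].tolist()

dense = np.array(
    [
        [3, 1, 0, INVALID_VERTEX_ID],                  # 0 -> 1 -> 3
        [2, 0, INVALID_VERTEX_ID, INVALID_VERTEX_ID],  # 0 -> 2
    ],
    dtype=np.int64,
)
print([row_to_path(r) for r in dense])  # [[0, 1, 3], [0, 2]]
```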

rlratzel commented 3 weeks ago

Thanks, @eriknw. Please also include the before and after output from the relevant pytest benchmarks.

> the cost is a non-dict return type, which I think is okay for backends to do, because duck typing is a thing in Python.

We should discuss this at the next NetworkX dispatching meeting.

eriknw commented 2 weeks ago

> Did you look at https://github.com/rapidsai/cugraph/blob/f917ae4ad200258f1afb7d2f70ee200828b88479/cpp/include/cugraph_c/traversal_algorithms.h#L158

Thanks for sharing that, I didn't know about it! I did experiment as though such a result (or CSR-like path results) were available from C, and I found it didn't help performance.

eriknw commented 2 weeks ago

> Please also include the before and after output from the relevant pytest benchmarks.

Benchmarks before this PR: shortest_paths_before

Benchmarks with this PR: shortest_paths_after

eriknw commented 2 weeks ago

> We should discuss this at the next NetworkX dispatching meeting.

I brought this up during the nx dispatching meeting today. All present thought this approach is perfectly fine.

eriknw commented 2 weeks ago

Moved to here: https://github.com/rapidsai/nx-cugraph/pull/18

eriknw commented 1 week ago

Closing in favor of https://github.com/rapidsai/nx-cugraph/pull/18