Open GregorySchwing opened 1 year ago
The specific problem mentioned here is an issue in our Python code; reassigned to Rick.
@GregorySchwing - There is still an assumption that if you pass multiple seeds, they are from different components of the graph. There is no error check enforcing this on the caller, nor is there a warning if you violate it. So you may call with multiple seeds from the same component.
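Since the library does not check this, a caller could check it before calling. The sketch below is plain Python over an edge list, not a cuGraph API; the function name `seeds_share_component` is hypothetical. It uses a union-find (disjoint-set) structure to group the seeds by connected component and reports any component that holds more than one seed.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find "find" with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def seeds_share_component(edges, seeds):
    """Hypothetical helper: map component root -> seeds, for components holding 2+ seeds."""
    parent = {}
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    groups = defaultdict(list)
    for s in seeds:
        parent.setdefault(s, s)  # a seed with no edges is its own component
        groups[find(parent, s)].append(s)
    return {root: ss for root, ss in groups.items() if len(ss) > 1}
```

For example, with edges `[(0, 1), (1, 2), (3, 4)]`, the seeds `[0, 2, 3]` produce one offending group containing `0` and `2`, while `[0, 3]` produce an empty dict. The root used as the key is an implementation detail of the union-find.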
If you pass multiple seeds from the same component, be aware of what to expect. This is a frontier-based algorithm: all of the seeds are treated as frontier 0. As each frontier is expanded in parallel, its vertices race to claim neighbors for the next frontier. If several frontier vertices reach the same vertex that was not visited in an earlier frontier, one of those paths wins and the others lose and are not recorded in the back pointers.
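The frontier behavior described above can be sketched sequentially in Python. This is an illustration, not cuGraph's GPU implementation: the parallel race is replaced by a first-writer-wins rule in iteration order, so the winner depends on the order in which frontier vertices are processed. The function name `multi_source_bfs` is hypothetical.

```python
def multi_source_bfs(adj, seeds):
    """Level-synchronous BFS from several seeds at once.

    All seeds start in frontier 0 with distance 0 and no predecessor.
    When several frontier vertices reach the same unvisited neighbor,
    the first one processed claims it (stand-in for the parallel race).
    """
    distance = {s: 0 for s in seeds}
    predecessor = {s: None for s in seeds}
    frontier = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in distance:  # unvisited: this path wins
                    distance[v] = level
                    predecessor[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return distance, predecessor


# Path graph 0-1-2-3-4 with seeds at both ends.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
dist, pred = multi_source_bfs(adj, [0, 4])
```

Vertex 2 is two hops from both seeds, so only one back pointer survives: with seeds `[0, 4]` it points to 1, and with seeds `[4, 0]` it points to 3.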
The most important thing to recognize is that passing vertices u and v that are in the same component will not give the same result as calling bfs with u (and extracting the paths), calling bfs with v (and extracting the paths), and merging the results. Each vertex t in the same component as u and v will have a path in the result set from either u OR v. If the shortest paths from u to t and from v to t have different lengths, you'll get the shorter path. If they are the same length, you'll get whichever path won the race.
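The difference between one multi-source call and merged single-source calls can be checked on a small graph. The sketch below is plain Python with hypothetical helpers `bfs` and `path_to`, not the cuGraph API. It shows that the multi-source distance to each vertex equals the minimum of the single-source distances, but each vertex keeps only one path back to a single seed.

```python
from collections import deque


def bfs(adj, seeds):
    """BFS from one or more seeds; returns (distance, predecessor) dicts."""
    distance = {s: 0 for s in seeds}
    predecessor = {s: None for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in distance:
                distance[v] = distance[u] + 1
                predecessor[v] = u
                queue.append(v)
    return distance, predecessor


def path_to(predecessor, t):
    """Follow back pointers from t to the seed that claimed it (seed first)."""
    path = [t]
    while predecessor[path[-1]] is not None:
        path.append(predecessor[path[-1]])
    return path[::-1]


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
d_u, p_u = bfs(adj, [0])      # single source u = 0
d_v, p_v = bfs(adj, [4])      # single source v = 4
d_m, p_m = bfs(adj, [0, 4])   # both seeds in one call
```

Here `path_to(p_m, 1)` is `[0, 1]` and `path_to(p_m, 3)` is `[4, 3]` (the shorter path wins), while vertex 2 is equidistant: the merged single-source results contain both `[0, 1, 2]` and `[4, 3, 2]`, but the multi-source result contains only one of them.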
I understand. I am interested in using MSBFS (multi-source BFS) to find paths to vertices that are equidistant from two arbitrary seeds, for augmenting-path maximum matching algorithms.
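Because a multi-source traversal keeps only one back pointer per vertex, equidistant vertices cannot be recovered from a single call once the seeds share a component. A minimal sketch, assuming an adjacency dict and using two independent single-source traversals, is below; `bfs_distances` and `equidistant_vertices` are hypothetical names, and this is not how any particular matching algorithm is required to do it.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def equidistant_vertices(adj, u, v):
    """Vertices reachable from both u and v at the same hop distance.

    Runs two independent traversals, since a single multi-source BFS
    would record only one path per vertex.
    """
    du = bfs_distances(adj, u)
    dv = bfs_distances(adj, v)
    return sorted(t for t in du.keys() & dv.keys() if du[t] == dv[t])
```

On the path graph 0-1-2-3-4 with seeds 0 and 4 this returns `[2]`; on the 4-cycle 0-1-2-3-0 with seeds 0 and 2 it returns `[1, 3]`.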
Changing this to a feature request as this is not a bug.
@GregorySchwing - starting to look at this request to plan when we can execute this.
Do you have an idea of how many concurrent traversals you would want to do in a single call? Supporting multiple concurrent traversals will add memory pressure, as we need to keep track of back pointers for all of the vertices visited by each traversal.
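To put rough numbers on that memory pressure: if each independent traversal keeps one predecessor and one distance per vertex, output storage grows linearly with the number of traversals. The sketch below assumes 4-byte (int32) entries and ignores frontier buffers and other working storage; it is a back-of-the-envelope estimate, not cuGraph's actual memory layout, and `backpointer_bytes` is a hypothetical helper.

```python
def backpointer_bytes(num_vertices, num_traversals,
                      vertex_bytes=4, distance_bytes=4):
    """Rough lower bound on output memory for k independent BFS traversals.

    Assumes one predecessor and one distance per vertex per traversal,
    each stored as a fixed-width integer (int32 by default).
    """
    return num_vertices * num_traversals * (vertex_bytes + distance_bytes)


one = backpointer_bytes(10_000_000, 1)    # 80,000,000 bytes
many = backpointer_bytes(10_000_000, 64)  # 5,120,000,000 bytes
```

Under these assumptions, a 10-million-vertex graph needs about 80 MB of output for one traversal but about 5.1 GB for 64 concurrent ones.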
This is no longer an interest of mine. I opened the issue because the capacity to do this already exists, depending on the format the graph is passed as (cuDF vs networkx).
"For cugraphs created from cudf, passing an array as the start argument works fine. However, graph types verify the validity of the start vertices differently, and the way a list is evaluated by the in operator on a graph type returns false when it should return true."
Rewriting our objective here.
The current implementation of multi-source BFS only provides correct results if the sources are in separate connected components. The request here is to add multi-source support where the source vertices may be in the same connected component.
Original Request
Version
22.12.00+0.g8474cfcf.dirty
Which installation method(s) does this occur on?
Docker
Describe the bug.
From the BFS documentation
start : Integer or list, optional (default=None)
The requirement that "Only one vertex per connected component of the graph is allowed." was stated to be deprecated by Brad Rees here : https://stackoverflow.com/questions/70632337/how-to-accelerate-finding-all-pairs-shortest-path-with-gpu-using-rapids-cugraph
For cugraphs created from cudf, passing an array as the start argument works fine. However, graph types verify the validity of the start vertices differently, and the way a list is evaluated by the in operator on a graph type returns false when it should return true.
Reason: lines 50-54 of cugraph/cugraph/traversal/bfs.py
"""
ensure start vertex is valid
"""
Fix: replace "if start not in G" with "if any(v not in G for v in start)".
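Why the `in` check fails for a list, and what the proposed fix does, can be reproduced with a toy class. `ToyGraph` below is a hypothetical stand-in whose `__contains__` tests membership in a Python list of vertices; it is not cuGraph's `Graph`, whose internals may differ. The helper `invalid_start_vertices` is also hypothetical: it applies the per-vertex check from the fix and additionally normalizes a scalar `start` into a one-element list, since the documentation allows either an integer or a list (this sketch handles only Python lists and tuples, not arrays or cuDF series).

```python
class ToyGraph:
    """Hypothetical stand-in for a graph whose `in` checks vertex membership."""

    def __init__(self, vertices):
        self._vertices = list(vertices)

    def __contains__(self, item):
        # A list argument is compared as one element, so [0, 1] is never "in" G.
        return item in self._vertices


def invalid_start_vertices(G, start):
    """Return the start vertices missing from G, accepting a scalar or a list."""
    seeds = list(start) if isinstance(start, (list, tuple)) else [start]
    return [v for v in seeds if v not in G]


G = ToyGraph([0, 1, 2])
```

With this toy, `[0, 1] in G` is `False` even though both vertices exist, while `invalid_start_vertices(G, [0, 1])` is empty and `invalid_start_vertices(G, [0, 7])` reports `[7]`.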
Other/Misc.
example.zip