rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0
1.69k stars 300 forks source link

[QST]: How does SSSP sort distance and predecessor output (old: SSSP returns incorrect distances/costs) #4513

Open Friedemannn opened 3 months ago

Friedemannn commented 3 months ago

What is your question?

Edit: This is because SSSP returns only predecessors and distances for connected nodes, so the resulting arrays don't have the same size as the input adjaceny matrix dimensions.

Now I'm wondering how I can connect the SSSP results to my old larger index, which relates to the input dimensions.

Original: Hi,

I'm trying SSSP on a undirected weighted graph represented by a scipy COO adjacency matrix of a cost raster (roughly 4000*5000 grid cells) where each fully connected node has 8 neighbors/edges. Connecting the grid cells to their first and second order neighbors.

Because it's undirected only one edge direction is declared in the adjacency matrix, the other is left out.

If I run following code I get wrong distance and predecessor values:

import scipy as sp
import time
import cugraph
#import adjacency matrix
adj = sp.sparse.load_npz('/content/drive/MyDrive/graphs/mb_50m_cost0-union_adj-COO-arr.npz')
#run sssp
a= time.time()
dist, pred = cugraph.sssp(adj, source=10000000, unweighted =True, directed = False)
print(time.time()-a)
>>> /usr/local/lib/python3.10/dist-packages/cugraph/structure/symmetrize.py:92: FutureWarning: Multi is deprecated and the removal of multi edges will no longer be supported from 'symmetrize'. Multi edges will be removed upon creation of graph instance. warnings.warn(
>>> 9.302005767822266

dist[10005000]
>>> 3.0163074

pred[10000001]
>>> 16485534

From other packages such as networkit, networkx, scipy and graph-tool I know that the distance from node 10005000 should be 1.0297332260524854 and the predecessor of node 10000001 should be 10000000.

Did I do something wrong? Should I input a matrix where every edge is explicitly defined?

You can find the adjacency matrix I used here, together with the google colab notebook I used.

Code of Conduct

Friedemannn commented 3 months ago

My problem was, that I didn't know, that SSSP returns only the distance and predecessors for nodes with edges. Therefore the indices shifted and the resulting pred and dist arrays are smaller compared to other packages I used.

Is there a flag to artificially inflate the resulting arrays? Or if not how are the arrays sorted? How can i connect them to my orginal index which was based on the dimension of the adjacency matrix and therefore included non-connected nodes?

ChuckHastings commented 3 months ago

This seems like a bug. @rlratzel can weigh in here.

There has been an issue in some of the python interfaces where we don't handle isolated vertices properly when we construct the C++ graph data structure. It sounds like this is what is the problem here.

Friedemannn commented 2 months ago

Hi @rlratzel could you provide some info on this? I tested cugraph sssp as part of an assignment and it'd be great if I could explain why it doesn't work.