rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0

[FEA]: cugraph scipy.sparse arrays support #4520

Open Friedemannn opened 4 months ago

Friedemannn commented 4 months ago

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

How would you describe the priority of this feature request

Low (would be nice)

Please provide a clear description of problem this feature solves

It would enable users to run SSSP (and other algorithms) on SciPy sparse arrays, which are SciPy's preferred way to store sparse data; see the note at https://docs.scipy.org/doc/scipy/reference/sparse.html

Describe your ideal solution

Accept scipy.sparse.coo_array (and the other sparse array classes) as input to the cugraph functions that currently support scipy.sparse.coo_matrix.

Describe any alternatives you have considered

No response

Additional context

No response

BradReesWork commented 3 months ago

@Friedemannn we are working your suggestion into our roadmap

jnke2016 commented 3 months ago

@Friedemannn thank you for filing a feature request regarding SSSP. Our current pylibcugraph API only supports device arrays (cupy, cudf) as input when creating a pylibcugraph graph and calling any pylibcugraph algorithm. While we work on adding this to our roadmap, you can convert a SciPy array to a CuPy array yourself, as shown in the example below.

import cupy as cp
import numpy as np
from scipy.sparse import csr_matrix
from pylibcugraph import sssp as pylibcugraph_sssp
import pylibcugraph

graph = [[0, 1, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0]]
scipy_csr = csr_matrix(graph)

# Move the CSR offsets and column indices to the GPU as CuPy arrays.
cp_offsets = cp.asarray(scipy_csr.indptr)
cp_indices = cp.asarray(scipy_csr.indices, dtype=np.int32)
# Give every edge a unit weight.
weight_array = cp.asarray([1.0]*len(cp_indices), dtype=np.float32)

resource_handle = pylibcugraph.ResourceHandle()
graph_props = pylibcugraph.GraphProperties(is_symmetric=False, is_multigraph=False)

plc_graph = pylibcugraph.SGGraph(
    resource_handle,
    graph_props,
    cp_offsets,
    cp_indices,
    weight_array,
    store_transposed=False,
    renumber=True,
    do_expensive_check=False,
    input_array_format="CSR")

cp_vertices, cp_distances, cp_predecessors = pylibcugraph_sssp(
    resource_handle=resource_handle,
    graph=plc_graph,
    source=0,
    cutoff=999,
    compute_predecessors=True,
    do_expensive_check=True)

These steps also apply to scipy.sparse.coo_array: pylibcugraph graph creation supports both the COO and CSR array formats.
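
For the COO case, the host-side part of the conversion might look like the sketch below. It only uses SciPy and NumPy, so it runs without a GPU. The helper name coo_edge_arrays is hypothetical (it is not part of cugraph or pylibcugraph), and the int32/float32 dtypes are an assumption carried over from the CSR example above.

```python
import numpy as np
from scipy import sparse


def coo_edge_arrays(graph):
    """Hypothetical helper: return (src, dst, weights) as host NumPy arrays.

    Accepts any SciPy sparse array or matrix (coo_array, csr_matrix, ...).
    """
    if not sparse.issparse(graph):
        raise TypeError("expected a scipy.sparse array or matrix")
    # tocoo() is available on both the *_array and *_matrix classes.
    coo = graph.tocoo()
    # dtypes assumed to match the CSR example (int32 vertices, float32 weights).
    src = np.asarray(coo.row, dtype=np.int32)
    dst = np.asarray(coo.col, dtype=np.int32)
    weights = np.asarray(coo.data, dtype=np.float32)
    return src, dst, weights


# Same 5-vertex graph as in the CSR example, stored as a sparse array.
adjacency = sparse.coo_array(np.array([[0, 1, 1, 0, 0],
                                       [0, 0, 1, 0, 0],
                                       [0, 0, 0, 0, 0],
                                       [0, 0, 0, 0, 1],
                                       [0, 0, 0, 0, 0]]))
src, dst, weights = coo_edge_arrays(adjacency)
```

Each of the three arrays can then be moved to the GPU with cp.asarray and passed to pylibcugraph.SGGraph in place of the offsets, indices, and weights, with input_array_format="COO" instead of "CSR".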