[QST]: LEIDEN CLUSTERING NOT USING GPU

siddharthamantrala commented 3 weeks ago

What is your question?

Hello @benfred @akasper @mattf,

I am trying to run Leiden clustering on ~40 M cells. During the run, the GPU appears idle in terms of power usage, and the Leiden clustering takes forever to complete. The slow step is in the code below. Could you please tell me how I can resolve this issue?

rsc.tl.leiden(adatafilt) ->

def leiden (args): ->

g = _create_graph(adjacency, use_weights) ->

`def _create_graph(adjacency, use_weights=True):
    from cugraph import Graph

    sources, targets = adjacency.nonzero()
    weights = adjacency[sources, targets]
    if isinstance(weights, np.matrix):
        weights = weights.A1
    df = cudf.DataFrame({"source": sources, "destination": targets, "weights": weights})
    g = Graph()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        if use_weights:
            g.from_cudf_edgelist(
                df, source="source", destination="destination", weight="weights"
            )
        else:
            g.from_cudf_edgelist(df, source="source", destination="destination")
    return g`

Takes forever to execute the below line g.from_cudf_edgelist(df, source="source", destination="destination", weight="weights")

Though I posted this in the rapids_singlecell library (rsc), cuGraph appears to be the part that is taking too long, so your input would be great.

Best, Sid

nv-rliu commented 3 weeks ago

Hi @siddharthamantrala

Thanks for submitting your question. Just taking a look -- what is the dataset you're using to create your df object? Also, what method of installation did you use for cugraph?

siddharthamantrala commented 3 weeks ago

Hi @nv-rliu, it's a private dataset in h5ad format. The rapids_singlecell library has a function, rsc.get.anndata_to_GPU(h5ad file), to move the data to the GPU. The object passed to rsc.tl.leiden holds the adjacency information computed using nearest neighbors. Even that step takes around 6 hours to compute, but that is a different issue (for now I am accelerating it using CAGRA, which takes ~15 minutes). I installed cugraph=24.08.00 following https://docs.rapids.ai/install.

For Leiden clustering, this is the function definition of `rsc.tl.leiden()`:

def leiden(
    adata: AnnData,
    resolution: float = 1.0,
    *,
    random_state: int | None = 0,
    restrict_to: tuple[str, Sequence[str]] | None = None,
    key_added: str = "leiden",
    adjacency: sparse.spmatrix | None = None,
    n_iterations: int = 100,
    use_weights: bool = True,
    neighbors_key: str | None = None,
    obsp: str | None = None,
    copy: bool = False,
) -> AnnData | None:
"""
Performs Leiden clustering using cuGraph, which implements the method described in:

Traag, V.A., Waltman, L., & van Eck, N.J. (2019). From Louvain to
Leiden: guaranteeing well-connected communities. Sci. Rep., 9(1), 5233.
DOI: 10.1038/s41598-019-41695-z

Parameters
----------
    adata :
        annData object

    resolution
        A parameter value controlling the coarseness of the clustering.
        (called gamma in the modularity formula). Higher values lead to
        more clusters.

    random_state
        Change the initialization of the optimization. Defaults to 0.

    restrict_to
        Restrict the clustering to the categories within the key for
        sample annotation, tuple needs to contain
        `(obs_key, list_of_categories)`.

    key_added
        `adata.obs` key under which to add the cluster labels.

    adjacency
        Sparse adjacency matrix of the graph, defaults to neighbors
        connectivities.

    n_iterations
        This controls the maximum number of levels/iterations of the
        Leiden algorithm. When specified, the algorithm will terminate
        after no more than the specified number of iterations. No error
        occurs when the algorithm terminates early in this manner.

    use_weights
        If `True`, edge weights from the graph are used in the
        computation (placing more emphasis on stronger edges).

    neighbors_key
        If not specified, `leiden` looks at `.obsp['connectivities']`
        for neighbors connectivities. If specified, `leiden` looks at
        `.obsp[.uns[neighbors_key]['connectivities_key']]` for neighbors
        connectivities.

    obsp
        Use .obsp[obsp] as adjacency. You can't specify both
        `obsp` and `neighbors_key` at the same time.

    copy
        Whether to copy `adata` or modify it in place.
"""
# Adjacency graph
from cugraph import leiden as culeiden

adata = adata.copy() if copy else adata

if adjacency is None:
    adjacency = _choose_graph(adata, obsp, neighbors_key)
if restrict_to is not None:
    restrict_key, restrict_categories = restrict_to
    adjacency, restrict_indices = restrict_adjacency(
        adata=adata,
        restrict_key=restrict_key,
        restrict_categories=restrict_categories,
        adjacency=adjacency,
    )

g = _create_graph(adjacency, use_weights)
# Cluster
leiden_parts, _ = culeiden(
    g,
    resolution=resolution,
    random_state=random_state,
    max_iter=n_iterations,
)

# Format output
groups = (
    leiden_parts.to_pandas().sort_values("vertex")[["partition"]].to_numpy().ravel()
)
if restrict_to is not None:
    if key_added == "leiden":
        key_added += "_R"
    groups = rename_groups(
        adata,
        key_added=key_added,
        restrict_key=restrict_key,
        restrict_categories=restrict_categories,
        restrict_indices=restrict_indices,
        groups=groups,
    )
adata.obs[key_added] = pd.Categorical(
    values=groups.astype("U"),
    categories=natsorted(map(str, np.unique(groups))),
)
# store information on the clustering parameters
adata.uns["leiden"] = {}
adata.uns["leiden"]["params"] = {
    "resolution": resolution,
    "random_state": random_state,
    "n_iterations": n_iterations,
}
return adata if copy else None

nv-rliu commented 3 weeks ago

So if I'm understanding this correctly, you have an edgelist (cudf.DataFrame) created from your data, but when you try to create a cugraph.Graph object by calling g.from_cudf_edgelist(df, source="source", destination="destination", weight="weights"), it takes a very long time. This happens before you even get a chance to call cugraph.leiden.
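To confirm which stage is actually slow, each step of the `_create_graph()` logic quoted earlier could be timed separately. The following is only a sketch: `timed` and `profile_create_graph` are hypothetical helper names, the cudf and cugraph calls are the ones already shown in this thread, and the wall-clock numbers are approximate because GPU work can complete asynchronously.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_create_graph(adjacency, use_weights=True):
    """Time each stage of the _create_graph() logic quoted in this thread.

    Hypothetical helper: needs a GPU with cudf and cugraph installed.
    Imports are local so that `timed` can be reused without them.
    """
    import cudf
    import numpy as np
    from cugraph import Graph

    timings = {}
    with timed("nonzero", timings):
        sources, targets = adjacency.nonzero()
    with timed("weights", timings):
        weights = adjacency[sources, targets]
        if isinstance(weights, np.matrix):
            weights = weights.A1
    with timed("edgelist", timings):
        df = cudf.DataFrame(
            {"source": sources, "destination": targets, "weights": weights}
        )
    with timed("graph", timings):
        g = Graph()
        if use_weights:
            g.from_cudf_edgelist(
                df, source="source", destination="destination", weight="weights"
            )
        else:
            g.from_cudf_edgelist(df, source="source", destination="destination")
    return g, timings
```

Calling `g, timings = profile_create_graph(adjacency)` and printing `timings` would show whether the time goes into extracting the edge list or into `from_cudf_edgelist()` itself.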

nv-rliu commented 3 weeks ago

What is the size of the edge-list that you have? Can you share the output from running this in _create_graph():

type(df)
df.info(verbose=True)
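While waiting for the real numbers, a rough size estimate can be made by hand. This sketch assumes 15 neighbors per cell (Scanpy's default `n_neighbors`; the value actually used is not stated in this thread) and 4-byte indices and weights, which may not match the real dtypes. `estimate_edgelist` is a hypothetical helper, not part of any library.

```python
def estimate_edgelist(n_nodes, neighbors_per_node, index_bytes=4, weight_bytes=4):
    """Return (n_edges, total_bytes) for a source/destination/weights edge list.

    Assumes every node contributes exactly `neighbors_per_node` edges; a
    symmetrized connectivities matrix can hold up to about twice as many.
    """
    n_edges = n_nodes * neighbors_per_node
    bytes_per_edge = 2 * index_bytes + weight_bytes  # source + destination + weight
    return n_edges, n_edges * bytes_per_edge


if __name__ == "__main__":
    # ~40 M cells, as stated in the question; 15 neighbors is an assumption.
    edges, size = estimate_edgelist(40_000_000, 15)
    print(f"{edges:,} edges, {size / 2**30:.1f} GiB")
```

Under these assumptions the edge list has 600 million rows and needs about 6.7 GiB before cuGraph builds its own internal representation, so the output of `df.info()` is the number to trust.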

siddharthamantrala commented 3 weeks ago

nv-rliu commented 3 weeks ago

Thank you for sharing. Please give me just a moment

rlratzel commented 2 weeks ago

Hi @siddharthamantrala ,

> Takes forever to execute the below line g.from_cudf_edgelist(df, source="source", destination="destination", weight="weights")

I tried running a simple example to see if I could reproduce the problem, but I saw a reasonable runtime for the above line. I've attached an example which generates an edgelist and uses it to populate a graph. I'm using a smaller GPU than you are, so my example graph is only 7.8M nodes and 164.1M edges, but that took only 10 seconds to populate using G.from_cudf_edgelist().

Can you try this example and let us know if you see unexpectedly slow performance?

issue4627_py.txt

siddharthamantrala commented 2 weeks ago

Hi @rlratzel ,

Thanks a lot for sharing the code and looking into the issue. I tried the code you shared. On the GPU specs mentioned above, your example graph took 4 seconds to populate using G.from_cudf_edgelist(). When I increase the scale factor to 25, I run into memory issues. If I use the RAPIDS Memory Manager (RMM), I am able to increase the scale factor to 25 and populate the graph in ~1 min. With RMM, your example graph takes ~30 seconds. When I increase the scale to 26, it takes forever to populate the graph. I am attaching the script with the RMM-related changes (commented out). Please take a look and suggest how I can move forward.
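The attached script is not reproduced in this thread, so the exact RMM changes are unknown. One common way to let allocations exceed GPU memory is to switch RMM to CUDA managed memory with `rmm.reinitialize(managed_memory=True)`; the wrapper below, `enable_managed_memory`, is a hypothetical name used only for this sketch.

```python
def enable_managed_memory():
    """Switch RMM to CUDA managed (unified) memory; return True on success.

    Returns False when RMM is not installed, so this sketch can be imported
    on a machine without RAPIDS.
    """
    try:
        import rmm
    except ImportError:
        return False
    # Managed memory lets allocations spill to host RAM when the GPU is full,
    # at the cost of slower access while pages migrate between host and device.
    rmm.reinitialize(managed_memory=True)
    return True


if __name__ == "__main__":
    print("managed memory enabled:", enable_managed_memory())
```

It is safest to call this once at the start of the script, before any cudf or cugraph objects are created, since reinitializing RMM affects existing allocations.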

issue4627_py_SM_EDITS.txt

rlratzel commented 1 week ago

Thanks @siddharthamantrala for the updated script. I ran it on my workstation using scale 26 with RMM managed memory enabled and I'm seeing the same behavior you are. I'm going to debug further and I'll update this issue when I find out more.

rlratzel commented 1 week ago

Hi @siddharthamantrala , I think the problem here can be traced back to a cudf issue that has recently been resolved.

I'm able to reproduce your problem with an older version of cudf, but the problem is resolved and I'm able to run your script to completion in a few minutes on scale 26 when I upgrade cudf to the latest version from the rapidsai-nightly channel.

Can you check your cudf version (run "conda list cudf" if you're using conda)? For me, here's what I saw:

The broken version of cudf I was using: cudf 24.10.00a196 cuda11_py310_240816_ge690d9d25b_196 rapidsai-nightly

The updated version of cudf that works for me: cudf 24.10.00a292 cuda11_py310_240905_gad1369d2d6_292 rapidsai-nightly
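Because both builds above are labelled 24.10.00 and differ only in the nightly build number, a plain comparison of the first three fields is not enough. The sketch below parses such version strings so they can be ordered; `parse_release` is a hypothetical helper that handles only the `YY.MM.PP` and `YY.MM.PPaNNN` forms seen in this thread.

```python
import math
import re


def parse_release(version):
    """Parse '24.10.00a292' into a sortable tuple (24, 10, 0, 292).

    A final release such as '24.10.00' gets math.inf as its build number,
    so it sorts after every alpha (nightly) build of the same version.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:a(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    year, month, patch, alpha = match.groups()
    build = int(alpha) if alpha is not None else math.inf
    return (int(year), int(month), int(patch), build)


if __name__ == "__main__":
    working = parse_release("24.10.00a292")  # build reported as working above
    for candidate in ("24.10.00a196", "24.10.00a292", "24.08.00"):
        at_least = parse_release(candidate) >= working
        print(f"{candidate}: at or after the working build = {at_least}")
```

With cudf installed, `cudf.__version__` could be passed to `parse_release` in place of the hard-coded strings. The comparison only shows whether a build is at least as new as the one reported as working; it says nothing about which release first contained the fix.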

nv-rliu commented 3 days ago

Hi @siddharthamantrala

Just following up here. Were you able to resolve the issue and create the Graph? I'll go ahead and mark this as resolved for now but LMK!

siddharthamantrala commented 3 days ago

Hi @rlratzel ,

Thanks a lot for letting me know about the updated cudf version and for following up. Could you tell me whether the fix is included in the latest cudf nightly available from the https://docs.rapids.ai/install page? The version I currently have installed, and on which I hit this issue, is the stable 24.08 release, installed using pip. Do I also have to update the other RAPIDS packages?
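RAPIDS libraries are generally released together and pinned against matching versions, so it can help to see which release each installed package belongs to before upgrading only one of them. The sketch below uses only the standard library; the package list in `RAPIDS_NAMES` is an assumption and may need adjusting for a given environment, and the helper names are hypothetical.

```python
from collections import defaultdict
from importlib import metadata

# Assumed set of RAPIDS package base names; extend as needed.
RAPIDS_NAMES = ("cudf", "cugraph", "pylibcugraph", "rmm", "pylibraft", "cuml")


def release_of(version):
    """Return the 'YY.MM' release prefix of a version such as '24.08.00'."""
    major, minor = version.split(".")[:2]
    return f"{int(major):02d}.{int(minor):02d}"


def group_by_release(packages):
    """Group {name: version} into {release: sorted list of names}."""
    groups = defaultdict(list)
    for name, version in packages.items():
        groups[release_of(version)].append(name)
    return {release: sorted(names) for release, names in groups.items()}


def installed_rapids_packages():
    """Return {distribution name: version} for installed RAPIDS packages."""
    found = {}
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        # pip wheels carry a CUDA suffix such as "-cu12"; strip it for matching.
        base = name.split("-cu")[0]
        if base in RAPIDS_NAMES:
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for release, names in sorted(group_by_release(installed_rapids_packages()).items()):
        print(release, ", ".join(names))
```

If the output shows more than one release, the environment mixes RAPIDS versions, which is usually a sign that the remaining packages should be upgraded together.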
