Open Lem-P opened 7 months ago
Where did you get the error? That is, do you get it in rsc or in cugraph? Could you also please upload the full stack trace? If you can reproduce the error with cugraph alone, it would be amazing if you created an issue there too.
I get this in rapids_singlecell
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 1
----> 1 rsc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25)
      2 rsc.tl.leiden(adata, key_added="leiden_res0_5", resolution=0.5)
      3 rsc.tl.leiden(adata, key_added="leiden_res0_1", resolution=0.1)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/rapids_singlecell/tools/_clustering.py:125, in leiden(adata, resolution, random_state, restrict_to, key_added, adjacency, n_iterations, use_weights, neighbors_key, obsp, copy)
    117     restrict_key, restrict_categories = restrict_to
    118     adjacency, restrict_indices = restrict_adjacency(
    119         adata=adata,
    120         restrict_key=restrict_key,
    121         restrict_categories=restrict_categories,
    122         adjacency=adjacency,
    123     )
--> 125 g = _create_graph(adjacency, use_weights)
    126 # Cluster
    127 leiden_parts, _ = culeiden(
    128     g,
    129     resolution=resolution,
    130     random_state=random_state,
    131     max_iter=n_iterations,
    132 )

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/rapids_singlecell/tools/_clustering.py:31, in _create_graph(adjacency, use_weights)
     29     warnings.simplefilter("ignore")
     30     if use_weights:
---> 31         g.from_cudf_edgelist(
     32             df, source="source", destination="destination", weight="weights"
     33         )
     34     else:
     35         g.from_cudf_edgelist(df, source="source", destination="destination")

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/graph_classes.py:193, in Graph.from_cudf_edgelist(self, input_df, source, destination, edge_attr, weight, edge_id, edge_type, renumber, store_transposed, legacy_renum_only)
    191 elif self._Impl.edgelist is not None or self._Impl.adjlist is not None:
    192     raise RuntimeError("Graph already has values")
--> 193 self._Impl._simpleGraphImpl__from_edgelist(
    194     input_df,
    195     source=source,
    196     destination=destination,
    197     edge_attr=edge_attr,
    198     weight=weight,
    199     edge_id=edge_id,
    200     edge_type=edge_type,
    201     renumber=renumber,
    202     store_transposed=store_transposed,
    203     legacy_renum_only=legacy_renum_only,
    204 )

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/graph_implementation/simpleGraph.py:262, in simpleGraphImpl.__from_edgelist(self, input_df, source, destination, edge_attr, weight, edge_id, edge_type, renumber, legacy_renum_only, store_transposed)
    257 # The dataframe will be symmetrized iff the graph is undirected
    258 # otherwise the inital dataframe will be returned. Duplicated edges
    259 # will be dropped unless the graph is a MultiGraph(Not Implemented yet)
    260 # TODO: Update Symmetrize to work on Graph and/or DataFrame
    261 if edge_attr is not None:
--> 262     source_col, dest_col, value_col = symmetrize(
    263         elist,
    264         source,
    265         destination,
    266         edge_attr,
    267         multi=self.properties.multi_edge,  # Deprecated parameter
    268         symmetrize=not self.properties.directed,
    269     )
    271     if isinstance(value_col, cudf.DataFrame):
    272         value_dict = {}

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/symmetrize.py:281, in symmetrize(input_df, source_col_name, dest_col_name, value_col_name, multi, symmetrize, do_expensive_check)
    272     output_df = symmetrize_ddf(
    273         input_df,
    274         source_col_name,
   (...)
    278         symmetrize,
    279     )
    280 else:
--> 281     output_df = symmetrize_df(
    282         input_df,
    283         source_col_name,
    284         dest_col_name,
    285         value_col_name,
    286         multi,
    287         symmetrize,
    288     )
    289 if value_col_name is not None:
    290     value_col = output_df[value_col_name]

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/symmetrize.py:100, in symmetrize_df(df, src_name, dst_name, weight_name, multi, symmetrize)
     93     warnings.warn(
     94         "Multi is deprecated and the removal of multi edges will no longer be "
     95         "supported from 'symmetrize'. Multi edges will be removed upon creation "
     96         "of graph instance.",
     97         FutureWarning,
     98     )
     99 vertex_col_name = src_name + dst_name
--> 100 result = result.groupby(by=[*vertex_col_name], as_index=False).min()
    101 return result

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py:11, in _partialmethod.

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/groupby/groupby.py:701, in GroupBy._reduce(self, op, numeric_only, min_count, *args, **kwargs)
    697 if min_count != 0:
    698     raise NotImplementedError(
    699         "min_count parameter is not implemented yet"
    700     )
--> 701 return self.agg(op)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/nvtx/nvtx.py:116, in annotate.call.

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/groupby/groupby.py:567, in GroupBy.agg(self, func)
    558 orig_dtypes = tuple(c.dtype for c in columns)
    560 # Note: When there are no key columns, the below produces
    561 # a Float64Index, while Pandas returns an Int64Index
    562 # (GH: 6945)
    563 (
    564     result_columns,
    565     grouped_key_cols,
    566     included_aggregations,
--> 567 ) = self._groupby.aggregate(columns, normalized_aggs)
    569 result_index = self.grouping.keys._from_columns_like_self(
    570     grouped_key_cols,
    571 )
    573 multilevel = _is_multi_agg(func)

File groupby.pyx:350, in cudf._lib.groupby.GroupBy.aggregate()

File groupby.pyx:252, in cudf._lib.groupby.GroupBy.aggregate_internal()

RuntimeError: CUDA error encountered at: /opt/conda/conda-bld/work/cpp/src/hash/concurrent_unordered_map.cuh:546: 101 cudaErrorInvalidDevice invalid device ordinal
OK, I can't reproduce the error. Can you open an issue on cugraph? The error is raised inside cugraph's graph construction, so they should know about it and might be able to fix it.
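For a cugraph-only report, a minimal sketch along these lines may help. It mirrors the `from_cudf_edgelist` call visible in the traceback, but the edge values and the names `EDGES` and `build_graph` are made up for illustration, and it has not been confirmed to trigger the error.

```python
# Tiny weighted edge list standing in for the neighbors graph.
# The values are illustrative only; (0, 2) and (2, 0) are both present so that
# symmetrization has a duplicate undirected edge to collapse.
EDGES = {
    "source": [0, 1, 2, 0],
    "destination": [1, 2, 0, 2],
    "weights": [1.0, 0.5, 0.25, 0.75],
}


def build_graph(edges=EDGES):
    """Build an undirected cugraph.Graph from a weighted edge list.

    Hypothetical helper: it calls from_cudf_edgelist with the same keyword
    arguments that appear in the traceback.
    """
    # Imported lazily because both packages need a CUDA-capable GPU.
    import cudf
    import cugraph

    df = cudf.DataFrame(edges)
    g = cugraph.Graph()  # undirected by default, so the edge list is symmetrized
    g.from_cudf_edgelist(
        df, source="source", destination="destination", weight="weights"
    )
    return g
```

Calling `build_graph()` in the same environment, with the same RMM settings as the failing session, would show whether the error occurs without rapids_singlecell involved.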
Describe the bug
While running the Leiden algorithm, rsc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25), I got a "CUDA error encountered 101 cudaErrorInvalidDevice invalid device ordinal". Just setting managed_memory to False in rmm.reinitialize resolved the issue.

Expected behavior
Just information for other people running into the same error.
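The workaround described above can be sketched roughly as follows. `managed_memory` and `pool_allocator` are parameters of `rmm.reinitialize`; the names `RMM_SETTINGS` and `apply_rmm_settings` are hypothetical, and leaving the pool allocator off is an assumption rather than something reported here.

```python
# Settings for rmm.reinitialize. Only managed_memory=False comes from the
# report; pool_allocator=False is an assumed (default) value.
RMM_SETTINGS = {
    "managed_memory": False,
    "pool_allocator": False,
}


def apply_rmm_settings(settings=RMM_SETTINGS):
    """Reinitialize RMM with managed memory turned off (hypothetical helper)."""
    # Imported lazily because rmm needs a CUDA-capable GPU.
    import rmm

    rmm.reinitialize(**settings)
```

Calling `apply_rmm_settings()` once near the start of the session, before any GPU arrays are created, and then rerunning `rsc.tl.leiden` matches the fix reported above. Reinitializing while earlier RMM allocations are still in use is best avoided.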
Environment details (please complete the following information):