scverse / rapids_singlecell

Rapids_singlecell: A GPU-accelerated tool for scRNA analysis. Offers seamless scverse compatibility for efficient single-cell data processing and analysis.
https://rapids-singlecell.readthedocs.io/
MIT License

[BUG] rsc.pp.scrublet: ValueError: Number of components should not be greater thanthe number of columns in the data #203

Closed fuzh25 closed 1 week ago

fuzh25 commented 1 month ago

Describe the bug
Running the code "rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')" results in an error: ValueError: Number of components should not be greater thanthe number of columns in the data

Steps/Code to reproduce bug
rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')

Expected behavior
This will occur when the sample size is large.

Environment details
Python 3.10
scanpy version: 1.10.1
rapids_singlecell version: 0.10.4


fuzh25 commented 1 month ago

The full output is as follows:

rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')
Running Scrublet
Embedding transcriptomes using PCA...
Automatically set threshold at doublet score = 0.02
Detected doublet rate = 82.2%
Estimated detectable doublet fraction = 96.5%
Overall doublet rate:
    Expected = 5.0%
    Estimated = 85.1%
Embedding transcriptomes using PCA...
Automatically set threshold at doublet score = 0.03
Detected doublet rate = 77.6%
Estimated detectable doublet fraction = 92.7%
Overall doublet rate:
    Expected = 5.0%
    Estimated = 83.7%
Embedding transcriptomes using PCA...
Traceback (most recent call last):
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    rsc.pp.scrublet(adata,n_prin_comps=30, random_state=1000,batch_key='batch')
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/legacy_api_wrap/__init__.py", line 80, in fn_compatible
    return fn(*args_all, **kw)
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 259, in scrublet
    scrubbed = [
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 260, in
    _run_scrublet(
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 230, in _run_scrublet
    ad_obs = _scrublet_call_doublets(
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/__init__.py", line 433, in _scrublet_call_doublets
    pipeline.pca(scrub, n_prin_comps=n_prin_comps, random_state=scrub._random_state)
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/rapids_singlecell/preprocessing/_scrublet/pipeline.py", line 83, in pca
    pca = PCA(n_components=n_prin_comps, random_state=random_state).fit(X_obs)
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/home/fuzh25/anaconda3/envs/omicverse_gpu/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "pca.pyx", line 443, in cuml.decomposition.pca.PCA.fit
ValueError: Number of components should not be greater thanthe number of columns in the data

Intron7 commented 1 month ago

I see the error, but I don't know if that's something that has to be fixed. The subset of data you are trying to analyse has very few genes/features: fewer than 30, which is below the n_prin_comps=30 you requested.
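
To make the constraint behind this error concrete, here is a minimal sketch of capping the requested number of principal components to what a given matrix can support. The helper `cap_n_comps` and the per-batch shapes are hypothetical and are not part of rapids_singlecell or cuML. The traceback only shows that cuML rejects more components than columns; also capping by the number of rows is an extra precaution assumed here.

```python
def cap_n_comps(requested: int, n_obs: int, n_vars: int) -> int:
    """Return a PCA component count that fits a matrix of shape (n_obs, n_vars).

    Hypothetical helper for illustration only; not a rapids_singlecell function.
    """
    if n_obs < 1 or n_vars < 1:
        raise ValueError("matrix must have at least one row and one column")
    # The error in this thread concerns columns; rows are capped as a precaution.
    return max(1, min(requested, n_obs, n_vars))


# Made-up per-batch shapes (cells, features) to illustrate the failure mode:
# the third batch has fewer than 30 features, like the subset discussed above.
batch_shapes = {
    "batch_a": (5000, 1800),
    "batch_b": (4200, 1650),
    "batch_c": (300, 22),
}

requested = 30
for name, (n_obs, n_vars) in batch_shapes.items():
    usable = cap_n_comps(requested, n_obs, n_vars)
    flag = "" if usable == requested else " (fewer features than requested components)"
    print(f"{name}: n_prin_comps={usable}{flag}")
```

In practice this suggests checking the size of each batch before calling rsc.pp.scrublet with batch_key, and either passing a smaller n_prin_comps or merging or dropping batches that are too small to support the requested value.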