theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
417 stars 102 forks source link

LinAlgError: The data appears to lie in a lower-dimensional subspace of the space in which it is expressed #1115

Closed zhouzhendiao closed 11 months ago

zhouzhendiao commented 1 year ago

Error while running scv.tl.recover_dynamics, this seems like a data-specific problem since no error when I change to another data. However, is there a way to avoid this?

adata:


AnnData object with n_obs × n_vars = 15498 × 2000
    obs: 'sample_id', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'leiden_res0.1', 'leiden_res0.25', 'leiden_res0.5', 'Immune_All_High_predicted_labels', 'Immune_All_High_majority_voting', 'Immune_All_Low_predicted_labels', 'Immune_All_Low_majority_voting', 'hpca', 'copykat.pred', 'scDblFinder.score', 'scDblFinder.class', 'S_score', 'G2M_score', 'phase', 'low_qc_cluster', 'leiden_res1', 'sampling_point', 'low_qc_cells', 'celltype_main_v1', 'celltype_main_v2', 'receptor_type', 'receptor_subtype', 'chain_pairing', 'leiden_res1.5', 'leiden_res2', 'cell_type_fine_20230502', 'celltype_fine_v1', 'celltype_fine_v2', 'predicted_labels', 'over_clustering', 'majority_voting', 'conf_score', 'leiden_res3', 'patient_id', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'clusters_gradients', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'gene_count_corr', 'velocity_gamma', 'velocity_qreg_ratio', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'Immune_All_High_predicted_labels_colors', 'Immune_All_Low_predicted_labels_colors', 'airr:chain_pairing_colors', 'airr:clonal_expansion_colors', 'airr:receptor_subtype_colors', 'bcr:chain_pairing_colors', 'bcr:receptor_subtype_colors', 'cell_type_fine_20230502_colors', 'celltype_fine_v2_colors', 'celltype_main_v1_colors', 'celltype_main_v2_colors', 'copykat.pred_colors', 'hpca_colors', 'hvg', 'leiden', 'leiden_res0.1_colors', 'leiden_res0.25_colors', 'leiden_res0.5_colors', 'leiden_res1_colors', 'leiden_res3_colors', 'log1p', 'low_qc_cells_colors', 'low_qc_cluster_colors', 'neighbors', 'pca', 'phase_colors', 'rank_genes_groups', 'sample_id_colors', 'sampling_point_colors', 'scDblFinder.class_colors', 'tcr:chain_pairing_colors', 'tcr:receptor_subtype_colors', 'umap', 'velocity_params', 'velocity_graph', 'velocity_graph_neg', 'rank_velocity_genes', 'clusters_gradients_colors', 'recover_dynamics'
    obsm: 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'counts', 'matrix', 'soupx', 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'
    obsp: 'connectivities', 'distances'
scv.tl.recover_dynamics(adata, n_jobs=128)
Error output ```pytb recovering dynamics (using 128/128 cores) 100% 290/291 [01:32<00:00, 1.96gene/s] --------------------------------------------------------------------------- LinAlgError Traceback (most recent call last) Cell In[27], line 1 ----> 1 scv.tl.recover_dynamics(adata, n_jobs=128) File /nas/01.private_person/02.zhouzhendiao/01.software/scvelo-master/scvelo/tools/_em_model_core.py:553, in recover_dynamics(data, var_names, n_top_genes, max_iter, assignment_mode, t_max, fit_time, fit_scaling, fit_steady_states, fit_connected_states, fit_basal_transcription, use_raw, load_pars, return_model, plot_results, steady_state_prior, add_key, copy, n_jobs, backend, **kwargs) 549 Tau_ = adata.layers["fit_tau_"] 551 conn = get_connectivities(adata) if fit_connected_states else None --> 553 res = parallelize( 554 _fit_recovery, 555 var_names, 556 n_jobs, 557 unit="gene", 558 as_array=False, 559 backend=backend, 560 show_progress_bar=len(var_names) > 9, 561 )( 562 adata=adata, 563 use_raw=use_raw, 564 load_pars=load_pars, 565 max_iter=max_iter, 566 fit_time=fit_time, 567 fit_steady_states=fit_steady_states, 568 fit_scaling=fit_scaling, 569 fit_basal_transcription=fit_basal_transcription, 570 steady_state_prior=steady_state_prior, 571 conn=conn, 572 assignment_mode=assignment_mode, 573 **kwargs, 574 ) 575 idx, dms = map(_flatten, zip(*res)) 577 for ix, dm in zip(idx, dms): File /nas/01.private_person/02.zhouzhendiao/01.software/scvelo-master/scvelo/core/_parallelize.py:127, in parallelize..wrapper(*args, **kwargs) 124 else: 125 pbar, queue, thread = None, None, None --> 127 res = Parallel(n_jobs=n_jobs, backend=backend)( 128 delayed(callback)( 129 *((i, cs) if use_ixs else (cs,)), 130 *args, 131 **kwargs, 132 queue=queue, 133 ) 134 for i, cs in enumerate(collections) 135 ) 137 res = np.array(res) if as_array else res 138 if thread is not None: File /opt/conda/lib/python3.10/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable) 1095 self._iterating = False 1097 with self._backend.retrieval_context(): -> 1098 self.retrieve() 1099 # Make sure that we get a last message telling us we are done 1100 elapsed_time = time.time() - self._start_time File /opt/conda/lib/python3.10/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self) 973 try: 974 if getattr(self._backend, 'supports_timeout', False): --> 975 self._output.extend(job.get(timeout=self.timeout)) 976 else: 977 self._output.extend(job.get()) File /opt/conda/lib/python3.10/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout) 564 """Wrapper for Future.result to implement the same behaviour as 565 AsyncResults.get from multiprocessing.""" 566 try: --> 567 return future.result(timeout=timeout) 568 except CfTimeoutError as e: 569 raise TimeoutError from e File /opt/conda/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout) 449 raise CancelledError() 450 elif self._state == FINISHED: --> 451 return self.__get_result() 453 self._condition.wait(timeout) 455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]: File /opt/conda/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self) 401 if self._exception: 402 try: --> 403 raise self._exception 404 finally: 405 # Break a reference cycle with the exception in self._exception 406 self = None LinAlgError: The data appears to lie in a lower-dimensional subspace of the space in which it is expressed. This has resulted in a singular data covariance matrix, which cannot be treated using the algorithms implemented in `gaussian_kde`. Consider performing principle component analysis / dimensionality reduction and using `gaussian_kde` with the transformed data. ```
Versions ```pytb ----- anndata 0.9.2 scanpy 1.9.3 ----- Cython 0.29.34 PIL 9.5.0 absl NA aiohttp 3.8.4 aiosignal 1.3.1 annotated_types 0.5.0 anyio NA arrow 1.2.3 asttokens NA async_timeout 4.0.2 attr 22.2.0 awkward 2.3.1 awkward_cpp NA babel 2.12.1 backcall 0.2.0 backoff 2.2.1 beta_ufunc NA binom_ufunc NA bottleneck 1.3.7 brotli NA bs4 4.12.2 cairo 1.24.0 certifi 2023.07.22 cffi 1.15.1 charset_normalizer 3.1.0 chex 0.1.7 click 8.1.3 cloudpickle 2.2.1 colorama 0.3.9 comm 0.1.3 contextlib2 NA croniter NA cycler 0.10.0 cython 0.29.34 cython_runtime NA cytoolz 0.12.0 dask 2023.4.1 dateutil 2.8.2 debugpy 1.6.7 decorator 5.1.1 deepdiff 6.3.1 defusedxml 0.7.1 dill 0.3.6 docrep 0.3.2 etils 1.4.1 executing 1.2.0 fastapi 0.101.1 fastjsonschema NA flax 0.7.2 fqdn NA frozenlist 1.4.0 fsspec 2023.4.0 gi 3.44.1 gio NA glib NA gmpy2 2.1.2 gobject NA google NA gtk NA h5py 3.8.0 hypergeom_ufunc NA idna 3.4 igraph 0.10.4 importlib_resources NA invgauss_ufunc NA ipykernel 6.22.0 ipython_genutils 0.2.0 ipywidgets 8.0.6 isal 1.1.0 isoduration NA jax 0.4.14 jaxlib 0.4.14 jedi 0.18.2 jinja2 3.1.2 joblib 1.2.0 json5 NA jsonpointer 2.0 jsonschema 4.17.3 jupyter_events 0.5.0 jupyter_server 2.5.0 jupyterlab_server 2.22.1 kiwisolver 1.4.4 leidenalg 0.10.0 lightning 2.0.6 lightning_cloud NA lightning_utilities 0.9.0 llvmlite 0.40.1 louvain 0.8.1 lz4 4.3.2 markupsafe 2.1.2 matplotlib 3.7.1 matplotlib_inline 0.1.6 ml_collections NA ml_dtypes 0.2.0 mpl_toolkits NA mpmath 1.3.0 msgpack 1.0.5 mudata 0.2.3 multidict 6.0.4 multipart 0.0.6 multipledispatch 0.6.0 natsort 8.4.0 nbformat 5.8.0 nbinom_ufunc NA ncf_ufunc NA nct_ufunc NA ncx2_ufunc NA networkx 3.1 numba 0.57.1 numexpr 2.8.4 numpy 1.24.4 numpyro 0.12.1 nvfuser NA opt_einsum v3.3.0 optax 0.1.7 ordered_set 4.1.0 packaging 23.1 pandas 1.5.3 parso 0.8.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA platformdirs 3.5.0 prometheus_client NA prompt_toolkit 3.0.38 psutil 5.9.5 ptyprocess 0.7.0 pure_eval 0.2.2 pvectorc NA pyarrow 11.0.0 pydantic 2.0.3 pydantic_core 2.3.0 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.15.1 pyparsing 3.0.9 pyro 1.8.6 pyrsistent NA pythonjsonlogger NA pytz 2023.3 requests 2.29.0 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rich NA ruamel NA scipy 1.10.1 scvelo 0.3.0 scvi 1.0.3 send2trash NA session_info 1.0.0 setuptools 68.1.0 setuptools_scm NA six 1.16.0 skewnorm_ufunc NA sklearn 1.1.3 sniffio 1.3.0 socks 1.7.1 soupsieve 2.3.2.post1 sparse 0.14.0 stack_data 0.6.2 starlette 0.27.0 sympy 1.11.1 tblib 1.7.0 texttable 1.6.7 threadpoolctl 3.1.0 tlz 0.12.0 tomli 2.0.1 toolz 0.12.0 torch 2.0.1+cu117 torchmetrics 1.0.3 tornado 6.3 tqdm 4.65.0 traitlets 5.9.0 tree 0.1.8 typing_extensions NA uri_template NA urllib3 1.23 uvicorn 0.23.2 wcwidth 0.2.6 webcolors 1.13 websocket 1.5.1 websockets 11.0.3 wrapt 1.15.0 xarray 2023.7.0 yaml 6.0 yarl 1.9.2 zipp NA zmq 25.0.2 zoneinfo NA zstandard 0.19.0 ----- IPython 8.13.1 jupyter_client 8.2.0 jupyter_core 5.3.0 jupyterlab 3.6.3 notebook 6.5.4 ----- Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0] Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.35 ----- Session information updated at 2023-08-21 14:07 ```
WeilerP commented 11 months ago

Cannot help debug since not reproducible. Also seems to be an issued cause by a third party library, not scvelo.

zhouzhendiao commented 9 months ago

Soveled by changed filtering threshold from:

scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)

to:

scv.pp.filter_and_normalize(adata, min_cells =3, min_cells_u=3,min_shared_cells=3, n_top_genes=2000)