theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License

Error on using 'groupby' covariate with 'stochastic' mode within scv.tl.velocity() #874

Open · JesseRop opened this issue 2 years ago

JesseRop commented 2 years ago

Hi @WeilerP , thanks for the great package and all your efforts in maintaining it!

I was able to replicate the error with the scripts below. I was wondering whether there is a way to calculate RNA velocities and velocity streams independently per cluster (or batch) in the stochastic/deterministic mode, so as to account for different dynamics per cluster (or batch). I'm aware that velocities can be recalculated while accounting for differential kinetics per cluster in the dynamical mode, but that mode doesn't work well for my data. Do I have to subset the clusters into separate AnnData objects, run RNA velocity on each, and then merge the objects back together (see the sketch right below)? I was thinking 'groupby' could be used to condition on the different clusters, but it seems it has to be used in combination with 'groups' to restrict the RNA velocity calculation to just one (or more) clusters, rather than computing RNA velocity for all cells while conditioning on each cluster.
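A hypothetical, untested sketch of that subset-and-merge idea (the 'clusters' key and the stochastic settings are taken from the reproduction script further below):

```python
import numpy as np
import scvelo as scv

# Fit the stochastic model per cluster and write each cluster's velocities back
# into the full object. Assumes filtering/normalization and moments have already
# been computed on `adata` (as in the reproduction script below).
adata.layers["velocity"] = np.full(adata.shape, np.nan)
for cluster in adata.obs["clusters"].unique():
    mask = (adata.obs["clusters"] == cluster).values
    sub = adata[mask].copy()
    scv.tl.velocity(sub, mode="stochastic", min_r2=0.1)
    adata.layers["velocity"][mask] = sub.layers["velocity"]
```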

Reproduction script:

```python
import scvelo as scv

adata = scv.datasets.dentategyrus()

scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)

scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

scv.tl.velocity(adata, mode='stochastic', min_r2=0.1, groupby='clusters')
```
Error output
```pytb
computing velocities
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Input In [408], in ()
----> 1 scv.tl.velocity(adata, mode='stochastic', min_r2=0.1, groupby='clusters')

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/velocity.py:428, in velocity(data, vkey, mode, fit_offset, fit_offset2, filter_genes, groups, groupby, groups_for_fit, constrain_ratio, use_raw, use_latent_time, perc, min_r2, min_likelihood, r2_adjusted, use_highly_variable, diff_kinetics, copy, **kwargs)
    419 _adata = adata if groups is None else adata[cell_subset]
    420 velo = Velocity(
    421     _adata,
    422     residual=residual,
    (...)
    426     use_highly_variable=use_highly_variable,
    427 )
--> 428 velo.compute_stochastic(fit_offset, fit_offset2, mode, perc=perc)
    430 write_residuals(adata, vkey, velo._residual, cell_subset)
    431 write_residuals(adata, f"variance_{vkey}", velo._residual2, cell_subset)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/velocity.py:147, in Velocity.compute_stochastic(self, fit_offset, fit_offset2, mode, perc)
    141 res2_std = (cov_us - _gamma2 * var_ss - _offset2).std(0)
    143 # solve multiple regression
    144 self._offset[idx], self._offset2[idx], self._gamma[idx] = (
    145     maximum_likelihood(_Ms, _Mu, _Mus, _Mss, fit_offset, fit_offset2)
    146     if mode == "bayes"
--> 147     else leastsq_generalized(
    148         _Ms,
    149         _Mu,
    150         var_ss,
    151         cov_us,
    152         res_std,
    153         res2_std,
    154         fit_offset,
    155         fit_offset2,
    156         perc,
    157     )
    158 )
    160 self._residual = self._Mu - self._gamma * self._Ms
    161 if fit_offset:

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/optimization.py:173, in leastsq_generalized(x, y, x2, y2, res_std, res2_std, fit_offset, fit_offset2, perc)
    171 for i in range(n_var):
    172     A = np.c_[x[:, i]]
--> 173     gamma[i] = np.linalg.pinv(A.T.dot(A)).dot(A.T.dot(y[:, i]))
    175 offset[np.isnan(offset)] = 0
    176 offset_ss[np.isnan(offset_ss)] = 0

File <__array_function__ internals>:5, in pinv(*args, **kwargs)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:2002, in pinv(a, rcond, hermitian)
   2000     return wrap(res)
   2001 a = a.conjugate()
-> 2002 u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
   2004 # discard small singular values
   2005 cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)

File <__array_function__ internals>:5, in svd(*args, **kwargs)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:1660, in svd(a, full_matrices, compute_uv, hermitian)
   1657     gufunc = _umath_linalg.svd_n_s
   1659 signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1660 u, s, vh = gufunc(a, signature=signature, extobj=extobj)
   1661 u = u.astype(result_t, copy=False)
   1662 s = s.astype(_realType(result_t), copy=False)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:97, in _raise_linalgerror_svd_nonconvergence(err, flag)
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")

LinAlgError: SVD did not converge
```
Versions
```pytb
scvelo==0.2.5.dev71+g85295fc  scanpy==1.9.1  anndata==0.7.8  loompy==3.0.7  numpy==1.21.5  scipy==1.8.0  matplotlib==3.5.1  sklearn==1.0.2  pandas==1.4.2
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI's XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future.
See https://status.python.org/ for more information.
```
WeilerP commented 2 years ago

@JesseRop, thanks for reporting this. Looks similar to #856. I'll check if I can reproduce the error on my side and get back to you ASAP. In general: if the dynamical mode does not work on your data, you shouldn't simply switch to another (less sophisticated) one. Both approaches rely on phase portraits having a specific shape and on cell types being aligned along them. See e.g. #216 or #462 for possible reasons for failure; one way to check is to plot a few phase portraits directly, as sketched below.
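For instance, phase portraits can be inspected gene by gene (the gene names here are placeholders from the dentate gyrus tutorial data):

```python
import scvelo as scv

# Unspliced vs. spliced phase portraits for a few genes; both the dynamical and
# the stochastic/deterministic models assume the characteristic almond-shaped
# curve with cell types aligned along it.
scv.pl.scatter(adata, ["Tmsb10", "Ppp3ca"], frameon=False)
```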

WeilerP commented 2 years ago

@JesseRop, the numpy version is the issue. Not sure why, though. When downgrading to numpy<1.21, the error is no longer thrown. @michalk8, @ivirshup, did you encounter something similar before by chance, or are you aware of anything introduced in numpy>=1.21 that could cause this?
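A quick way to check whether an environment falls in the affected range (assuming the regression indeed starts at numpy 1.21, as the downgrade suggests):

```python
import numpy as np
from packaging.version import Version

# The error reportedly disappears with numpy<1.21, so warn at or above that
# version; the actual pinning happens outside Python (e.g. pip install "numpy<1.21").
if Version(np.__version__) >= Version("1.21"):
    print(f"numpy {np.__version__} may trigger the 'SVD did not converge' error")
```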

monica-nicolau commented 1 year ago

Hi everyone! @WeilerP, are you sure the numpy version is the only issue? I've been running scv.tl.velocity on multiple (5 other) samples and it raises this error for only one of them; how could it work for the other samples if the numpy version is the same? It's not a memory issue either, because I ran the command several times and it always raises the same error. Could you help me understand what the problem with this object is?

My current np.__version__ is '1.21.1'

UPDATE *** I managed to avoid the error by specifying enforce=True during the normalization carried out by scv.pp.filter_and_normalize(). With enforce=True, scv.tl.velocity() no longer raises the "SVD did not converge" error :)
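Applied to the reproduction script above, the workaround would look like this minimal sketch (enforce=True forces normalization even if the data is detected as already normalized):

```python
import scvelo as scv

adata = scv.datasets.dentategyrus()

# enforce=True forces (re-)normalization; per the comment above this avoided
# the "SVD did not converge" error downstream.
scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000, enforce=True)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.velocity(adata, mode='stochastic', min_r2=0.1)
```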

h4rvey-g commented 1 week ago

Another potential cause of this error is an adata exported from Seurat. Ensure that the primary layer exported to adata (i.e. adata.X) is the counts layer of the Seurat object, which contains the raw counts, and run scv.pp.filter_and_normalize() on that. Avoid using the data layer from Seurat and running scv.pp.filter_and_normalize() on it again, as that layer is already the log-normalized version of the counts.
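A minimal sketch of that check (assuming the export stored the raw counts under adata.layers["counts"]; layer names vary with the export tool):

```python
import numpy as np
import scvelo as scv

# Point adata.X at the raw counts from Seurat's counts layer (assumption: the
# export stored them under layers["counts"]).
adata.X = adata.layers["counts"].copy()

# Sanity check: raw counts are non-negative integers; Seurat's log-normalized
# "data" layer would fail this.
x = adata.X[:100].toarray() if hasattr(adata.X, "toarray") else np.asarray(adata.X[:100])
assert x.min() >= 0 and np.allclose(x, np.round(x)), "adata.X does not look like raw counts"

scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)
```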