scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

scanpy.api.tl.pca compensates for few genes (< 50) but not for few cells. #432

Open Xparx opened 5 years ago

Xparx commented 5 years ago

Minor bug I assume.

Running sc.tl.pca on fewer than 50 cells raises the error below. The code handles the case n_vars < 50 but not the case n_obs < 50.

ValueError                                Traceback (most recent call last)
<ipython-input-823-4c11b9b62e6d> in <module>
----> 1 sc.tl.pca(bla)

~/.virtualenvs/default/lib/python3.6/site-packages/scanpy/preprocessing/simple.py in pca(data, n_comps, zero_center, svd_solver, random_state, return_info, use_highly_variable, dtype, copy, chunked, chunk_size)
    504             pca_ = TruncatedSVD(n_components=n_comps, random_state=random_state)
    505             X = adata_comp.X
--> 506         X_pca = pca_.fit_transform(X)
    507 
    508     if X_pca.dtype.descr != np.dtype(dtype).descr: X_pca = X_pca.astype(dtype)

~/.virtualenvs/default/lib/python3.6/site-packages/sklearn/decomposition/pca.py in fit_transform(self, X, y)
    357 
    358         """
--> 359         U, S, V = self._fit(X)
    360         U = U[:, :self.n_components_]
    361 

~/.virtualenvs/default/lib/python3.6/site-packages/sklearn/decomposition/pca.py in _fit(self, X)
    404         # Call different fits for either full or truncated SVD
    405         if self._fit_svd_solver == 'full':
--> 406             return self._fit_full(X, n_components)
    407         elif self._fit_svd_solver in ['arpack', 'randomized']:
    408             return self._fit_truncated(X, n_components, self._fit_svd_solver)

~/.virtualenvs/default/lib/python3.6/site-packages/sklearn/decomposition/pca.py in _fit_full(self, X, n_components)
    423                              "min(n_samples, n_features)=%r with "
    424                              "svd_solver='full'"
--> 425                              % (n_components, min(n_samples, n_features)))
    426         elif n_components >= 1:
    427             if not isinstance(n_components, (numbers.Integral, np.integer)):

ValueError: n_components=50 must be between 0 and min(n_samples, n_features)=38 with svd_solver='full'

falexwolf commented 5 years ago

Hm, I would even say this is a bug at the sklearn PCA level.

yuqiyuqitan commented 5 years ago

I ran into the same error.

yuqiyuqitan commented 5 years ago

I have more than 50 cells, but sc.tl.pca(adata, use_highly_variable=False) resolved my error.

LuckyMD commented 5 years ago

So you have fewer than 50 HVGs? Maybe just use more HVGs, then the whole thing should work.
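
To tell which dimension is the limiting one, a small check like the following may help. It is only a sketch: `limiting_dimension` is a hypothetical helper, not part of scanpy or sklearn, and the 2000-gene count in the example is illustrative (only the 38 comes from the traceback above).

```python
def limiting_dimension(n_obs, n_vars, n_comps=50):
    """Return which dimension, if any, is smaller than the requested n_comps.

    Hypothetical helper for illustration; not part of scanpy or sklearn.
    """
    too_few_cells = n_obs < n_comps
    too_few_genes = n_vars < n_comps
    if too_few_cells and too_few_genes:
        return "cells and genes"
    if too_few_cells:
        return "cells"
    if too_few_genes:
        return "genes"
    return None


# Illustrative numbers: 38 matches min(n_samples, n_features) in the traceback,
# 2000 is an assumed gene count.
print(limiting_dimension(38, 2000))

# With an AnnData object one might pass, assuming highly variable genes
# have already been annotated in adata.var["highly_variable"]:
#   n_hvg = int(adata.var["highly_variable"].sum())
#   limiting_dimension(adata.n_obs, n_hvg)
```

If the result is "genes" while use_highly_variable is active, the count of highly variable genes is the likely culprit rather than the number of cells.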

Xparx commented 5 years ago

Another option is to request fewer components in sc.tl.pca: the n_comps option should be at most the number of highly variable genes (and, per the error above, at most the number of cells).
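
A minimal sketch of that workaround, assuming the caller caps n_comps before calling sc.tl.pca. `cap_n_comps` is a hypothetical helper, not a scanpy function. It stays one below min(n_obs, n_vars) so that the value should also be acceptable to sklearn's 'arpack' solver, which requires strictly fewer components than the smaller dimension.

```python
def cap_n_comps(n_obs, n_vars, requested=50):
    """Cap the requested number of PCs by the smaller data dimension.

    Hypothetical helper for illustration; not part of scanpy.
    """
    min_dim = min(n_obs, n_vars)
    if min_dim < 2:
        raise ValueError("need at least 2 cells and 2 genes to compute a PCA")
    # One below the smaller dimension is valid for every sklearn solver.
    return min(requested, min_dim - 1)


# Case from the traceback: min(n_samples, n_features) = 38
# (the gene count of 2000 is an assumed, illustrative value).
print(cap_n_comps(38, 2000))  # → 37

# Possible usage with an AnnData object `adata`:
#   sc.tl.pca(adata, n_comps=cap_n_comps(adata.n_obs, adata.n_vars))
# If use_highly_variable is active, pass the number of highly variable
# genes instead of adata.n_vars.
```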