scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

mito_genes and ValueError Traceback (most recent call last) #647

Open sygongcode opened 5 years ago

sygongcode commented 5 years ago
mito_genes = adata.var_names.str.startswith('MT-')
adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1).A1 / np.sum(adata.X, axis=1).A1
adata.obs['n_counts'] = adata.X.sum(axis=1).A1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-4b64d0e9fd7f> in <module>
      2 # for each cell compute fraction of counts in mito genes vs. all genes
      3 # the `.A1` is only necessary as X is sparse (to transform to a dense array after summing)
----> 4 adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1).A1 / np.sum(adata.X, axis=1).A1
      5 # add the total counts per cell as observations-annotation to adata
      6 adata.obs['n_counts'] = adata.X.sum(axis=1).A1

c:\users\gsy\miniconda3\second\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1297     def __getitem__(self, index: Index) -> 'AnnData':
   1298         """Returns a sliced view of the object."""
-> 1299         return self._getitem_view(index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':

c:\users\gsy\miniconda3\second\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1302         oidx, vidx = self._normalize_indices(index)
   1303         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1304 

c:\users\gsy\miniconda3\second\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1277         obs, var = super()._unpack_index(index)
   1278         obs = _normalize_index(obs, self.obs_names)
-> 1279         var = _normalize_index(var, self.var_names)
   1280         return obs, var
   1281 

c:\users\gsy\miniconda3\second\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    264         # incredibly faster one
    265         positions = pd.Series(index=names, data=range(len(names)))
--> 266         positions = positions[index]
    267         if positions.isnull().values.any():
    268             raise KeyError(

c:\users\gsy\miniconda3\second\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    906             key = list(key)
    907 
--> 908         if com.is_bool_indexer(key):
    909             key = check_bool_indexer(self.index, key)
    910 

c:\users\gsy\miniconda3\second\lib\site-packages\pandas\core\common.py in is_bool_indexer(key)
    122             if not lib.is_bool_array(key):
    123                 if isna(key).any():
--> 124                     raise ValueError(na_msg)
    125                 return False
    126             return True

ValueError: cannot index with vector containing NA / NaN values

sygongcode commented 5 years ago

Hello all, I am completely new to single-cell RNA-seq. I ran into this problem during quality control. Can anyone help me? Thank you so much.

LuckyMD commented 5 years ago

Hi,

Is your mito_genes vector all boolean? And does it have a non-zero sum? Judging by the error, it seems to contain NA values.
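The two checks suggested here can be sketched as follows. This is only an illustration: the gene names are made up, `summarize_mask` is a hypothetical helper, and a plain pandas `Index` stands in for `adata.var_names`.

```python
import numpy as np
import pandas as pd


def summarize_mask(mask):
    """Hypothetical helper: count missing and True entries in a gene mask."""
    values = list(mask)
    n_missing = sum(1 for v in values if pd.isna(v))
    n_true = sum(1 for v in values if not pd.isna(v) and bool(v))
    return {"n_missing": n_missing, "n_true": n_true}


# Made-up gene names standing in for adata.var_names; one name is missing (NaN).
var_names = pd.Index(["MT-CO1", "ACTB", np.nan, "GAPDH"], dtype=object)
mito_genes = var_names.str.startswith("MT-")

summary = summarize_mask(mito_genes)
print(summary)
```

A non-zero `n_missing` means the mask is not purely boolean, which matches the "NA / NaN values" error; an `n_true` of zero means no gene name matched the prefix.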

sygongcode commented 5 years ago

Thank you for your reply. I think the 'mito_genes' vector is all boolean, but maybe its sum is zero. How can I resolve this problem? Do you have any suggestions?

LuckyMD commented 5 years ago

If you have no mitochondrial genes, you can't plot them. The first line of your code looks for genes whose names start with "MT-". For mouse data that should be "mt-", or maybe you have a different nomenclature... or you don't have any mitochondrial genes in your dataset (possible for Cell Ranger versions < 2.0).

I can't really debug this, as it requires looking and playing with your dataset.

Good luck!
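As a rough sketch of the prefix point above, the match can be made independent of case and of the separator. The gene names below are made up, and the `mt:` prefix is an assumption about FlyBase-style symbols; check the names in your own annotation before relying on it.

```python
import pandas as pd

# Made-up gene names mixing human-, mouse- and (assumed) FlyBase-style prefixes.
var_names = pd.Index(["mt:CoI", "Act5C", "mt-Nd1", "MT-ND2", "Mtk"], dtype=object)

# Case-insensitive match on names starting with "mt-" or "mt:".
# na=False makes missing names count as non-matching instead of NaN.
mito_genes = var_names.str.contains(r"^mt[-:]", case=False, na=False)

print(list(zip(var_names, mito_genes)))
print("number of mitochondrial genes:", int(mito_genes.sum()))
```

If the count is zero, the dataset either uses yet another naming scheme or contains no mitochondrial genes.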

sygongcode commented 5 years ago

Thank you for your help. My data is from Drosophila. I think there may be no mitochondrial genes in the data. Your reply is very helpful.

zhouyiqi91 commented 1 year ago

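The problem is that the Drosophila GTF contains a gene whose gene_name is 'nan'. When the gene names are loaded, that name is likely parsed as a missing value (NaN), so the mask built from adata.var_names contains NA and the indexing fails.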

3L      FlyBase CDS     14187463        14187689        .       -       2       gene_id "FBgn0036414"; transcript_id "FBtr0089524"; exon_number "5"; gene_name "nan"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "nan-RA"; transcript_source "FlyBase"; transcript_biotype "protein_coding"; protein_id "FBpp0088509";
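A minimal reproduction of this effect, assuming the gene names pass through `pandas.read_csv` with default settings at some point (how the original data was loaded is not stated in this thread). The two-row table is made up, apart from the gene id and name quoted above; `GENE_ID_2` is a placeholder.

```python
import io

import pandas as pd

# Made-up two-column gene table (gene id, gene name), tab-separated.
genes_tsv = "FBgn0036414\tnan\nGENE_ID_2\tmt:CoI\n"

# With default settings, the string "nan" is treated as a missing value.
default_read = pd.read_csv(io.StringIO(genes_tsv), sep="\t", header=None)
# keep_default_na=False keeps "nan" as a literal string.
literal_read = pd.read_csv(
    io.StringIO(genes_tsv), sep="\t", header=None, keep_default_na=False
)

names_default = pd.Index(default_read[1], dtype=object)
names_literal = pd.Index(literal_read[1], dtype=object)

# The missing name propagates into the mask as NaN...
raw_mask = names_default.str.startswith("mt:")
# ...unless missing names are explicitly treated as non-matching.
safe_mask = names_default.str.startswith("mt:", na=False)

print(list(raw_mask))
print(list(safe_mask))
```

Either approach avoids the NA in the mask: passing `na=False` when building it, or keeping the gene name as the literal string `"nan"` when the names are read.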