pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0

KeyError when calling add_metadata() #93

Closed: mcsimenc closed this issue 4 years ago

mcsimenc commented 4 years ago (Nov 3, 2020)

Hi,

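First time using STREAM. I am at the beginning of the scRNA-seq tutorial (https://nbviewer.jupyter.org/github/pinellolab/STREAM/blob/master/tutorial/1.1.STREAM_scRNA-seq%20%28Bifurcation%29.ipynb?flush_cache=true) and am trying to add metadata to my AnnData object. I read the expression matrix TSV successfully using st.read(), but when I then call st.add_metadata() I get the error below. This is the top of my metadata file: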

    label   label_color
AAACCCACATCTCATT-1_1    2   #F8766D
AAACCCAGTCACCACG-1_1    0   #E58700
AAACGAACATGTGCTA-1_1    4   #C99800
AAACGAATCCAAAGGG-1_1    LRC-Atrichoblast    #A3A500
AAACGAATCTAGGAAA-1_1    LRC-Atrichoblast    #A3A500
AAACGCTCAATCACGT-1_1    LRC-Atrichoblast    #A3A500
AAACGCTCACTAGTAC-1_1    Pericycle   #6BB100
AAACGCTGTACCTAGT-1_1    0   #E58700
AAACGCTTCAGTGATC-1_1    4   #C99800

KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-3add3c663ded> in <module>
----> 1 st.add_metadata(adata, file_name = "2020-11-03_metaData.tsv")

~/.conda/envs/stream/lib/python3.7/site-packages/stream/core.py in add_metadata(adata, file_name, delimiter, file_path)
    270             id_cells = np.where(df_metadata['label']==x)[0]
    271             df_metadata.loc[df_metadata.index[id_cells],'label_color'] = adata.uns['label_color'][x]
--> 272     adata.obs = df_metadata.loc[adata.obs.index,:]
    273     return None
    274 

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    871                     # AttributeError for IntervalTree get_value
    872                     pass
--> 873             return self._getitem_tuple(key)
    874         else:
    875             # we by definition only have the 0th axis

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1053             return self._multi_take(tup)
   1054 
-> 1055         return self._getitem_tuple_same_dim(tup)
   1056 
   1057     def _get_label(self, label, axis: int):

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
    748                 continue
    749 
--> 750             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    751             # We should never have retval.ndim < self.ndim, as that should
    752             #  be handled by the _getitem_lowerdim call above.

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1097                     raise ValueError("Cannot index with multidimensional key")
   1098 
-> 1099                 return self._getitem_iterable(key, axis=axis)
   1100 
   1101             # nested tuple slicing

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1035 
   1036         # A collection of keys
-> 1037         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1038         return self.obj._reindex_with_indexers(
   1039             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/.conda/envs/stream/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['AAACCCACATCTCATT.1_1', 'AAACCCAGTCACCACG.1_1', 'AAACGAACATGTGCTA.1_1',\n       'AAACGAATCCAAAGGG.1_1', 'AAACGAATCTAGGAAA.1_1', 'AAACGCTCAATCACGT.1_1',\n       'AAACGCTCACTAGTAC.1_1', 'AAACGCTGTACCTAGT.1_1', 'AAACGCTTCAGTGATC.1_1',\n       'AAAGAACCAGTCGGAA.1_1',\n       ...\n       'TTTCGATCATGGCTGC.1_3', 'TTTGATCCACTTTATC.1_3', 'TTTGATCGTCCACAGC.1_3',\n       'TTTGATCGTCGTATTG.1_3', 'TTTGATCTCTATTGTC.1_3', 'TTTGGAGTCGAGATAA.1_3',\n       'TTTGGTTCACCATAAC.1_3', 'TTTGTTGAGCGTCAAG.1_3', 'TTTGTTGCACCTGTCT.1_3',\n       'TTTGTTGCATGGTACT.1_3'],\n      dtype='object', length=10685)] are in the [index]"

However, I double-checked that the cell names in the metadata file exactly match the cell names (column names) in the input expression matrix. Do you know why I might be getting this error?
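Comparing the two sets of cell names programmatically is more reliable than checking them by eye. The sketch below uses only the standard library. `report_name_mismatch` is a hypothetical helper, not part of STREAM. In a real session the two lists would presumably come from `adata.obs_names` and from the first column of the metadata TSV, for example `pandas.read_csv(path, sep="\t", index_col=0).index`. Here they are hard-coded from the error message and the metadata excerpt above.

```python
def report_name_mismatch(expr_names, meta_names, n_examples=3):
    """Compare cell names from the expression matrix and the metadata file.

    Returns the number of shared names plus a few example names that appear
    on only one side. Hypothetical helper for illustration, not a STREAM API.
    """
    expr_set, meta_set = set(expr_names), set(meta_names)
    shared = expr_set & meta_set
    return {
        "shared": len(shared),
        "only_in_expr": sorted(expr_set - meta_set)[:n_examples],
        "only_in_meta": sorted(meta_set - expr_set)[:n_examples],
    }


# Names as they appear in the KeyError message (AnnData side) and in the
# metadata file excerpt (metadata side).
expr_names = ["AAACCCACATCTCATT.1_1", "AAACCCAGTCACCACG.1_1"]
meta_names = ["AAACCCACATCTCATT-1_1", "AAACCCAGTCACCACG-1_1"]

print(report_name_mismatch(expr_names, meta_names))
```

If `shared` is 0, `df_metadata.loc[adata.obs.index, :]` cannot find any of the requested labels. This is the condition under which pandas raises the "None of [...] are in the [index]" KeyError shown above.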

Thanks!

Matt

lucapinello commented 4 years ago

Hey Matt,

Thanks for trying STREAM.

It seems there is a minor mismatch in the cell names, e.g.:

AAACCCACATCTCATT.1_1 (in the AnnData object) vs AAACCCACATCTCATT-1_1 (in the metadata file)

Make sure the cell names are consistent between the expression matrix and the metadata file, and it should work.
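One way to make the names consistent is to convert the `.` separator on the AnnData side back to `-` so that the names match the metadata file. The sketch below assumes, as in the example above, that the only difference is a `.` substituted for `-`, and that cell names contain no other `.` characters. `normalize_cell_name` and `align_metadata` are hypothetical helpers. A plain dict stands in for the metadata table so that the example has no dependencies.

```python
def normalize_cell_name(name: str) -> str:
    """Map e.g. 'AAACCCACATCTCATT.1_1' to 'AAACCCACATCTCATT-1_1'.

    Assumes the only '.' in a cell name is the one that replaced '-'.
    """
    return name.replace(".", "-")


def align_metadata(obs_names, metadata):
    """Return metadata rows in the order of obs_names, after normalizing names.

    `metadata` maps cell name -> (label, label_color). If any cell is still
    missing, raise a KeyError that reports how many are missing and names
    one example.
    """
    normalized = [normalize_cell_name(n) for n in obs_names]
    missing = [n for n in normalized if n not in metadata]
    if missing:
        raise KeyError(
            f"{len(missing)} cell(s) missing from metadata, e.g. {missing[0]}"
        )
    return [(n, *metadata[n]) for n in normalized]


# Two rows taken from the metadata excerpt in the original post.
metadata = {
    "AAACCCACATCTCATT-1_1": ("2", "#F8766D"),
    "AAACCCAGTCACCACG-1_1": ("0", "#E58700"),
}
rows = align_metadata(["AAACCCACATCTCATT.1_1", "AAACCCAGTCACCACG.1_1"], metadata)
print(rows[0])  # ('AAACCCACATCTCATT-1_1', '2', '#F8766D')
```

With a real AnnData object, the equivalent step would presumably be to reassign the names before calling `st.add_metadata`, e.g. `adata.obs_names = [normalize_cell_name(n) for n in adata.obs_names]`. Alternatively, you could edit the metadata file so that its first column uses `.`. Either direction works as long as both sides end up identical.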

mcsimenc commented 4 years ago

Thanks for the fast reply and for finding my problem!