scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

subsetting isn't working after updating scanpy #363

Open aopisco opened 5 years ago

aopisco commented 5 years ago

Subsetting doesn't work. I have this object:

>>> tiss
AnnData object with n_obs × n_vars = 29322 × 19860
>>> tiss[tiss.obs['cell_ontology_class']=='B cell']
IndexError                                Traceback (most recent call last)
<ipython-input-269-28b4524131cb> in <module>()
----> 1 tiss[tiss.obs['cell_ontology_class']=='B cell']

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __getitem__(self, index)
   1299     def __getitem__(self, index):
   1300         """Returns a sliced view of the object."""
-> 1301         return self._getitem_view(index)
   1302 
   1303     def _getitem_view(self, index):

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _getitem_view(self, index)
   1303     def _getitem_view(self, index):
   1304         oidx, vidx = self._normalize_indices(index)
-> 1305         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1306 
   1307     def _remove_unused_categories(self, df_full, df_sub, uns):

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
    662             if not isinstance(X, AnnData):
    663                 raise ValueError('`X` has to be an AnnData object.')
--> 664             self._init_as_view(X, oidx, vidx)
    665         else:
    666             self._init_as_actual(

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _init_as_view(self, adata_ref, oidx, vidx)
    713             raise KeyError('Unknown Index type')
    714         # fix categories
--> 715         self._remove_unused_categories(adata_ref.obs, obs_sub, uns_new)
    716         self._remove_unused_categories(adata_ref.var, var_sub, uns_new)
    717         # set attributes

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _remove_unused_categories(self, df_full, df_sub, uns)
   1318                     uns[k + '_colors'] = np.array(uns[k + '_colors'])[
   1319                         np.where(np.in1d(
-> 1320                             all_categories, df_sub[k].cat.categories))[0]]
   1321 
   1322     def rename_categories(self, key, categories):

IndexError: index 7 is out of bounds for axis 1 with size 7

even though it's part of the set:

>>> set(tiss.obs['cell_ontology_class'])
{'B cell',
 'NA',
 'T cell',
 'dendritic cell',
 'macrophage',
 'natural killer cell'}

it does work for louvain though:

>>> tiss[tiss.obs['louvain']=='0']
View of AnnData object with n_obs × n_vars = 5862 × 19860
falexwolf commented 5 years ago

Can you call del adata.uns['cell_ontology_class_colors']? This should throw a better error message... I can do that soon, I wonder how you managed to produce the error... cannot be anything related to a recent update... Hm.

aopisco commented 5 years ago

Actually, what works is to set the observation to categorical or to run tSNE/UMAP beforehand. The error always happens when I try to subset without having run the plots first.

aopisco commented 5 years ago

@falexwolf now I can't really get it to work...

tiss_facs.obs['cell_ontology_class'].cat.categories
Index(['NA', 'basal cell of epidermis', 'epidermal cell',
       'keratinocyte stem cell', 'leukocyte', 'stem cell of epidermis'],
      dtype='object')

del tiss_facs.uns['cell_ontology_class_colors']
tiss_facs
AnnData object with n_obs × n_vars = 3468 × 22899 
    obs: 'FACS.selection', 'batch', 'cell', 'cell_ontology_class', 'cell_ontology_id', 'cellid', 'free_annotation', 'method', 'mouse.id', 'plate', 'sex', 'subtissue', 'tissue', 'well', 'n_genes', 'n_counts', 'louvain'
    var: 'n_cells', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'free_annotation_colors', 'louvain', 'louvain_colors', 'method_colors', 'mouse.id_colors', 'neighbors', 'pca', 'rank_genes_groups', 'sex_colors', 'subtissue_colors', 'tissue_colors', 'dendrogram'
    obsm: 'X_pca', 'X_umap', 'X_tsne'
    varm: 'PCs'

tiss_facs[tiss_facs.obs['cell_ontology_class']=='keratinocyte stem cell']
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-82-428532769794> in <module>()
----> 1 tiss_facs[tiss_facs.obs['cell_ontology_class']=='keratinocyte stem cell']

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __getitem__(self, index)
   1299     def __getitem__(self, index):
   1300         """Returns a sliced view of the object."""
-> 1301         return self._getitem_view(index)
   1302 
   1303     def _getitem_view(self, index):

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _getitem_view(self, index)
   1303     def _getitem_view(self, index):
   1304         oidx, vidx = self._normalize_indices(index)
-> 1305         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1306 
   1307     def _remove_unused_categories(self, df_full, df_sub, uns):

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
    662             if not isinstance(X, AnnData):
    663                 raise ValueError('`X` has to be an AnnData object.')
--> 664             self._init_as_view(X, oidx, vidx)
    665         else:
    666             self._init_as_actual(

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _init_as_view(self, adata_ref, oidx, vidx)
    713             raise KeyError('Unknown Index type')
    714         # fix categories
--> 715         self._remove_unused_categories(adata_ref.obs, obs_sub, uns_new)
    716         self._remove_unused_categories(adata_ref.var, var_sub, uns_new)
    717         # set attributes

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _remove_unused_categories(self, df_full, df_sub, uns)
   1318                     uns[k + '_colors'] = np.array(uns[k + '_colors'])[
   1319                         np.where(np.in1d(
-> 1320                             all_categories, df_sub[k].cat.categories))[0]]
   1321 
   1322     def rename_categories(self, key, categories):

IndexError: index 6 is out of bounds for axis 1 with size 6
falexwolf commented 5 years ago

There seems to be a collision with some of the '_colors' annotations in .uns. Sorry about this! Unfortunately, I can't reproduce it like this...

Can you provide a reproducible example? It should be solved if you remove all the '_colors' annotations, but that's of course not a good solution at all. I definitely want to get this fixed.
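
The blunt workaround mentioned above (dropping every color annotation) could be sketched as follows. This is only an illustration: the helper name `drop_color_annotations` is made up here, and it assumes the annotations are stored under keys ending in `_colors`, as the traceback's `uns[k + '_colors']` suggests. A plain dict stands in for `adata.uns`.

```python
def drop_color_annotations(uns):
    """Delete every key ending in '_colors' from a dict-like ``uns``.

    Hypothetical helper, not part of anndata or scanpy.
    Returns the list of keys that were removed.
    """
    removed = [key for key in list(uns) if key.endswith("_colors")]
    for key in removed:
        del uns[key]
    return removed


# A plain dict standing in for adata.uns (assumption for illustration).
uns = {"louvain_colors": ["#1f77b4"], "neighbors": {}, "pca": {}}
removed = drop_color_annotations(uns)
print(removed)
```

With a real object the call would be `drop_color_annotations(adata.uns)`; plotting functions would then have to assign colors again.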

aopisco commented 5 years ago

@falexwolf do you want an object? The fix was working until today; now, even after removing all '_colors' entries from .uns, I still have no success...

Traceback

```pytb
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in ()
     22     axs[i].set_title(c)
     23 #     plot_boxplot_nonzero(tiss[tiss.obs['cell_ontology_class']==cell,:],geneofinterest,'age',axs[i],show = False)
---> 24     plot_boxplot_cell_fraction(tiss[tiss.obs['auto_cell_ontology_class']==c],geneofinterest,'age',c,axs[i],show = False)
     25     print(c + ' is done!')
     26     i = i+1

 in plot_boxplot_cell_fraction(adata, gene, label, title, ax, show)
      1 def plot_boxplot_cell_fraction(adata, gene, label, title, ax, show=True):
----> 2     gene_vals = np.asarray(adata[:, gene].X).flatten()
      3 
      4     labels = ['3m','24m']
      5 #     labels = list(set(adata.obs[label]))

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __getitem__(self, index)
   1299     def __getitem__(self, index):
   1300         """Returns a sliced view of the object."""
-> 1301         return self._getitem_view(index)
   1302 
   1303     def _getitem_view(self, index):

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _getitem_view(self, index)
   1303     def _getitem_view(self, index):
   1304         oidx, vidx = self._normalize_indices(index)
-> 1305         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1306 
   1307     def _remove_unused_categories(self, df_full, df_sub, uns):

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
    662             if not isinstance(X, AnnData):
    663                 raise ValueError('`X` has to be an AnnData object.')
--> 664             self._init_as_view(X, oidx, vidx)
    665         else:
    666             self._init_as_actual(

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _init_as_view(self, adata_ref, oidx, vidx)
    691             self._varm = ArrayView(adata_ref.varm[vidx_normalized], view_args=(self, 'varm'))
    692         # hackish solution here, no copy should be necessary
--> 693         uns_new = deepcopy(self._adata_ref._uns)
    694         # need to do the slicing before setting the updated self._n_obs, self._n_vars
    695         self._n_obs = self._adata_ref.n_obs  # use the original n_obs here

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    178             y = x
    179         else:
--> 180             y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.

~/anaconda3/lib/python3.6/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    278     if state is not None:
    279         if deep:
--> 280             state = deepcopy(state, memo)
    281         if hasattr(y, '__setstate__'):
    282             y.__setstate__(state)

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:

~/anaconda3/lib/python3.6/copy.py in _deepcopy_dict(x, memo, deepcopy)
    238     memo[id(x)] = y
    239     for key, value in x.items():
--> 240         y[deepcopy(key, memo)] = deepcopy(value, memo)
    241     return y
    242 d[dict] = _deepcopy_dict

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:

~/anaconda3/lib/python3.6/copy.py in _deepcopy_tuple(x, memo, deepcopy)
    218 
    219 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 220     y = [deepcopy(a, memo) for a in x]
    221     # We're not going to put the tuple in the memo, but it's still important we
    222     # check for it, in case the tuple contains recursive mutable structures.

~/anaconda3/lib/python3.6/copy.py in (.0)
    218 
    219 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 220     y = [deepcopy(a, memo) for a in x]
    221     # We're not going to put the tuple in the memo, but it's still important we
    222     # check for it, in case the tuple contains recursive mutable structures.

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    178             y = x
    179         else:
--> 180             y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.

~/anaconda3/lib/python3.6/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    278     if state is not None:
    279         if deep:
--> 280             state = deepcopy(state, memo)
    281         if hasattr(y, '__setstate__'):
    282             y.__setstate__(state)

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:

~/anaconda3/lib/python3.6/copy.py in _deepcopy_dict(x, memo, deepcopy)
    238     memo[id(x)] = y
    239     for key, value in x.items():
--> 240         y[deepcopy(key, memo)] = deepcopy(value, memo)
    241     return y
    242 d[dict] = _deepcopy_dict

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    178             y = x
    179         else:
--> 180             y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.

~/anaconda3/lib/python3.6/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    278     if state is not None:
    279         if deep:
--> 280             state = deepcopy(state, memo)
    281         if hasattr(y, '__setstate__'):
    282             y.__setstate__(state)

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    148     copier = _deepcopy_dispatch.get(cls)
    149     if copier:
--> 150         y = copier(x, memo)
    151     else:
    152         try:

~/anaconda3/lib/python3.6/copy.py in _deepcopy_dict(x, memo, deepcopy)
    238     memo[id(x)] = y
    239     for key, value in x.items():
--> 240         y[deepcopy(key, memo)] = deepcopy(value, memo)
    241     return y
    242 d[dict] = _deepcopy_dict

~/anaconda3/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
    178             y = x
    179         else:
--> 180             y = _reconstruct(x, memo, *rv)
    181 
    182     # If is its own copy, don't memoize.

~/anaconda3/lib/python3.6/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    305                 key = deepcopy(key, memo)
    306                 value = deepcopy(value, memo)
--> 307             y[key] = value
    308         else:
    309             for key, value in dictiter:

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in __setitem__(self, idx, value)
    437         else:
    438             adata_view, attr_name = self._view_args
--> 439             _init_actual_AnnData(adata_view)
    440             getattr(adata_view, attr_name)[idx] = value
    441 

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in _init_actual_AnnData(adata_view)
    355 
    356 def _init_actual_AnnData(adata_view):
--> 357     if adata_view.isbacked:
    358         raise ValueError(
    359             'You cannot modify elements of an AnnData view, '

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in isbacked(self)
   1195     def isbacked(self):
   1196         """``True`` if object is backed on disk, ``False`` otherwise."""
-> 1197         return self.filename is not None
   1198 
   1199     @property

~/anaconda3/lib/python3.6/site-packages/anndata/base.py in filename(self)
   1211         want to copy the previous file, use ``copy(filename='new_filename')``.
   1212         """
-> 1213         return self.file.filename
   1214 
   1215     @filename.setter

AttributeError: 'AnnData' object has no attribute 'file'
```
falexwolf commented 5 years ago

Yes, please upload an object if this persists and I'll fix it. But please also make sure you are using the latest version of anndata. I fixed some stuff around deepcopy after you filed this bug.

aopisco commented 5 years ago

Hi @falexwolf, it still isn't working. The problem is with categorical variables. I'm currently doing this before subsetting:

cat_columns = adata.obs.select_dtypes(['category']).columns
adata.obs[cat_columns] = adata.obs[cat_columns].astype(str)
del cat_columns

But it's really annoying, especially when using scvelo. Can you look into it? Another problem is that adata.uns['variable_colors'] isn't deleted when you delete adata.obs['variable']. When that happens and you run into the subsetting problem, my fix doesn't work and I have to delete the leftover entries manually one by one... Perhaps make this part of sanitize_anndata?
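
A rough sketch of how the leftover color entries described above could be found automatically, rather than by hand. The helper `find_stale_color_keys` and its arguments are hypothetical, not an anndata or scanpy API. It assumes color lists live under `<column>_colors` and treats an entry as stale when its obs column no longer exists or when the number of colors differs from the number of categories.

```python
def find_stale_color_keys(uns, categories_by_column):
    """Return '<column>_colors' keys that no longer match the obs columns.

    Hypothetical helper for illustration.
    uns: dict-like, e.g. adata.uns
    categories_by_column: dict mapping obs column name -> list of categories
    """
    suffix = "_colors"
    stale = []
    for key in uns:
        if not key.endswith(suffix):
            continue
        column = key[: -len(suffix)]
        categories = categories_by_column.get(column)
        # Stale if the column is gone or the color count does not match.
        if categories is None or len(uns[key]) != len(categories):
            stale.append(key)
    return stale


# Plain dicts standing in for adata.uns and the categorical obs columns.
uns = {"sex_colors": ["#aaa", "#bbb"], "tissue_colors": ["#ccc"], "pca": {}}
categories = {"sex": ["F", "M"]}  # 'tissue' was deleted from obs
for key in find_stale_color_keys(uns, categories):
    del uns[key]
print(sorted(uns))
```

For a real object, the second argument could be built with something like `{c: list(adata.obs[c].cat.categories) for c in adata.obs.select_dtypes(['category']).columns}`.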

flying-sheep commented 5 years ago

@aopisco are you sure it’s the same problem? There’s currently a problem with categorical changes in pandas 0.24, and this issue here has been existing for longer.

aopisco commented 5 years ago

@flying-sheep yes, it's exactly the same problem, with exactly the same error message, and it only happens when I (or a function) try to subset an existing adata object.

falexwolf commented 5 years ago

Can we get a small AnnData object and a couple of lines of code that allows reproducing the problem? I still wouldn't know how to fix this as I've never experienced it... Sorry about the trouble that you're experiencing!

ivirshup commented 5 years ago

@aopisco, are you still having this issue?

HYsxe commented 5 years ago

Hi @aopisco ! @falexwolf I ran into the same problem but got everything to work by deleting all the unnecessary items in adata.uns.

keep = ['neighbors', ]
keys = list(adata.uns.keys())
for key in keys:
    if key not in keep:
        del adata.uns[key]

I don't get errors anymore but I fear that this might cause other problems I'm currently unaware of.
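
One way to limit the risk mentioned above is to keep the removed entries around so they can be restored later. The following is a sketch under that assumption only; `strip_uns` is a made-up name, and a plain dict stands in for `adata.uns`.

```python
def strip_uns(uns, keep=("neighbors",)):
    """Remove all entries except those in ``keep`` and return what was removed.

    Hypothetical helper; ``uns`` is any dict-like object such as adata.uns.
    """
    removed = {}
    for key in list(uns):
        if key not in keep:
            removed[key] = uns.pop(key)
    return removed


# A plain dict standing in for adata.uns (assumption for illustration).
uns = {"neighbors": {"n_neighbors": 15}, "pca": {}, "louvain_colors": ["#000"]}
backup = strip_uns(uns)
# ... subset here ...
uns.update(backup)  # put the entries back if they are still needed
print(sorted(uns))
```

Restoring the entries on the original object brings back whatever caused the error, so the backup is mainly useful for copying selected entries onto the subset afterwards.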

simonewebb commented 4 years ago

I do get this issue from time to time and @HYsxe 's solution works for me (thanks!). Any idea why this is? Thanks!