scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

AttributeError: 'DataDict' object has no attribute 'copy' #895

Closed carmensandoval closed 1 year ago

carmensandoval commented 1 year ago

I have an AnnData object that works fine with pegasus, but running scanpy preprocessing on it returns the following error:

import scanpy as sc

adata = sc.read_h5ad('/path/to/my.h5ad')

adata

AnnData object with n_obs × n_vars = 57646 × 33554
    obs: 'n_genes', 'n_counts', 'percent_mito', 'leiden_labels', 'doublet_score', 'pred_dbl', 'dbl_kmeans_', 'timepoint'
    var: 'featureid', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'mean', 'var', 'hvf_loess', 'hvf_rank'
    uns: 'Channel_colors', 'PCs', 'W', '_attr2type', 'genome', 'leiden_labels_colors', 'leiden_resolution', 'modality', 'nmf_err', 'nmf_features', 'norm_count', 'pca', 'pca_features', 'pca_ncomps', 'stdzn_max_value', 'stdzn_mean', 'stdzn_std', 'timepoint_colors', 'uid', 'uns_dict', 'var_dict'
    obsm: 'H', 'X_nmf', 'X_pca', 'X_pca_harmony', 'X_umap', '_tmp_fmat_highly_variable_features', 'pca_harmony_knn_distances', 'pca_harmony_knn_indices', 'pca_knn_distances', 'pca_knn_indices'
    varm: 'de_res'
    layers: 'raw.X.log_norm'
    obsp: 'W_pca', 'W_pca_harmony'

sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [67], line 1
----> 1 sc.pp.filter_cells(adata_2, min_genes=200)
      2 sc.pp.filter_genes(adata_2, min_cells=3)

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/scanpy/preprocessing/_simple.py:141, in filter_cells(data, min_counts, min_genes, max_counts, max_genes, inplace, copy)
    139     else:
    140         adata.obs['n_genes'] = number
--> 141     adata._inplace_subset_obs(cell_subset)
    142     return adata if copy else None
    143 X = data  # proceed with processing the data matrix

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/_core/anndata.py:1264, in AnnData._inplace_subset_obs(self, index)
   1262 else:
   1263     dtype = None
-> 1264 self._init_as_actual(adata_subset, dtype=dtype)

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/_core/anndata.py:509, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    506         raise ValueError(f"Index of {attr_name} must match {x_name} of X.")
    508 # unstructured annotations
--> 509 self.uns = uns or OrderedDict()
    511 # TODO: Think about consequences of making obsm a group in hdf
    512 self._obsm = AxisArrays(self, 0, vals=convert_to_dict(obsm))
...
File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/compat/_overloaded_dict.py:130, in OverloadedDict.copy(self)
    129 def copy(self) -> dict:
--> 130     return self.data.copy()

AttributeError: 'DataDict' object has no attribute 'copy'

Any ideas what could be going on and how to fix this?

ivirshup commented 1 year ago

Thanks for the report. That does seem strange.

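I think the issue is that we're assuming the dict-like values in uns are actually dicts: the last frame of the traceback calls self.data.copy(), and the DataDict stored there has no copy method.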
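The failing frame can be reproduced without pegasus. The sketch below uses a hypothetical DictLike class as a stand-in for DataDict. It assumes DataDict implements the mapping interface but, as the error message shows, has no copy method. Calling .copy() on it fails the same way, while converting it to a plain dict first succeeds.

```python
from collections.abc import MutableMapping


class DictLike(MutableMapping):
    """Hypothetical stand-in for the DataDict in the error: a mapping without copy()."""

    def __init__(self, *args, **kwargs):
        self._store = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


uns = DictLike(genome="GRCh38")

# Code that assumes a real dict, like OverloadedDict.copy() in the traceback.
try:
    uns.copy()
except AttributeError as err:
    print(err)  # 'DictLike' object has no attribute 'copy'

# Converting to a plain dict first makes .copy() available.
print(dict(uns).copy())  # {'genome': 'GRCh38'}
```

MutableMapping supplies get, items, update, pop and similar methods, but not copy, which is why a dict-assuming call breaks on such objects.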

Did you literally run:

import scanpy as sc

adata = sc.read_h5ad('/path/to/my.h5ad')
sc.pp.filter_cells(adata, min_genes=200)

Because I don't think reading an h5ad file that way should be able to give you anything in uns that isn't a dict.

Could you please also report some info on your environment? E.g. the output of import session_info; session_info.show(dependencies=True, html=False) from the python session where you get the error.

carmensandoval commented 1 year ago

My apologies: this happens when converting a pegasus object to AnnData.

import pegasus as pg
import scanpy as sc

adata = pg.read_input('../cellbender/SAM24425932/SAM24425932_cellbender_out_filtered.h5')

adata = adata.to_anndata()

sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

AttributeError: 'DataDict' object has no attribute 'copy'

Saving it with pegasus first and then reading it with scanpy.read_h5ad works fine:

adata = pg.read_input('../cellbender/SAM24425932/SAM24425932_cellbender_out_filtered.h5')
pg.write_output(adata, 'test.h5ad')

adata = sc.read_h5ad('test.h5ad')
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

I guess this is more a question for the pegasus developers, but it does seem like something goes wrong during the conversion.

Context: I ended up here because scanpy 1.9.1 can't read CellBender output. Pegasus can load and save these h5 files without issue, so I was planning to use it as an importer and then run scanpy on the resulting objects.

(I can now load these h5 files from CellBender using the function provided here, but I still have issues saving them, which is why I'm trying to find a way to convert between the two 'formats'.)
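To check which entries in uns were left as non-dict mappings after the conversion, a small recursive scan can help. find_non_dict_mappings is a hypothetical helper (not part of anndata or pegasus), and MappingProxyType is used only as a stand-in for a DataDict in the toy example. On a converted object, one would presumably call it as find_non_dict_mappings(adata.uns).

```python
from collections.abc import Mapping
from types import MappingProxyType


def find_non_dict_mappings(d, path=()):
    """Hypothetical helper: return key paths whose values are Mappings but not dicts."""
    found = []
    for key, value in d.items():
        if isinstance(value, Mapping):
            if not isinstance(value, dict):
                found.append((*path, key))
            # Recurse into every Mapping, dict or not, so nested offenders are found too.
            found.extend(find_non_dict_mappings(value, (*path, key)))
    return found


# Toy uns: MappingProxyType plays the role of a DataDict here.
uns = {"pca": {"params": MappingProxyType({"zero_center": True})}, "genome": "GRCh38"}
print(find_non_dict_mappings(uns))  # [('pca', 'params')]
```

Only values are checked, not the top-level object passed in, so the result lists keys whose contents would need converting.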

ivirshup commented 1 year ago

Yes, I think this would be an issue for pegasus. It shouldn't be putting DataDicts into AnnData.

You could maybe do something like:

from collections.abc import Mapping

def sanitize_uns(d):
    # Recursively rebuild every Mapping (e.g. a DataDict) as a plain dict
    return {k: sanitize_uns(v) if isinstance(v, Mapping) else v for k, v in d.items()}

adata.uns = sanitize_uns(adata.uns)
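As a quick sanity check of this suggestion, the same function can be run on a toy uns built from MappingProxyType, which again stands in for a non-dict mapping such as DataDict (an assumption; the keys and values are made up). Every nested level comes back as a plain dict, so .copy() works.

```python
from collections.abc import Mapping
from types import MappingProxyType


def sanitize_uns(d):
    # Same function as in the comment: rebuild every nested Mapping as a plain dict.
    return {k: sanitize_uns(v) if isinstance(v, Mapping) else v for k, v in d.items()}


# Toy uns; MappingProxyType stands in for a non-dict mapping such as DataDict.
uns = MappingProxyType({"pca": MappingProxyType({"n_comps": 50}), "genome": "GRCh38"})

clean = sanitize_uns(uns)
print(type(clean).__name__, type(clean["pca"]).__name__)  # dict dict
print(clean.copy())  # {'pca': {'n_comps': 50}, 'genome': 'GRCh38'}
```

Values that are not Mappings, such as NumPy arrays, strings, and numbers, pass through unchanged.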

Potentially we should be more aggressive about converting on the anndata side. However, Mapping subtypes are quite common, so it would be easy to convert something that shouldn't be converted.
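The trade-off is easy to see with standard-library types: Counter and defaultdict are both Mappings, so sanitize_uns flattens them to plain dicts, and the defaultdict loses its default_factory. This is only an illustration of the concern, not a description of what anndata does.

```python
from collections import Counter, defaultdict
from collections.abc import Mapping


def sanitize_uns(d):
    # The function from the comment above, unchanged.
    return {k: sanitize_uns(v) if isinstance(v, Mapping) else v for k, v in d.items()}


uns = {"counts": Counter(a=2), "groups": defaultdict(list)}
clean = sanitize_uns(uns)
print({k: type(v).__name__ for k, v in clean.items()})  # {'counts': 'dict', 'groups': 'dict'}

# The rebuilt "groups" no longer creates missing keys on access.
try:
    clean["groups"]["new_key"].append(1)
except KeyError:
    print("defaultdict behaviour lost")
```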

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

flying-sheep commented 1 year ago

I’m closing this because there was no follow-up. Please feel free to respond and we’ll re-open it.