scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

RecursionError when constructing an MuData object #62

Closed: aksarkar closed this issue 8 months ago

aksarkar commented 8 months ago

Constructing an MuData object from a 10X formatted h5 file fails due to a RecursionError.

The following minimal example uses a dataset provided by 10x Genomics: https://cf.10xgenomics.com/samples/cell-arc/2.0.0/10k_PBMC_Multiome_nextgem_Chromium_X/10k_PBMC_Multiome_nextgem_Chromium_X_raw_feature_bc_matrix.h5

import scanpy as sc
import mudata

temp = sc.read_10x_h5('10k_PBMC_Multiome_nextgem_Chromium_Controller_raw_feature_bc_matrix.h5', gex_only=False)
target = {
    'rna': 'Gene Expression',
    'atac': 'Peaks',
}
dat = mudata.MuData({
    k: temp[:, temp.var['feature_types'] == target[k]]
    for k in target
})

and produces the attached backtrace

All packages were installed on an Amazon EC2 instance running Amazon Linux 2 from conda-forge via mamba.

ilia-kats commented 8 months ago

I think this is because you are trying to create a MuData object from AnnData views instead of real AnnData objects. Try this:

import scanpy as sc
import mudata

temp = sc.read_10x_h5('10k_PBMC_Multiome_nextgem_Chromium_Controller_raw_feature_bc_matrix.h5', gex_only=False)
target = {
    'rna': 'Gene Expression',
    'atac': 'Peaks',
}
dat = mudata.MuData({
    k: temp[:, temp.var['feature_types'] == target[k]].copy()
    for k in target
})

aksarkar commented 8 months ago

The modified code still throws a RecursionError.

aksarkar commented 8 months ago

It seems that this is a bug in scanpy or anndata rather than mudata, since the following code also throws a RecursionError:

import scanpy as sc

dat = sc.read_10x_h5('10k_PBMC_Multiome_nextgem_Chromium_Controller_raw_feature_bc_matrix.h5', gex_only=True)
assert not dat.is_view
dat.obs['0'] = 0

grst commented 8 months ago

Pandas 2.1.2 broke the view behavior in AnnData, see https://github.com/scverse/anndata/issues/1210

For now, please downgrade pandas to 2.1.1.
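
To check whether an environment might be affected, something like the following sketch could help. It only uses the standard library. The helper names (`parse_version`, `newer_than_known_good`, `installed_pandas_version`) are made up for illustration and are not part of pandas, anndata, or mudata. The thread only names 2.1.2 as the breaking release and 2.1.1 as working, so treating every version above 2.1.1 as suspect is an assumption.

```python
from importlib import metadata

# pandas version reported as working in this thread.
KNOWN_GOOD = (2, 1, 1)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Suffixes such as "rc0" are dropped, so "2.1.0rc0" becomes (2, 1, 0).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def newer_than_known_good(version_text):
    """True if the version is above 2.1.1 (assumed to be potentially affected)."""
    return parse_version(version_text) > KNOWN_GOOD


def installed_pandas_version():
    """Return the installed pandas version string, or None if pandas is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed")
    elif newer_than_known_good(found):
        print(f"pandas {found} is newer than 2.1.1 and may trigger the RecursionError")
    else:
        print(f"pandas {found} is not newer than 2.1.1")
```

Whether a newer pandas actually fails also depends on the installed anndata version, which this sketch does not inspect; see the linked anndata issue for the current status.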

aksarkar commented 8 months ago

Reverting pandas to 2.1.1 fixed the issue. Thanks!