scverse / muon

muon is a multimodal omics Python framework
https://muon.scverse.org/
BSD 3-Clause "New" or "Revised" License

mu.pp.intersect_obs introduces NaNs to mdata.obs #44

Closed: crichgriffin closed this issue 2 years ago

crichgriffin commented 3 years ago

Running mu.pp.intersect_obs erroneously introduces NaNs into mdata.obs.

Minimal working example:

import numpy as np
import muon as mu

def test_for_nans():
    assert mdata.obs['batch'].isna().sum() == 0

x = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 10))
y = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 10))

batches = np.random.choice(["a", "b", "c"], size=100, replace=True)

mdata = mu.MuData({"rna": x, "prot": y})

mdata.obs['batch'] = batches
test_for_nans() # no error

mdata['rna'].obs['total_count'] = mdata['rna'].X.sum(axis=1)
mdata['rna'].obs['min_count'] = mdata['rna'].X.min(axis=1)
mdata.update()

# filter one of the modalities.
mu.pp.filter_obs(mdata['rna'], 'min_count', lambda x: (x < -2))
mu.pp.intersect_obs(mdata)

test_for_nans()  # raises AssertionError: every value in mdata.obs['batch'] is now NaN

In the example above, every value in mdata.obs['batch'] is NaN after running mu.pp.intersect_obs. Oddly, in bigger datasets a very small number of entries sometimes remain non-NaN.

System:
OS: CentOS Linux release 7.8.2003 (Core)
Python 3.9.5

Versions of libraries involved:
numpy 1.20.3
muon 0.1.1 (installed from GitHub today (2021-11-30) using pip install git+https://github.com/gtca/muon)

gtca commented 3 years ago

Hi @crichgriffin,

Thanks a lot for noticing that. This seems to be an issue within MuData itself rather than with intersect_obs, and we'll address it in the next update shortly. I will keep this issue open for now and close it once the fix is released.
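In the meantime, one possible workaround (a sketch only, not the library's fix) is to keep a copy of the affected annotation keyed by observation name before filtering, and to restore it by label afterwards. The example below uses plain pandas so it runs without muon. The names `obs`, `obs_after`, `kept` and `batch_backup` are hypothetical stand-ins for mdata.obs before and after the intersection, and the all-NaN column is simulated to mimic the behaviour reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mdata.obs: 100 cells with a 'batch' annotation.
rng = np.random.default_rng(0)
obs_names = [f"cell_{i}" for i in range(100)]
obs = pd.DataFrame(
    {"batch": rng.choice(["a", "b", "c"], size=100)},
    index=obs_names,
)

# 1. Before filtering, keep a copy of the annotation keyed by observation name.
batch_backup = obs["batch"].copy()

# 2. Simulate the reported symptom: after filtering and intersecting,
#    only some cells remain and the 'batch' column is entirely NaN.
kept = obs_names[::3]
obs_after = pd.DataFrame({"batch": np.nan}, index=kept)

# 3. Restore the annotation by label; reindex aligns the backup on the
#    surviving observation names rather than on position.
obs_after["batch"] = batch_backup.reindex(obs_after.index)

assert obs_after["batch"].isna().sum() == 0
```

With a real MuData object, the equivalent restore step would presumably be mdata.obs['batch'] = batch_backup.reindex(mdata.obs_names), assuming the observation names themselves are preserved by the intersection.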

ilia-kats commented 2 years ago

Hi,

Please try the latest mudata snapshot from GitHub (pip install git+https://github.com/PMBio/mudata).