scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

Adding modality to MuData.mod #46

Closed racng closed 2 weeks ago

racng commented 11 months ago

Is your feature request related to a problem? Please describe.

After loading a CITE-seq 10x h5 file with muon and a 10x VDJ file with scirpy, I tried adding the AIRR modality to the existing mdata by assigning it to mdata.mod. It seemed to work, since mdata shows that it has 3 modalities. However, when I tried to write the mdata, I got an error:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

However, there is no error if I initialize a new mdata object with all three modalities and save that.

Describe the solution you'd like

It would be nice if we could add a modality with mdata.mod['airr'] = adata.copy() without triggering warnings.

Describe alternatives you've considered

We currently need to create a new object whenever we want to add a new modality:

new_mdata = mu.MuData({
    'rna': mdata.mod['rna'].copy(),
    'new': adata.copy(),
    'prot': mdata.mod['prot'].copy()
})

Additional context

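import muon as mu
import scirpy as ir

# gex_path and vdj_path point to the 10x h5 file and the 10x VDJ file
mdata = mu.read_10x_h5(gex_path)
adata = ir.io.read_10x_vdj(vdj_path)
mdata.mod['airr'] = adata.copy()
mdata.write('test.h5mu')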

Full error message when saving the mdata after adding adata to mdata.mod

/users/rng/mambaforge/envs/compbio/lib/python3.10/site-packages/anndata/_core/anndata.py:1230: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[key] = c
/users/rng/mambaforge/envs/compbio/lib/python3.10/site-packages/anndata/_core/anndata.py:1230: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[key] = c
/users/rng/mambaforge/envs/compbio/lib/python3.10/site-packages/anndata/_core/anndata.py:1230: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[key] = c

gtca commented 10 months ago

Thank you, @racng!

As this seems to be a warning, it shouldn't generally stand in the way of adding modalities.

So far, this warning seems to be related to the case where feature names are duplicated across modalities. In that case, .varmap also looks fragmented (e.g. array([1, 0, 2, 0, 3, 0, ...]) and array([0, 1, 0, 2, 0, 3, ...]) for two modalities with the same var_names). This does not happen when a MuData object is created from modalities with these duplicated feature names straight away.

With no name duplicates, there should be no problem like this and no warning!
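To check up front whether a new modality would introduce duplicated feature names, something like the following sketch might help. It is not part of mudata or muon: the helper name `shared_feature_names` is hypothetical, and the feature names in the example are toy values rather than data from this issue.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shared_feature_names(
    var_names_by_mod: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each feature name found in more than one modality to those modalities.

    Hypothetical helper for illustration only; not a mudata API.
    """
    seen = defaultdict(list)
    for mod_name, names in var_names_by_mod.items():
        # set() so that a name repeated within one modality counts once for it
        for name in set(names):
            seen[name].append(mod_name)
    return {name: mods for name, mods in seen.items() if len(mods) > 1}


# Toy stand-ins for real var_names (assumed values, not from the issue).
example = {
    "rna": ["CD3E", "CD4", "MS4A1"],
    "prot": ["CD3", "CD4", "CD19"],
    "airr": ["CD3E"],
}
print(shared_feature_names(example))
```

With a real object, the input could presumably be built as `{name: list(ad.var_names) for name, ad in mdata.mod.items()}`, plus an entry for the modality about to be added. An empty result would suggest that the name-duplication scenario described above does not apply.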


Version 0.3 of mudata will come with a fix for this warning, together with improved handling of duplicated names, so that varmap looks better and the behaviour is more intuitive when adding modalities. 🎉

gtca commented 2 weeks ago

This should be warning-free in v0.3, but please feel free to open a new issue if there's something else we can improve.