scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

Unexpected (?) output from MuData.copy() #33

Closed by emdann 3 weeks ago

emdann commented 1 year ago

Hi there, I'm not sure whether this is really a bug, but if I make certain changes to MuData.obs (e.g. removing duplicate columns), the obs in the copy differs from the original.

Example

import scanpy as sc
import mudata

adata = sc.datasets.pbmc3k_processed()
adata_highQ = adata[adata.obs['n_counts'] > 2000].copy()
mdata = mudata.MuData({'full': adata, 'highQ': adata_highQ}, axis=0)

## Change obs
mdata.obs = mdata['full'].obs.copy()
mdata.obs.columns
Index(['n_genes', 'percent_mito', 'n_counts', 'louvain'], dtype='object')
mdata_copy = mdata.copy()
mdata_copy.obs.columns
Index(['full:n_genes', 'full:percent_mito', 'full:n_counts', 'full:louvain',
       'highQ:n_genes', 'highQ:percent_mito', 'highQ:n_counts',
       'highQ:louvain', 'n_genes', 'percent_mito', 'n_counts', 'louvain'],
      dtype='object')

I understand this comes from the copy method re-initializing the MuData object, but it breaks code that expects an exact copy.
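One way to guard code that relies on an exact copy is to compare the obs columns before and after copying. The helper below is only a minimal sketch: `obs_column_diff` is a hypothetical name, not part of mudata, and it assumes nothing beyond the `.obs.columns` attribute and the `.copy()` method used in the example above.

```python
def obs_column_diff(mdata):
    """Compare the .obs columns of an object with those of its .copy().

    Hypothetical helper (not part of mudata). Returns a tuple
    (added, removed): columns found only in the copy, and columns
    found only in the original, each in their order of appearance.
    """
    before = list(mdata.obs.columns)
    after = list(mdata.copy().obs.columns)
    added = [c for c in after if c not in before]
    removed = [c for c in before if c not in after]
    return added, removed
```

For the mdata object in the example, `added` would be expected to hold the prefixed columns such as 'full:n_genes' and 'highQ:n_genes', and `removed` would be empty.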

System

gtca commented 1 year ago

Hey @emdann,

This stems from the necessity of .update() and from the fact that, by default, the columns are copied from the individual modalities. We might change this behaviour in v0.3 so that the columns are not copied automatically.

Currently, the expected behaviour is that the columns are the same before and after .copy(), provided that .update() has been run first.
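The update-then-copy pattern described here could be wrapped as follows. This is a sketch under stated assumptions rather than an official recipe: `copy_after_update` is a hypothetical name, it assumes `.update()` modifies the object in place, and it compares column names without regard to order.

```python
def copy_after_update(mdata):
    """Run .update() first, then copy and verify the obs columns match.

    Hypothetical helper (not part of mudata). Raises ValueError if the
    copy ends up with a different set of obs columns than the original.
    """
    mdata.update()  # assumed to sync the object in place before copying
    expected = sorted(mdata.obs.columns)
    copied = mdata.copy()
    actual = sorted(copied.obs.columns)
    if actual != expected:  # order-insensitive comparison of column names
        raise ValueError(
            f"obs columns changed on copy: {expected} -> {actual}"
        )
    return copied
```

Note that `.update()` itself may add the prefixed per-modality columns to the original, so the original and the copy agree with each other but may both differ from the obs that was assigned by hand.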

gtca commented 3 weeks ago

This should be fixed by the new API in v0.3 (.update(pull=False)), which will become the default in upcoming versions.