scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

`MuData` with no modalities #65

Open mruffalo opened 6 months ago

mruffalo commented 6 months ago

Hello-

We're publishing some data as MuData in the HuBMAP consortium, and we're also planning to use it to store descriptions and annotations of objects in segmentation masks.

These descriptions might include summaries of what we're describing as "primary measurements": protein expression, gene expression, imaging mass spec intensity. Our WIP implementation collects each type of primary measurement into its own modality in a MuData object.

We recently realized that some segmentation mask descriptions would have no primary measurements, like descriptions of functional tissue units identified in a histology image. We were still hoping to use the MuData format for this, to store the appropriate metadata in .obs, spatial information in .obsm['X_spatial'] and more, but it looks like the MuData constructor doesn't allow instantiating an object with no modalities:

>>> import mudata as md
>>> md.MuData()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mruffalo/.opt/anaconda3/lib/python3.11/site-packages/mudata/_core/mudata.py", line 105, in __init__
    raise TypeError("Expected AnnData object or dictionary with AnnData objects as values")
TypeError: Expected AnnData object or dictionary with AnnData objects as values
>>> md.MuData({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mruffalo/.opt/anaconda3/lib/python3.11/site-packages/mudata/_core/mudata.py", line 159, in __init__
    self.update()
  File "/Users/mruffalo/.opt/anaconda3/lib/python3.11/site-packages/mudata/_core/mudata.py", line 1199, in update
    self.update_var()
  File "/Users/mruffalo/.opt/anaconda3/lib/python3.11/site-packages/mudata/_core/mudata.py", line 1032, in update_var
    self._update_attr("var", axis=0, join_common=join_common)
  File "/Users/mruffalo/.opt/anaconda3/lib/python3.11/site-packages/mudata/_core/mudata.py", line 534, in _update_attr
    columns_common = reduce(
                     ^^^^^^^
TypeError: reduce() of empty iterable with no initial value
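
The second traceback above ends in `functools.reduce` being called on an empty sequence without an initial value. The following is a minimal standard-library sketch of that behavior, not mudata's actual code: the function names are hypothetical, and treating the reduction as a set intersection is an assumption based only on the variable name `columns_common`.

```python
from functools import reduce


def common_columns(column_sets):
    """Intersect column-name sets (hypothetical stand-in, not mudata's code).

    With no initial value, reduce() raises TypeError on an empty input.
    """
    return reduce(lambda a, b: a & b, column_sets)


def common_columns_safe(column_sets):
    """Same reduction, but return an empty set when there is nothing to reduce."""
    column_sets = list(column_sets)
    if not column_sets:
        # No modalities: there are no columns to have in common.
        return set()
    return reduce(lambda a, b: a & b, column_sets)


if __name__ == "__main__":
    try:
        common_columns([])
    except TypeError as err:
        print(f"TypeError: {err}")
    print(common_columns_safe([]))
```

An explicit guard is used here rather than passing an initial value to `reduce`, because an empty set as the starting point of an intersection would make every result empty. Whether a guard like this would be the right change inside `_update_attr` is not something this sketch settles.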

It is possible to delete all modalities from an in-memory MuData object and write the result to disk as .h5mu, but mudata.read_h5mu then cannot load that file.

Is this expected to be supported? It would be rather convenient if so. Thanks!

gtca commented 5 months ago

Hey @mruffalo,

Thanks for an interesting use case! Happy to discuss what we can do here to support it.

Can you help me understand what an "empty" MuData object would mean semantically? As it has been designed around a (fixed) collection of modalities (AnnData objects), things like global dimensions are derived from the dimensions of individual modalities, etc.

Currently one can create an AnnData object with no count matrix:

from anndata import AnnData
adata = AnnData(X=None, obs=..., var=...)

Such objects can be used as valid modalities.

Is there anything that a MuData with no modalities can actually add to this?