Status: Open. racng opened this issue 11 months ago.
Thank you, @racng, for the detailed use case description!
Ideally we would stay close to anndata's implementation of the backed mode, but the interface for what you describe was scrapped there.
Just as in anndata, there's currently a backed mode in mudata that might help:
mdata = mudata.read("dataset.h5mu", backed=True)
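To make the backed-mode suggestion a bit more concrete, here is a minimal sketch of adding one global annotation and saving it. The helper name add_obs_column_backed and its arguments are made up for illustration. It also relies on an assumption worth checking against your installed mudata version: that calling write() without a filename on a backed object saves back to the same file without rewriting the data matrices.

```python
def add_obs_column_backed(path, column, values):
    """Hypothetical helper: add one column to the global .obs of a backed MuData."""
    # Imported lazily so this sketch can be loaded even where mudata is absent.
    import mudata

    # backed=True leaves the modality data matrices on disk instead of in memory.
    mdata = mudata.read_h5mu(path, backed=True)

    # values must have one entry per observation in mdata.obs.
    mdata.obs[column] = values

    # Assumption: with no filename, a backed object is saved to its own file
    # and the on-disk matrices are not rewritten. Verify before relying on it.
    mdata.write()
    return mdata
```

Something like add_obs_column_backed("dataset.h5mu", "batch", labels) would then be the whole round trip, with labels being any sequence of the right length.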
I can also link related issues that discuss similar challenges in AnnData:
The last one showcases ongoing work to make the API for reading individual elements public, but it is still a work in progress. I am also not sure whether writing data back to disk is part of that effort.
There's another experimental approach to handling out-of-memory operations with AnnData/MuData objects that you can try: https://github.com/scverse/shadows. It is not a stable library yet, but hopefully it can work as a drop-in solution for your case.
Is your feature request related to a problem? Please describe.
Reading and writing MuData is a bit slow sometimes. For example, after doing some TCR sequence analyses the MuData takes longer to read/write. Sometimes I add just one annotation to mdata.obs, but saving then requires writing all modalities. I appreciate that it is possible to read and write one specific modality, specified like mdata.h5mu/rna, but there is no option to read and write only the non-modality-related elements like mdata.obs, mdata.var, mdata.obsm, etc. I imagine it could save time in different use cases.

Describe the solution you'd like
The ability to specify a list of modalities to read/write, with the option to give an empty list so that only the non-modality-related mdata elements are read/written. This could be implemented as an extra argument in the existing MuData IO functions.
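Until something like this exists, a rough workaround can be sketched with what is already available. The first two functions below use the per-modality path form mentioned above. The third writes only the global obs table by going through h5py and anndata's element writer. All function names are hypothetical, the import location of write_elem depends on the anndata version (anndata.experimental in older releases, anndata.io in newer ones), and the assumption that write_elem replaces an existing "obs" group in place should be verified on a copy of the file first.

```python
def modality_path(filename, modality):
    """Build the 'file.h5mu/modality' path form used for single-modality IO."""
    return f"{filename}/{modality}"


def update_one_modality(filename, modality, edit):
    """Read one modality as AnnData, apply edit(adata), and write it back."""
    import mudata  # lazy import: only needed when the function is called

    path = modality_path(filename, modality)
    adata = mudata.read(path)   # reads just this modality
    edit(adata)
    mudata.write(path, adata)   # writes just this modality


def write_global_obs_only(filename, obs):
    """Overwrite only the top-level obs table of an existing .h5mu file."""
    import h5py
    # Assumption: older anndata exposes this under anndata.experimental;
    # newer releases provide it as anndata.io.write_elem.
    from anndata.experimental import write_elem

    with h5py.File(filename, "r+") as f:
        # Assumption: an existing "obs" group is replaced, and the obs index
        # is unchanged so it stays consistent with the stored modalities.
        write_elem(f, "obs", obs)
```

With these, write_global_obs_only("mdata.h5mu", mdata.obs) would touch only the global annotation table. Note that HDF5 does not reclaim space from replaced groups, so the file may grow after repeated updates.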