Slow update() when indices in modalities haven't changed

scverse / mudata

Multimodal Data (.h5mu) implementation for Python

BSD 3-Clause "New" or "Revised" License

This issue continues #16.

While much of MuData's functionality, as in AnnData, cannot be guaranteed when indices contain duplicates (and many functions will raise errors), it should still be possible to create an object with such indices (obs_names/var_names).
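To illustrate why duplicated indices make joins costly, here is a minimal pandas sketch (not MuData's actual implementation). Joining two tables on a plain index with repeated labels multiplies the matching rows. Pairing each label with a per-label occurrence counter as a second index level makes the keys unique, so the join becomes one-to-one. The occurrence-counter scheme and the helper name `with_occurrence` are assumptions chosen for illustration.

```python
import pandas as pd

# Two hypothetical modalities sharing barcodes; "c1" appears twice in each.
rna = pd.DataFrame({"n_genes": [100, 200, 300]}, index=["c1", "c1", "c2"])
prot = pd.DataFrame({"n_prot": [10, 20, 30]}, index=["c1", "c1", "c2"])

# Naive join on the plain index: duplicated labels are matched pairwise,
# so the two "c1" rows on each side yield 2 x 2 rows, plus one row for "c2".
naive = rna.join(prot, how="outer")


def with_occurrence(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a per-label occurrence number as a second level."""
    # cumcount numbers repeated labels 0, 1, 2, ... in their original order
    occ = df.groupby(level=0).cumcount().to_numpy()
    out = df.copy()
    out.index = pd.MultiIndex.from_arrays(
        [df.index, occ], names=["obs_name", "occurrence"]
    )
    return out


# With (label, occurrence) keys every row has a unique key: one-to-one join.
keyed = with_occurrence(rna).join(with_occurrence(prot), how="outer")
```

Building these keys and joining on them is the kind of work one would like to avoid repeating when nothing has changed.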

It might be reasonable to skip the expensive joins on multi-level indices when the indices haven't changed since the last .update(). This would require remembering the indices of individual modalities, e.g. in ._mod_index, as surfaced in #17. It's unclear whether this complexity is worth introducing: in most workflows, indices are expected to be made unique at the very beginning of MuData object creation, which brings the expected number of calls that would benefit from a faster .update() in such cases down to 0.
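A rough sketch of the caching idea, assuming plain pandas indices per modality. `IndexCache`, its `update()` method and `occurrence_index` are hypothetical names, not MuData's API. The cache keeps each modality's index from the last call (mirroring the role of `._mod_index` from #17) and only rebuilds the global multi-level index when a modality was added, removed, or its index changed in values or order.

```python
from functools import reduce
from typing import Dict, Optional

import pandas as pd


def occurrence_index(idx: pd.Index) -> pd.MultiIndex:
    """Hypothetical helper: make repeated labels unique via (label, occurrence)."""
    occ = idx.to_series().groupby(level=0).cumcount().to_numpy()
    return pd.MultiIndex.from_arrays([idx, occ], names=["name", "occurrence"])


class IndexCache:
    """Hypothetical helper remembering per-modality indices between updates."""

    def __init__(self) -> None:
        self._mod_index: Dict[str, pd.Index] = {}  # indices seen at last update
        self.global_index: Optional[pd.MultiIndex] = None
        self.joins = 0  # how many times the expensive path ran

    def _unchanged(self, mod_indices: Dict[str, pd.Index]) -> bool:
        # Same set of modalities, and each index equal in values and order.
        return mod_indices.keys() == self._mod_index.keys() and all(
            idx.equals(self._mod_index[name]) for name, idx in mod_indices.items()
        )

    def update(self, mod_indices: Dict[str, pd.Index]) -> bool:
        """Rebuild the global index only if needed; return True if it was rebuilt."""
        if self.global_index is not None and self._unchanged(mod_indices):
            return False  # fast path: reuse the previous result
        # Slow path: build (label, occurrence) keys and combine all modalities.
        keyed = [occurrence_index(idx) for idx in mod_indices.values()]
        self.global_index = reduce(lambda a, b: a.union(b), keyed)
        self._mod_index = {name: idx.copy() for name, idx in mod_indices.items()}
        self.joins += 1
        return True


cache = IndexCache()
mods = {"rna": pd.Index(["c1", "c1", "c2"]), "prot": pd.Index(["c1", "c2"])}
first = cache.update(mods)    # builds the global index
second = cache.update(mods)   # indices unchanged, rebuild skipped
```

The unchanged-check is still linear in the number of observations, since `Index.equals` compares elements one by one, but it avoids constructing multi-level keys and joining them. Whether that saving justifies the extra state is exactly the open question above.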

Slow update() when indices in modalities haven't changed #18