scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

Slow update() when var_names are unique #16

Closed: gtca closed this issue 2 years ago

gtca commented 2 years ago

The main use case supported by mudata is the one with modalities (AnnData objects) having different and distinct feature names (.var_names). Examples of such datasets include multiome and CITE-seq data.

For this use case, it is reasonable to expect mdata.update() not to add noticeable latency to the analysis.

While the current implementation of the .update() method handles generic cases with duplicated and/or intersecting .var_names, it uses the same logic for all the scenarios, which makes it subjectively slow for the main use case described above.

While various optimisations of the .update() method would be desirable, this issue tracks the progress of a faster .update() when there are no duplicated or intersecting .var_names.
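To make the condition concrete, here is a minimal sketch of how the "no duplicated or intersecting .var_names" case could be detected and given a cheaper path. It is not mudata's actual implementation: the helper names `has_unique_disjoint_var_names` and `combined_var_names` are hypothetical, plain lists of strings stand in for each modality's .var_names, and the generic branch is only a placeholder for whatever the full logic does.

```python
from itertools import chain


def has_unique_disjoint_var_names(mods):
    """Return True if no feature name repeats within or across modalities.

    `mods` maps a modality name to a list of feature names
    (a stand-in for each AnnData's .var_names; hypothetical helper).
    """
    total = sum(len(names) for names in mods.values())
    # If the set of all names is as large as the total count,
    # there are neither duplicates nor intersections.
    return len(set(chain.from_iterable(mods.values()))) == total


def combined_var_names(mods):
    """Build the global feature names, taking a fast path when possible."""
    if has_unique_disjoint_var_names(mods):
        # Fast path: plain concatenation in modality order.
        return list(chain.from_iterable(mods.values()))
    # Generic path (placeholder only): prefix each name with its modality
    # so that repeated names can still be told apart.
    return [f"{mod}:{name}" for mod, names in mods.items() for name in names]


multiome = {"rna": ["GeneA", "GeneB"], "atac": ["peak1", "peak2"]}
print(combined_var_names(multiome))
```

The check is a single pass over all names, so in the common multiome or CITE-seq case the extra cost of deciding which path to take stays small compared with running the generic logic every time.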

gtca commented 2 years ago

This is considered to be fixed by #17 and is expected to ship with 0.1.2. Any edge cases arising from this update should be reported in separate issues.