scverse / mudata

Multimodal Data (.h5mu) implementation for Python
https://mudata.rtfd.io
BSD 3-Clause "New" or "Revised" License

More AI/ML support? #34

Closed amdqiao closed 1 year ago

amdqiao commented 1 year ago

Since many ML packages already support the AnnData format, would the MuData developers consider reaching out to ML package developers so that more ML packages can load MuData as input? Thanks!

Zethson commented 1 year ago

Ping @adamgayoso who might have a thing or two to say about scvi-tools and MuData support.

gtca commented 1 year ago

@amdqiao Thanks for the question! Are there particular applications or scenarios that would benefit from supporting MuData the most, in your opinion?

As @Zethson mentioned, there's MuData support for some multimodal models in scvi-tools. Hopefully, newly developed models for single-cell multi-omics will support MuData from the start, and we're happy to see what we can do from our side to make MuData support more widespread.

amdqiao commented 1 year ago

Yes, I've checked out scvi-tools and found it useful. However, it would be even better if MuData could be supported as an input format for popular ML packages such as scikit-learn.

gtca commented 1 year ago

As scikit-learn provides a generic collection of machine learning methods in Python, I don't think it will or should support more domain-specific data structures. However, since it operates on NumPy arrays and pandas data frames, which is also how AnnData/MuData store their data, it is possible to write lean interfaces for the methods implemented in scikit-learn.

If there are methods that are particularly suited for single-cell omics, feel free to suggest and/or contribute them to scanpy, muon or other frameworks!