scverse / muon

muon is a multimodal omics Python framework
https://muon.scverse.org/
BSD 3-Clause "New" or "Revised" License
218 stars 31 forks

Datasets module #45

Open ivirshup opened 2 years ago

ivirshup commented 2 years ago

I think it would be useful to add a datasets module. It's very useful for prototyping and debugging.

For prototyping, you've always got a couple extra objects to try a function on. This is also great for downstream library authors.

For debugging: you can quickly grab a dataset no matter what system you're on. And when trying to debug remotely with a user, it's an easy shared source of truth for replication.
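To make the idea concrete, here is a minimal sketch of what such a module could look like. It is purely illustrative: `register`, `list_datasets`, `load`, and the `toy_two_modalities` dataset are hypothetical names, not part of muon or mudatasets, and the data is a small synthetic dict of lists rather than a real MuData object. The fixed seed is what gives the "shared source of truth" property, since every machine builds the same values.

```python
import random
from typing import Callable, Dict, List

# Hypothetical registry mapping dataset names to builder functions.
# None of these names come from muon or mudatasets.
Dataset = Dict[str, List[List[float]]]
_REGISTRY: Dict[str, Callable[[], Dataset]] = {}


def register(name: str):
    """Decorator that records a dataset-building function under `name`."""
    def decorator(func: Callable[[], Dataset]) -> Callable[[], Dataset]:
        _REGISTRY[name] = func
        return func
    return decorator


@register("toy_two_modalities")
def _toy_two_modalities() -> Dataset:
    # Fixed seed so every system produces identical values.
    rng = random.Random(0)
    n_obs = 5
    return {
        "rna": [[rng.random() for _ in range(4)] for _ in range(n_obs)],
        "atac": [[rng.random() for _ in range(3)] for _ in range(n_obs)],
    }


def list_datasets() -> List[str]:
    """Return the names of all registered datasets, sorted."""
    return sorted(_REGISTRY)


def load(name: str) -> Dataset:
    """Build and return the dataset registered under `name`."""
    if name not in _REGISTRY:
        raise KeyError(f"Unknown dataset {name!r}; available: {list_datasets()}")
    return _REGISTRY[name]()
```

With something like this, a bug report can say "run it on `load("toy_two_modalities")`" and both sides are looking at the same object.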

gtca commented 2 years ago

We (will) have mudatasets, is this something that would address what you have in mind?

gtca commented 2 years ago

With mudata being a separate library, the mudatasets library has also been created to take advantage of that, but we can merge it into muon later, depending on what people think. Do you happen to have more considerations in that regard, @ivirshup (and others)?

ivirshup commented 2 years ago

That works!

I'm not too fussed with where it lives. I could be convinced either way – but using it from the tutorials for muon would definitely aid in visibility.

I do think it can be useful for specific packages to provide their own datasets, since conventions for processing may be package-specific. Hopefully unprocessed datasets would be consistent across packages, though. That said, you could offer more functionality through a separate package.