Open mtvector opened 5 months ago
This does seem useful, thank you!
As you mentioned, it doesn’t integrate well with our versioned IO yet, so it’s really more of a recipe than a PR for now.
This might be a good addition for our “How to” tutorials section.
Are you interested in writing a little notebook?
@flying-sheep Yeah, I can try to make a little notebook. I'm also working on a function to overwrite selected fields, which I can include. Should I just commit it and link you here?
Perhaps our more general solution, which is based on `from anndata._io.specs import read_elem`, could address this use case as well: https://pypi.org/project/cap-anndata/
Interesting! You don’t have to use an internal API for it btw, we’re exporting e.g. anndata.experimental.read_elem by now.
Should I just commit it and link you here?
yeah, I’ll check if it’s a candidate for inclusion in our tutorial notebooks with few changes and we can go from there.
Wow I wasn't aware of cap-anndata, that seems like a much more robust solution, I think that should probably be an example notebook rather than my hack solution!
Just stumbled upon this issue, and as it's pretty recent, wanted to give a pointer to yet another experimental approach to only load parts of the data — https://github.com/scverse/shadows. Please give us feedback in case you end up trying it out!
I love it when you start a conversation with an ad hoc approach and end up with several robust, purpose-built solutions :)
I came across a previous issue #436 and couldn't get the dask solution working with my application, so I came up with a somewhat hacky solution to reading only the desired fields from an h5ad into an anndata (not chunking). It works by making a tree of all the fields in the H5, searching the tree for fields matching the ones you want to load, then loading the ancestors and descendants of that field. (see code below). Useful if you want to keep all your data together on disk, but only need to load some fields into memory.
Basically you run:
read_h5ad_backed_selective(model_path / 'p3_adata.h5ad', mode='r', selected_keys=['spliced', 'S_score', 'batch_name', 'var', 'uns', 'X_antipode'])
and get back:
Just wanted to share in case it is useful to someone!
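For anyone reading along, the selection step described above (list every field in the file, match the requested keys, then keep the ancestors and descendants of each match) could look roughly like the sketch below. This is not the code from this issue: `list_h5_paths` and `select_paths` are hypothetical names, the example paths are made up, and matching keys against exact path components is an assumption about how the lookup works. It only decides which paths to read; building the AnnData object from them is left out.

```python
def list_h5_paths(filename):
    """Collect every group/dataset path in an HDF5 file (assumes h5py is installed)."""
    import h5py  # imported lazily so the selection logic below works without it

    paths = []
    with h5py.File(filename, "r") as f:
        # visit() calls the function with each member's path relative to the root
        f.visit(paths.append)
    return paths


def select_paths(all_paths, selected_keys):
    """Return the paths that match a key, plus their ancestors and descendants."""
    keys = set(selected_keys)

    # A node matches if any component of its path equals a requested key;
    # the matched node is the prefix ending at that component.
    matched = set()
    for path in all_paths:
        parts = path.split("/")
        for i, part in enumerate(parts):
            if part in keys:
                matched.add("/".join(parts[: i + 1]))

    selected = set()
    for path in all_paths:
        for m in matched:
            is_descendant = path == m or path.startswith(m + "/")
            is_ancestor = m.startswith(path + "/")
            if is_descendant or is_ancestor:
                selected.add(path)
                break
    return sorted(selected)


# Illustrative paths only; a real h5ad file will contain more entries.
example_paths = [
    "X",
    "layers", "layers/spliced", "layers/unspliced",
    "obs", "obs/S_score", "obs/batch_name", "obs/n_counts",
    "obsm", "obsm/X_antipode", "obsm/X_umap",
    "var", "var/_index",
    "uns", "uns/params", "uns/params/lr",
]
wanted = ["spliced", "S_score", "batch_name", "var", "uns", "X_antipode"]
for p in select_paths(example_paths, wanted):
    print(p)
```

Matching on whole path components keeps `layers/unspliced` out when only `spliced` is requested, and keeping descendants means a field stored as a group (for example, a categorical column) is read with all of its parts.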