thetorpedodog opened this issue 1 year ago
Thanks for opening the issue.
Are you specifically looking to read the object in backed mode, or just read it?
If you just need to read it, you can do:
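```python
import io

import anndata as ad
import h5py
import numpy as np

a = ad.AnnData(np.random.randn(1000, 500))
bio = io.BytesIO()  # in-memory file-like object
with h5py.File(bio, "w") as f:
    # write the AnnData object into the root group of the HDF5 file
    ad.experimental.write_elem(f, "/", a)
    # read it back from the same open h5py.File handle
    b = ad.experimental.read_elem(f["/"])
```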
We're specifically looking to read the object and interact extensively with the AnnData object read from the h5ad file. We use backed mode because we neither need nor want the entire thing read into memory at once. And in any case, I would characterize opening the same h5ad file twice as a (minor) bug.
> in any case, I would characterize opening the same h5ad file twice as a (minor) bug.
I agree with this one. There's definitely a problem here: #719, #522
`h5py` happily reads file-like objects, and anndata partially supports doing so. However, anndata only accepts `os.PathLike` objects. It first opens them with `h5py.File` in `read_h5ad_backed`: https://github.com/scverse/anndata/blob/a5fc41a09b7ef059860d125d653a777418f6d2be/anndata/_io/h5ad.py#L128
But it then also passes the original argument to `AnnDataFileManager`: https://github.com/scverse/anndata/blob/a5fc41a09b7ef059860d125d653a777418f6d2be/anndata/_core/anndata.py#L397-L398

whose `filename` setter tries to extract a path from the `PathLike`. For ordinary file-like objects, which are not `PathLike`, opening fails here even though `h5py` could handle them. Then, in the remainder of the `open` call, the extracted path is passed to `h5py.File`, which means the h5ad file gets reopened from the filesystem: https://github.com/scverse/anndata/blob/a5fc41a09b7ef059860d125d653a777418f6d2be/anndata/_core/file_backing.py#L57-L72
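The double-open flow described above can be modelled with the standard library alone. This is a hypothetical sketch, not anndata or h5py code: `ReopeningFileManager`, `SingleOpenFileManager` and `fake_opener` are illustrative names, `fake_opener` stands in for `h5py.File`, and it assumes the `filename` setter converts its argument with something equivalent to `pathlib.Path`, which rejects file-like objects. `SingleOpenFileManager` models the single-open alternative the issue goes on to suggest.

```python
import io
import os
from pathlib import Path
from typing import Any, BinaryIO, Callable, Optional, Union

# Hypothetical stand-ins for illustration only; these are NOT anndata's classes.
# `opener` plays the role of h5py.File: it accepts a path or a file-like object.
Opener = Callable[[Any, str], Any]
PathOrFile = Union[str, os.PathLike, BinaryIO]


class ReopeningFileManager:
    """Mimics the described pattern: derive a path, then reopen by that path."""

    def __init__(self, filename: PathOrFile, opener: Opener) -> None:
        self._opener = opener
        self._file: Any = None
        # Path() raises TypeError for file-like objects such as io.BytesIO.
        self.filename = Path(filename)

    def open(self, mode: str = "r") -> None:
        # Creates a second handle from the filesystem, even if the caller
        # already opened the same file.
        self._file = self._opener(self.filename, mode)


class SingleOpenFileManager:
    """Opens whatever the caller passed exactly once and keeps that handle."""

    def __init__(self, target: PathOrFile, opener: Opener, mode: str = "r") -> None:
        # Only record a filename when the target really is path-like.
        self.filename: Optional[Path] = (
            Path(target) if isinstance(target, (str, os.PathLike)) else None
        )
        self._file = opener(target, mode)


opened: list = []


def fake_opener(target: Any, mode: str) -> object:
    """Stand-in for h5py.File that only records what it was asked to open."""
    opened.append(target)
    return object()


# Path-based flow: the reader opens the file, then the manager opens it again.
fake_opener("data.h5ad", "r")
manager = ReopeningFileManager("data.h5ad", fake_opener)
manager.open("r")
reopen_count = len(opened)
print(reopen_count)  # 2

# A file-like object never gets past the path conversion.
try:
    ReopeningFileManager(io.BytesIO(), fake_opener)
    bytesio_rejected = False
except TypeError:
    bytesio_rejected = True
print(bytesio_rejected)  # True

# Single-open flow: one open call, on the exact object that was passed in.
opened.clear()
bio = io.BytesIO()
single = SingleOpenFileManager(bio, fake_opener)
print(len(opened), opened[0] is bio, single.filename)  # 1 True None
```

The design choice in the second class is simply to keep the handle the caller supplied and treat `filename` as optional metadata, rather than deriving everything from a path.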
Ideally, only one `h5py.File` would ever be created, directly from the passed-in `PathLike` or file-like object. For starters, it would be nice if the same file-like object were passed to both calls.

We currently use an ugly monkey-patch to accomplish this:
and then do, for example,