colganwi closed this issue 3 months ago
@colganwi Before starting out on this path, I wonder if you'd be interested in trying out one of our APIs for reading a bit more cleanly (I noticed you don't use them):
https://anndata.readthedocs.io/en/stable/generated/anndata.experimental.read_elem.html
and
https://anndata.readthedocs.io/en/stable/generated/anndata.experimental.read_dispatched.html
This would also help us implement this here if that still makes sense, since we use both of these internally.
@ilan-gold thanks for suggesting the IO API. Based on some digging, I think the best solution may be for me to reimplement the anndata h5ad and zarr read/write functions using the API. The duplicate code would be fairly minimal:
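import anndata as ad
import h5py
import treedata as td

# Reading: load every top-level element that is present in the file
with h5py.File(filename, "r") as f:
    d = {}
    for k in [
        "X",
        "obs",
        "var",
        "obsm",
        "varm",
        "obsp",
        "varp",
        "layers",
        "uns",
        "raw",
        "obst",
        "vart",
    ]:
        if k in f:
            d[k] = ad.experimental.read_elem(f[k])
# Build the TreeData object from whichever elements were found
tdata = td.TreeData(**d)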
Does this solution make sense to you? Given anndata's current field constraints, the files could only be read by treedata, but since a TreeData object can be converted to an AnnData object, I don't think this is a big issue.
Is ad.experimental.read_elem the most stable way to load this API? I would like to future-proof this implementation as much as possible.
> Is ad.experimental.read_elem the most stable way to load this API?
Yes, we are exporting this as a stable API with a deprecation on experimental, so you'll have time to switch.
> Does this solution make sense to you?
It does. We are thinking of exporting the list of axes (obsp, uns, etc.) at some point, so stay tuned!
If that's all, feel free to close or open a new issue for a more specific request!
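On the future-proofing question above, one way to survive a move from the experimental namespace to a stable one is to resolve the function from a list of candidate locations at import time. The following is only a sketch: resolve_first is a hypothetical helper, and "anndata.io" as the stable location is an assumption that is not confirmed anywhere in this thread.

```python
import importlib


def resolve_first(candidates):
    """Return the first attribute that can be resolved from 'module:attr' paths."""
    for path in candidates:
        module_name, _, attr = path.partition(":")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or its parent package) is not available; try the next one.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"none of {candidates!r} could be resolved")


# Hypothetical usage: the first path is an ASSUMED stable location, the second
# is the experimental location named in this thread.
# read_elem = resolve_first(
#     ["anndata.io:read_elem", "anndata.experimental:read_elem"]
# )
```

Listing the assumed stable path first means the deprecated path is only touched on versions where the stable one does not exist.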
I'm working on an extension of the AnnData object called TreeData, which adds two additional fields, obst and vart, for storing nx.DiGraph trees for the obs and var axes. The primary use case is single-cell lineage tracing experiments where you have a tree relating the cells to each other.
The treedata package is very lightweight since it inherits most of its functionality from anndata. treedata uses the anndata h5ad and zarr file formats, and I would like to extend the anndata readers and writers in this way:
This solution ensures compatibility with anndata and minimizes duplicated code, but unfortunately it is not possible with the current anndata IO implementation, since additional fields are not allowed in the h5ad and zarr files. I would like to update the anndata IO functions to allow additional fields (for example, h5ad.py#L245 would only parse expected fields), but before submitting a PR I want to get the developers' thoughts. This change would have no effect on the anndata API or the structure of anndata h5ad and zarr files, but would make it easier to extend the file format to include additional fields.
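The "only parse expected fields" idea proposed above can be illustrated with a small sketch. This is not anndata's implementation: read_expected and EXPECTED_FIELDS are hypothetical names, and a plain dict stands in for an open h5py.File so the example runs without any file on disk. The field list is the one used in this thread, minus the TreeData-specific obst and vart.

```python
# Top-level keys a stock reader would know about (taken from this thread).
EXPECTED_FIELDS = (
    "X", "obs", "var", "obsm", "varm", "obsp", "varp", "layers", "uns", "raw",
)


def read_expected(store, read_elem, expected=EXPECTED_FIELDS):
    """Read only expected top-level keys; report the rest instead of failing."""
    fields = {k: read_elem(store[k]) for k in expected if k in store}
    skipped = sorted(k for k in store if k not in expected)
    return fields, skipped


# Stand-in for an open file: a mapping with one extra, unexpected key.
fake_store = {"X": [[1, 2], [3, 4]], "obs": ["a", "b"], "obst": {"tree": "..."}}

# An identity function stands in for a real element reader.
fields, skipped = read_expected(fake_store, read_elem=lambda elem: elem)
print(sorted(fields))  # keys that were parsed
print(skipped)         # keys that were left for an extension to handle
```

With this shape, an extension package could read the skipped keys itself while the base reader stays unchanged.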