scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

How to express `write_attribute(file, name, adata)` when writing to root group? #665

Closed ivirshup closed 2 years ago

ivirshup commented 2 years ago

Question: what should the API be to write an AnnData or MuData to a store without creating a new group?

Currently the API for writing elements is write_elem(group, key, value), which writes value into a new element at key in group. How do we specify that we would like to write the element into the current group?

Two cases where we want this are writing an AnnData or MuData, though you could also want this for an arbitrary mapping. I figure the API should be either passing key=None or key="/". So this could look like:

with h5py.File(pth, "w") as f:
    write_elem(f, None, adata)

An implementation detail here: what do we do if the group we are trying to write to isn't empty?

When a key is passed, we delete anything that previously existed at that key and then write the new element. This doesn't work when it's the root of a store, since you generally can't delete that. Ideally our solution doesn't make working around this complicated.
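A minimal sketch of the two overwrite behaviours described above, assuming h5py and numpy are available. The helper write_mapping is hypothetical and is not anndata's implementation: for a named key it deletes whatever was there and creates a fresh group, while for "/" it empties the root group in place because the root itself cannot be deleted.

```python
import h5py
import numpy as np


def write_mapping(group, key, mapping):
    """Hypothetical helper (not anndata API): write a flat dict of arrays at `key`."""
    if key == "/":
        # The root group cannot be deleted, so remove its contents instead.
        target = group["/"]
        for name in list(target.keys()):
            del target[name]
        for name in list(target.attrs.keys()):
            del target.attrs[name]
    else:
        # For a named key, drop the old element and start from a fresh group.
        if key in group:
            del group[key]
        target = group.create_group(key)
    for name, value in mapping.items():
        target.create_dataset(name, data=value)
    return target


# In-memory file, so nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    write_mapping(f, "/", {"old": np.arange(3)})
    write_mapping(f, "/", {"X": np.ones((2, 2)), "obs": np.arange(2)})
    root_keys = sorted(f.keys())  # "old" is gone after the second write

    write_mapping(f, "obsm", {"pca": np.zeros((2, 2))})
    write_mapping(f, "obsm", {"umap": np.zeros((2, 2))})
    obsm_keys = sorted(f["obsm"].keys())  # only the latest write remains
```

Clearing the root's children keeps the behaviour for "/" consistent with the delete-then-write behaviour for named keys, without special handling on the caller's side.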

gtca commented 2 years ago

My initial opinion would be to go with key="/". One might discuss if we want to also accept key="" to follow the existing semantics (key="obsm" to write into the "obsm" group, key="" to write into the current group). The latter might be error-prone though so we can go exclusively with the former.

For the second question, we also have to think about whether we want this to behave the same way anndata currently behaves when writing to a file with content (it fully rewrites it). If e.g. write_elem(f, "obsm", data) overwrites what was in this group before, we should probably overwrite all the groups with write_elem(f, "/", adata) but then also delete any extra ones.

ivirshup commented 2 years ago

"/" may actually break some semantics of hdf5, since it refers to the root group.

import h5py

f = h5py.File("pbmc.h5ad", "r")
uns = f["uns"]
uns["/"].keys()  # "/" resolves to the file's root group, not to uns
<KeysViewHDF5 ['X', 'obs', 'obsm', 'obsp', 'raw', 'uns', 'var', 'varm']>

This works for writing to root of a store, but does not generalize to writing to the current group.
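For anyone without pbmc.h5ad at hand, the same behaviour can be reproduced with a small in-memory file. This is only a sketch, assuming h5py is installed; the file and group names are made up for illustration.

```python
import h5py

# In-memory file, so nothing is written to disk.
with h5py.File("sketch_paths.h5", "w", driver="core", backing_store=False) as f:
    uns = f.create_group("uns")
    f.create_group("obs")
    uns.create_group("neighbors")

    # Indexing a subgroup with "/" jumps to the root of the file,
    # so it cannot mean "the group I was handed".
    via_slash = uns["/"].name
    slash_keys = sorted(uns["/"].keys())
```

Here via_slash is the root path rather than "/uns", and slash_keys lists the top-level groups rather than the contents of uns.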

ivirshup commented 2 years ago

Zarr seems to allow "" to refer to the current group, but h5py uses "." (which does make a lot of sense). h5py does not allow "..".

ivirshup commented 2 years ago

I have gone with write_elem(f, "/", adata) on master.