Open antoinegaston opened 4 days ago
Hi @antoinegaston, could you elaborate a bit on your use case? From what I can see, this seems like quite an unsafe operation.
f = zarr.open(store, mode="w")
f.attrs.setdefault("encoding-type", "anndata")
f.attrs.setdefault("encoding-version", "0.1.0")
you open the store and encode the anndata version/type at the root.
write_dispatched(f, f"/{path}", adata, callback=callback, dataset_kwargs=ds_kwargs)
then you write out to a different location? how would you read this back in? Just want to understand! Like, why not just pass in the store at the location you want it?
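The mismatch described above can be sketched with a toy model. This is only an illustration under stated assumptions: a plain dict stands in for a zarr v2 store (which maps string keys to bytes), and write_split, write_at_path and encoding_at are hypothetical helpers, not part of zarr or anndata.

```python
import json

ENCODING = {"encoding-type": "anndata", "encoding-version": "0.1.0"}


def write_split(store: dict, path: str) -> None:
    """Mimic the snippet above: attributes at the root, data under `path`."""
    store[".zattrs"] = json.dumps(ENCODING)
    store[f"{path}/X/0.0"] = b"chunk"  # placeholder for the written elements


def write_at_path(store: dict, path: str) -> None:
    """Attributes and data both live under `path`."""
    store[f"{path}/.zattrs"] = json.dumps(ENCODING)
    store[f"{path}/X/0.0"] = b"chunk"


def encoding_at(store: dict, path: str):
    """Return the encoding-type recorded for the group at `path`, if any."""
    key = f"{path}/.zattrs" if path else ".zattrs"
    raw = store.get(key)
    return json.loads(raw).get("encoding-type") if raw is not None else None


split, fixed = {}, {}
write_split(split, "samples/adata")
write_at_path(fixed, "samples/adata")

# A reader pointed at the sub-path finds no encoding metadata in the split layout.
print(encoding_at(split, "samples/adata"))
print(encoding_at(fixed, "samples/adata"))
```

In the split layout the encoding attributes describe the root, while the data sits elsewhere, so a reader opened at either location sees an incomplete group.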
Hello @ilan-gold, thank you for your comment. Indeed, I forgot to pass the path to the path parameter of zarr.open:
f = zarr.open(store, mode="w", path=path)
To give you more context about the use case: we have a zarr store in which we store not only anndata but other things as well, so we wanted to be able to do so without having to create multiple stores targeting the different subpaths. We want to keep the flexibility to choose the kind of store, though, without having to multiply the number of parameters to pass to our processing function. The idea is just to pass the path parameter of zarr.open through the write_zarr method.
@antoinegaston But I believe you can pass a store of your own into write_zarr as things stand, no? So you could use fsspec to create a store at a location and then pass that in?
Yes, you are right. The issue is that in our case we have a global store that is an ABSStore; we create a root group in it, in which we create some other groups, and we want to write our anndata object there as a group as well. The thing is that the store of all those groups is still the global one, and you cannot specify the path directly to write_zarr, as it's a path within a remote storage. Tell me if it's unclear.
Does something like the following (not exactly, perhaps)
# Combine the original store path with the sub-path
new_path = f"{original_store.path}/{sub_path}"
# Open a new ABSStore at the sub-path
new_store = ABSStore(container_name=new_path)
not work?
A short example would be clarifying.
It does the trick indeed, but it's not always an ABSStore; it depends on the type of the global parent store. It can be a DirectoryStore as well in some situations.
# Combine the original store path with the sub-path
new_path = f"{original_store.path}/{sub_path}"
# Open a new store of the same type as the original at the sub-path
new_store = type(original_store)(container_name=new_path)
Or similar. I don't know; I don't think adding an argument here makes sense. I think the solution here would be to allow passing a zarr.Group, if that doesn't already work (which it very well might: I think zarr.open is idempotent).
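For the concern that the parent store is not always the same type, one store-agnostic option is a thin wrapper that prefixes every key. The sketch below is only an illustration: PrefixedStore is a hypothetical helper, not part of zarr or anndata, and it assumes the zarr v2 convention that a store is a MutableMapping from string keys to bytes. A real ABSStore or DirectoryStore may expect extra methods (for example directory listing) that are not implemented here.

```python
from collections.abc import MutableMapping
from typing import Iterator


class PrefixedStore(MutableMapping):
    """Hypothetical view of `base` restricted to the keys under `prefix`."""

    def __init__(self, base: MutableMapping, prefix: str) -> None:
        self._base = base
        self._prefix = prefix.strip("/") + "/"

    def _full(self, key: str) -> str:
        return self._prefix + key

    def __getitem__(self, key: str):
        return self._base[self._full(key)]

    def __setitem__(self, key: str, value) -> None:
        self._base[self._full(key)] = value

    def __delitem__(self, key: str) -> None:
        del self._base[self._full(key)]

    def __iter__(self) -> Iterator[str]:
        # Yield only keys under the prefix, with the prefix stripped.
        n = len(self._prefix)
        return (k[n:] for k in self._base if k.startswith(self._prefix))

    def __len__(self) -> int:
        return sum(1 for _ in self)


# A plain dict stands in for the global store, whatever its real type is.
base = {"other/.zgroup": b"{}"}
sub = PrefixedStore(base, "samples/adata")
sub[".zattrs"] = b'{"encoding-type": "anndata"}'
```

Because the wrapper only relies on the mapping interface, the same code would apply whichever concrete store backs `base`, and everything written through `sub` lands under the chosen sub-path of the global store.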
Please describe your wishes and possible alternatives to achieve the desired result.
This feature would allow writing the AnnData object to a specific path in a zarr store. It requires only slight changes:
In anndata/_io/zarr.py first:
In anndata/_core/anndata.py:
And finally adding a small test to test_readwrite.py: