Open TomAugspurger opened 3 hours ago
Ah I see the potential issue: the Group is opened with zarr_format=None, which defaults to 3. So we have a Group with v3 metadata, which assumes that all its children will be v3 arrays, which is probably sensible. The fix here is to set zarr_format in both places:
zarr.open_group(store, zarr_format=2)["foo"]
So probably not a bug.
For top-level stuff like zarr.open, what do you think about automatically discovering the zarr format? We would need to handle the case where both a v2 and a v3 node are under the same prefix, but I think this would be a pretty user-friendly and intuitive feature.
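To make the auto-discovery idea concrete, here is a minimal sketch over a plain dict standing in for a store. It relies only on the metadata key names from the Zarr specs (zarr.json for v3, .zarray and .zgroup for v2). The function name discover_zarr_format and the choice to raise on an ambiguous prefix are assumptions for illustration, not zarr-python's API or behavior.

```python
# Toy model: the store is a plain dict mapping keys to bytes.
# v3 nodes keep their metadata in "zarr.json"; v2 nodes use ".zarray" or ".zgroup".

def discover_zarr_format(store, path=""):
    """Return 2 or 3 for the node at `path`, or None if no node exists there.

    Hypothetical helper: raising on an ambiguous prefix is one possible policy.
    """
    prefix = f"{path}/" if path else ""
    has_v3 = f"{prefix}zarr.json" in store
    has_v2 = any(f"{prefix}{name}" in store for name in (".zarray", ".zgroup"))
    if has_v3 and has_v2:
        raise ValueError(
            f"both v2 and v3 metadata found at {path!r}; pass zarr_format explicitly"
        )
    if has_v3:
        return 3
    if has_v2:
        return 2
    return None


store = {"foo/.zarray": b"{}", "bar/zarr.json": b"{}"}
print(discover_zarr_format(store, "foo"))  # 2
print(discover_zarr_format(store, "bar"))  # 3
print(discover_zarr_format(store, "baz"))  # None
```

Raising when both formats are present keeps the ambiguous case explicit; preferring one format silently would be another option, at the cost of surprising whoever wrote the other node.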
I think the current implementation does a good job with auto-discovery when reading. The problem with my snippet is that zarr.open_group(store) automatically created a Group, which falls back to a v3 default.
If the store had already contained a (v2) Group, then this works:
In [1]: import zarr
In [2]: store = zarr.store.MemoryStore(store_dict={}, mode="w")
In [3]: zarr.open_array(store=store, shape=(4,), path="foo", zarr_format=2)
Out[3]: <Array memory://4537150336/foo shape=(4,) dtype=float64>
In [4]: zarr.open_group(store, zarr_format=2) # create the v2 Group
Out[4]: Group(_async_group=<AsyncGroup memory://4537150336>)
In [5]: zarr.open_group(store)["foo"] # read the v2 Group (relying on zarr_format inference)
Out[5]: <Array memory://4537150336/foo shape=(4,) dtype=float64>
So maybe this question here is to what extent we want to support hierarchies with a mixture of Zarr V2 and Zarr V3 nodes. That makes my head hurt a little bit, but I guess it's doable (don't assume that Group.zarr_format will match the child zarr format, so don't pass that through in the __getitem__ call and rely on zarr's auto discovery?).
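A rough sketch of the two __getitem__ strategies, again on a dict-backed toy store. ToyGroup and its two lookup methods are hypothetical names for illustration only; this is not how zarr-python's Group is implemented.

```python
V3_KEY = "zarr.json"
V2_KEYS = (".zarray", ".zgroup")


class ToyGroup:
    """Hypothetical stand-in for a group; `store` is a plain dict of keys to bytes."""

    def __init__(self, store, zarr_format, path=""):
        self.store = store
        self.zarr_format = zarr_format
        self.path = path

    def _child_path(self, name):
        return f"{self.path}/{name}" if self.path else name

    def getitem_inherit(self, name):
        """Look up a child assuming it shares the group's zarr_format."""
        child = self._child_path(name)
        keys = (V3_KEY,) if self.zarr_format == 3 else V2_KEYS
        if not any(f"{child}/{key}" in self.store for key in keys):
            raise KeyError(name)
        return child, self.zarr_format

    def getitem_discover(self, name):
        """Look up a child by probing for either format's metadata."""
        child = self._child_path(name)
        if f"{child}/{V3_KEY}" in self.store:
            return child, 3
        if any(f"{child}/{key}" in self.store for key in V2_KEYS):
            return child, 2
        raise KeyError(name)


# A v3 group at the root with a v2 array at "foo", as in the snippet above.
store = {"zarr.json": b"{}", "foo/.zarray": b"{}"}
group = ToyGroup(store, zarr_format=3)

print(group.getitem_discover("foo"))  # ('foo', 2)
try:
    group.getitem_inherit("foo")
except KeyError:
    print("inherit-based lookup raised KeyError")
```

The inherit-based lookup reproduces the KeyError from the report, while the discovery-based one finds the v2 child at the price of allowing mixed hierarchies.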
> So maybe this question here is to what extent we want to support hierarchies with a mixture of Zarr V2 and Zarr V3 nodes.
I think we want no support for this. IMO, a v3 group with a v2 array as a member should not be expressible in zarr-python. But I don't think it should be an error to create a v3 and v2 group in the same place. Maybe part of the problem is the open_group function itself, which by name is ambiguous about whether it creates or reads. I would suggest adding a read_group function that uses mode="r" under the hood, thereby avoiding surprise group creation. Similarly for read_array, and an array-or-group read function. I can open a PR for this when I get some time this week.
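For illustration, the difference between an ambiguous open and an explicit read could look roughly like this on a dict-backed toy store. The functions below are hypothetical sketches of the proposal, not existing zarr-python functions, and the metadata values are placeholders.

```python
def _key(path, name):
    return f"{path}/{name}" if path else name


def find_group_format(store, path=""):
    """Return 3 or 2 if group metadata exists at `path`, else None."""
    if _key(path, "zarr.json") in store:
        return 3
    if _key(path, ".zgroup") in store:
        return 2
    return None


def open_group(store, path="", zarr_format=None):
    """Toy 'open': reuse an existing group, otherwise create one (v3 by default)."""
    found = find_group_format(store, path)
    if found is not None:
        return found
    fmt = 3 if zarr_format is None else zarr_format
    name = "zarr.json" if fmt == 3 else ".zgroup"
    store[_key(path, name)] = b"{}"  # placeholder metadata; this write is the surprise
    return fmt


def read_group(store, path=""):
    """Toy 'read': never writes, and fails loudly if no group exists."""
    found = find_group_format(store, path)
    if found is None:
        raise FileNotFoundError(f"no group at {path!r}")
    return found


store = {"foo/.zarray": b"{}"}  # a v2 array, but no root group yet

try:
    read_group(store)
except FileNotFoundError:
    print("read_group: nothing to read, store untouched")
print(sorted(store))  # ['foo/.zarray']

print(open_group(store))  # 3
print(sorted(store))  # ['foo/.zarray', 'zarr.json']
```

With the read-only variant, the missing root group surfaces as an error at the call site instead of a silently created v3 group that then cannot see its v2 child.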
Explicit read_* and create_* would be fantastic :)
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na
Description
In zarr-v2, you could create an array at a path and access it via a group:
Steps to reproduce
In zarr v3, this raises a KeyError:
Additional output
No response