Closed TomAugspurger closed 3 weeks ago
Or maybe the issue is just that zarr-python v3 (and maybe the spec?) don't support this data type yet?
Looking through the spec, I suppose we could support it with the `r*` type. I'll look into this a bit.
As a short-term fix, I think we can update https://github.com/zarr-developers/zarr-python/blob/e8800b0d55596fc200d67ef2cb8e6f544dcbb519/src/zarr/core/metadata.py#L642 to skip `bytes` and `str`, not just `str`.
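A rough sketch of the kind of check that short-term fix implies, in plain Python. This is not the code at the linked line: `is_sequence_fill_value` is a hypothetical helper, and it assumes the check in question guards a branch that handles sequence-valued fill values. The underlying fact is that `bytes`, like `str`, is itself a `Sequence`, so both need to be excluded explicitly.

```python
from collections.abc import Sequence


def is_sequence_fill_value(fill_value: object) -> bool:
    """Hypothetical helper: True only for fill values that should be
    handled element-wise (e.g. a list or tuple), never for bytes or str."""
    # bytes and str are Sequences too, so exclude both explicitly.
    return isinstance(fill_value, Sequence) and not isinstance(fill_value, (bytes, str))


# bytes would be caught by a bare Sequence check unless excluded.
bytes_is_sequence = isinstance(b"", Sequence)
```

With a check like this, a `bytes` fill value would fall through to the same scalar handling that `str` already gets.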
I do want to spend some time figuring out what the appropriate `fill_value` is for `np.dtype("S6")` (both in practice and according to the spec). Right now we use `np.dtype("S6").type(0)` to get `np.bytes_(b"")` as the fill value. Unfortunately, *that* has a dtype of `S`, not `S6`. Is `fill_value.dtype == dtype` an invariant we want to hold? So would we need to pad the `fill_value` to be length 6 in this case?
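For illustration, a minimal sketch of what padding could look like, written in plain Python rather than against zarr-python's internals. `pad_fill_value` is a hypothetical helper, and padding with NUL bytes is an assumption based on how NumPy stores fixed-width byte strings; whether the spec wants this is exactly the open question.

```python
def pad_fill_value(fill_value: bytes, itemsize: int) -> bytes:
    """Hypothetical helper: right-pad a bytes fill value with NUL bytes
    so that it occupies exactly `itemsize` bytes (6 for an S6 dtype)."""
    if len(fill_value) > itemsize:
        raise ValueError(
            f"fill value of length {len(fill_value)} does not fit in {itemsize} bytes"
        )
    # ljust pads on the right, leaving the original content at the start.
    return fill_value.ljust(itemsize, b"\x00")


# The empty fill value padded out to the width of an S6 element.
padded = pad_fill_value(b"", 6)
```

Raising on an over-long value rather than truncating is a design choice in this sketch: it keeps a mismatched fill value from being silently altered.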
This is now erroring much earlier with a reasonable error:
arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
ValueError: Invalid V3 data_type: |S6
This appears to have been fixed by #2036
There is another issue here, which is the fact that I can't figure out how to actually read `arr`!
arr[:]
File ~/gh/zarr-developers/zarr-python/src/zarr/core/indexing.py:76, in err_too_many_indices(selection, shape)
75 def err_too_many_indices(selection: Any, shape: ChunkCoords) -> None:
---> 76 raise IndexError(f"too many indices for array; expected {len(shape)}, got {len(selection)}")
IndexError: too many indices for array; expected 0, got 1
👇
In [8]: np.asarray(arr)
Out[8]: array(b'0', dtype='|S6')
In [9]: np.asarray(arr).item()
Out[9]: b'0'
Zarr version
v3
Numcodecs version
n/a
Python Version
n/a
Operating System
n/a
Installation
v3
Description
Discovered while looking at the xarray unit tests for Zarr.
Steps to reproduce
The `.put` call raises with
Additional output
The `Array.update_attributes` call is a bit of a red herring. That just happens to go through `Metadata.__init__`, which validates the fill value for a given dtype. Only this time we use `fill_value=np.bytes_(b"")`, which is incompatible with the dtype `np.dtype("S6")`.
So I think the shorter reproducer is something like
In other words, maybe(?) our inferred fill value is incorrect for this specific dtype.