Open bogovicj opened 2 months ago
I am guessing this happens because we validate the sharding codec against the static shape of the zarr array, not against the shape of the array that codec will receive in the data encoding process, which might depend on any number of array -> array codecs. So to properly validate the sharding codecs, we need to be able to statically resolve the shape of its input, based on all previous codecs. Right now I think we only have the transpose codec to worry about, so this shouldn't be too hard to fix.
edit: we would need to change this routine: https://github.com/zarr-developers/zarr-python/blob/726fdfbf569c144310893440a40ee8ee05e6524e/src/zarr/core/metadata.py#L226-L228
The replacement should probably be a stand-alone function that takes an input shape, dtype, and a list of codecs and internally tracks the shape changes through the chain of codecs.
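A minimal sketch of what such a stand-alone function could look like (the function name and the `order` attribute access are illustrative assumptions for this sketch, not the actual zarr-python API):

```python
from types import SimpleNamespace

def resolve_input_shape(shape, codecs):
    """Statically track shape changes through a chain of array -> array codecs.

    Hypothetical helper: folds each shape-changing codec over the initial
    array shape, returning the shape the final codec would actually receive.
    """
    for codec in codecs:
        # Transpose is currently the only shape-changing array -> array
        # codec; permute the shape by its `order` attribute if present.
        order = getattr(codec, "order", None)
        if order is not None:
            shape = tuple(shape[i] for i in order)
    return shape

# Example with a stand-in transpose codec carrying an `order` attribute:
transpose = SimpleNamespace(order=(2, 0, 1))
print(resolve_input_shape((4, 6, 8), [transpose]))  # -> (8, 4, 6)
```

The sharding codec's `chunk_shape` could then be validated against the resolved shape rather than the array's static shape.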
There is some related discussion to this here: https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Transpose.20codec.20interpretation.
Zarr version
3.0.0a0
Numcodecs version
0.13.0
Python Version
3.12.4
Operating System
Linux
Installation
using pip into conda env
Description
I expect a `ShardingCodec` downstream of a `TransposeCodec` to consume the transposed array. As a result, I would expect the inner chunk shape to be "transposed" in the same way that the array was transposed. If the sizes of shards/chunks along different dimensions do not share a common factor, there is currently no way to save a transposed and sharded array.
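To make that concrete, here is a standalone illustration (no zarr dependency; the shapes are made up for this sketch) of why the inner chunk shape has to be permuted along with the array. A sharding codec's inner `chunk_shape` must evenly divide the shape it receives, which after a transpose is the permuted shard shape:

```python
# With shard shape (6, 5) and transpose order (1, 0), the sharding codec
# actually receives data of shape (5, 6).
def divides(chunk, shape):
    """True if `chunk` evenly divides `shape` along every dimension."""
    return all(s % c == 0 for c, s in zip(chunk, shape))

shard = (6, 5)
transposed = (shard[1], shard[0])    # (5, 6): what the sharding codec sees

print(divides((3, 5), shard))        # True: valid against the static shape
print(divides((3, 5), transposed))   # False: invalid against the actual input
print(divides((5, 3), transposed))   # True: the permuted chunk shape works
```

Since 6 and 5 share no common factor, no single untransposed `chunk_shape` can divide both the static and the transposed shard shape.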
The code below reproduces the error.
Steps to reproduce
If instead we don't transpose the sharding codec's `chunk_shape`, it seems to pass validation but crashes later with a `ZeroDivisionError`.
Additional output
No response