Open dstansby opened 1 month ago
+1 Thank you, @dstansby.
Root cause of this is https://github.com/zarr-developers/zarr-python/issues/253, but I'll leave this open as it gives a nice self contained example of the issue.
Note that for OME-zarr the default separator is "/", so currently zarr-python v2 will report the wrong size for all OME-zarr arrays 😱
I believe the "Chunks initialized" metadata is also incorrect. See example below.
import numpy as np
import zarr

zarr_path = "test.zarr"
data = np.random.randint(0, 2**8, size=(1000, 1000), dtype=np.uint8)

# Save the same array once per separator and compare the reported metadata.
for dimension_separator in [".", "/"]:
    zarr.save_array(zarr_path, data, chunks=(100, 100), dimension_separator=dimension_separator)
    zarr_arr = zarr.open(zarr_path)
    print(f"{dimension_separator=}")
    print("nbytes_stored:", zarr_arr.nbytes_stored)
    print("nchunks_initialized:", zarr_arr.nchunks_initialized)
    print()
Output
dimension_separator='.'
nbytes_stored: 1001973
nchunks_initialized: 100
dimension_separator='/'
nbytes_stored: 373
nchunks_initialized: 10
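The numbers above are consistent with chunks being counted only at the top level of the array directory. The following is a minimal sketch of the chunk key layout for the two separators, not zarr-python's actual code; `chunk_keys` and `top_level_entries` are hypothetical helpers written for illustration, and the idea that the count comes from a non-recursive listing is an assumption based on the reported output.

```python
from itertools import product


def chunk_keys(shape, chunks, separator):
    """Return the chunk keys of a regular chunk grid (illustrative helper)."""
    # Number of chunks along each axis, using ceiling division.
    grid = [-(-s // c) for s, c in zip(shape, chunks)]
    return [separator.join(map(str, idx)) for idx in product(*map(range, grid))]


def top_level_entries(keys):
    """Entries that a non-recursive listing of the array directory would see."""
    # With "/" as separator, the first path component is a subdirectory name.
    return sorted({key.split("/")[0] for key in keys})


for sep in [".", "/"]:
    keys = chunk_keys((1000, 1000), (100, 100), sep)
    print(f"separator={sep!r}: {len(keys)} chunks, "
          f"{len(top_level_entries(keys))} top-level entries")
```

Both layouts contain 100 chunks, but with "/" only 10 entries (the directories 0 to 9) are visible at the top level, which matches the nchunks_initialized value of 10 reported above.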
Zarr version
2.18.2
Numcodecs version
0.13.0
Python Version
3.10.4
Operating System
macOS
Installation
conda
Description
When saving an array to disk and loading it again with dimension_separator="/", the number of stored bytes is incorrectly reported. In this case it is just reporting the size of the .zarray file.
Steps to reproduce
Additional output
No response
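To cross-check the nbytes_stored value from the description, the size on disk can be measured independently of zarr. This is a rough sketch using only the standard library; `directory_nbytes` and `flat_nbytes` are hypothetical helper names, and the comparison assumes the array lives in a plain local directory such as the test.zarr used in the example.

```python
import os


def directory_nbytes(path):
    """Sum the sizes of all regular files under path, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def flat_nbytes(path):
    """Sum the sizes of regular files directly inside path, without recursing."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                total += entry.stat().st_size
    return total


# Example usage against the array written in the reproduction script:
# print(directory_nbytes("test.zarr"), flat_nbytes("test.zarr"))
```

If the two helpers disagree for an array saved with dimension_separator="/", the difference is the chunk data stored in nested subdirectories.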