zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

write_empty_chunks and zarr.zeros #2060

Open AlexHenderson opened 1 month ago

AlexHenderson commented 1 month ago

Hi,

The example here https://zarr.readthedocs.io/en/stable/tutorial.html#empty-chunks for not writing empty chunks works fine. However, if I use zarr.zeros rather than zarr.open, empty chunks are still written to file.

import zarr
arr = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4', write_empty_chunks=False)
zarr.save("./testlocation", arr)

./testlocation now contains 101 files, although fill_value is 0 by default in zarr.zeros.

zarr.__version__ = 2.18.2, Python 3.12.4, installed via poetry

Not sure if this is a bug, or I'm doing something wrong.

Thanks, Alex

d-v-b commented 1 month ago

hi @AlexHenderson, try passing write_empty_chunks=False as a keyword argument to zarr.save.

Your code example actually creates two separate zarr arrays. The first is created by zarr.zeros and uses in-memory storage, because that's the default storage backend if none is specified. zarr.save then creates a second array on the file system using one of the file system-based storage backends. That second array is a structurally identical copy of the first, but write_empty_chunks is a runtime setting, not part of the array metadata, so it isn't copied over automatically when the second array is created.

you can actually skip the invocation of zarr.save by passing a store keyword argument to zarr.zeros, e.g.

arr = zarr.zeros((10000, 10000), store='testlocation', path='my_array', chunks=(1000, 1000), dtype='i4', write_empty_chunks=False)