zarr-developers / VirtualiZarr

Create virtual Zarr stores from archival data files using xarray syntax
https://virtualizarr.readthedocs.io/en/stable/api.html
Apache License 2.0

Inconsistent fill_val between ZArray constructor and from_kerchunk_refs() #287

Open ayushnag opened 3 weeks ago

ayushnag commented 3 weeks ago

Original comment: https://github.com/zarr-developers/VirtualiZarr/pull/265#issuecomment-2456072544

from_kerchunk_refs contains code that creates a ZArray with fill_value = np.nan as the default, which does not match the default set in the ZArray constructor. Here are the lines where the issue occurs. This creates a mismatch between the virtual datasets produced by the kerchunk readers and those produced by the non-kerchunk readers (dmrpp and hdf).

Here is an example:

from virtualizarr.zarr import ZArray
import numpy as np

# fill_value is left as None so that each code path applies its own default
z = {"shape": (2, 3), "dtype": np.dtype("float32"), "chunks": (2, 3), "compressor": None, "filters": None, "fill_value": None, "order": "C", "zarr_format": 2}
print(ZArray(**z))  # constructor path
print(ZArray.from_kerchunk_refs(z))  # kerchunk path

Output:

ZArray(shape=(2, 3), chunks=(2, 3), dtype=dtype('float32'), fill_value=0.0, order='C', compressor=None, filters=None, zarr_format=2)
ZArray(shape=(2, 3), chunks=(2, 3), dtype=dtype('float32'), fill_value=nan, order='C', compressor=None, filters=None, zarr_format=2)
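
To make the mismatch easier to reason about without installing VirtualiZarr, here is a minimal sketch that mimics the two behaviours seen above using only NumPy. The helper names (constructor_default, kerchunk_default, same_fill, mismatched_dtypes) are hypothetical and are not VirtualiZarr functions. The float behaviour is taken from the printed results above; the behaviour for non-float dtypes is an assumption made for illustration.

```python
import math

import numpy as np


def constructor_default(dtype, fill_value=None):
    """Hypothetical stand-in for the constructor path.

    Matches the example above for float32 (None becomes 0.0); using the
    dtype's zero for other numeric dtypes is an assumption.
    """
    if fill_value is None:
        return np.dtype(dtype).type(0).item()
    return fill_value


def kerchunk_default(dtype, fill_value=None):
    """Hypothetical stand-in for the from_kerchunk_refs path.

    Matches the example above for float32 (None becomes nan); falling back
    to the constructor default for non-float dtypes is an assumption.
    """
    if fill_value is None and np.issubdtype(np.dtype(dtype), np.floating):
        return float("nan")
    return constructor_default(dtype, fill_value)


def same_fill(a, b):
    """Compare two fill values, treating nan as equal to nan."""
    both_nan = (
        isinstance(a, float) and isinstance(b, float)
        and math.isnan(a) and math.isnan(b)
    )
    return both_nan or a == b


def mismatched_dtypes(dtypes):
    """Return the dtype names for which the two paths pick different defaults."""
    return [
        name for name in dtypes
        if not same_fill(constructor_default(name), kerchunk_default(name))
    ]


if __name__ == "__main__":
    print(mismatched_dtypes(["float32", "float64", "int32"]))
```

Under these assumptions only the floating-point dtypes disagree, which is the case shown in the issue. A regression test along these lines, written against the real ZArray class, could guard against the two paths drifting apart again.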