Open forman opened 3 years ago
After debugging, we found that `zarr.core.Array._encode_chunk()` does not encode chunks if both `compressor` and `filters` are missing.

However, I could not reproduce our problem with Zarr open/save alone. It seems to occur only when using `xarray.open_zarr()` and `xr.Dataset.to_zarr()`. Therefore it seems to be an xarray issue rather than a Zarr one.
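To illustrate the behaviour described above, here is a minimal sketch of the pattern. It is a simplified stand-in, not Zarr's actual code: `encode_chunk` is a hypothetical function, `zlib.compress` plays the role of a compressor, and an `array.array` stands in for a numpy coordinate chunk.

```python
import zlib
from array import array


def encode_chunk(chunk, compressor=None, filters=None):
    """Simplified stand-in for a chunk encoder (NOT Zarr's actual code)."""
    # Apply each filter in order, if any are configured.
    for f in filters or []:
        chunk = f(chunk)
    # Compress only when a compressor is configured.
    if compressor is not None:
        return compressor(chunk)
    # Neither step ran: the chunk is handed back unchanged.
    return chunk


lon = array("d", [10.0, 10.5, 11.0])  # stands in for a coordinate chunk

raw = encode_chunk(lon)                                   # no compressor, no filters
compressed = encode_chunk(lon, compressor=zlib.compress)  # compressor configured

print(type(raw).__name__)         # still the original array type, not bytes
print(type(compressed).__name__)  # bytes
```

With no compressor and no filters, the store receives whatever object was passed in, which a backend that requires bytes may reject.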
This is from a while ago now, sorry it didn't get much attention originally.
To the extent this is still an issue: does passing kwargs to the store allow this to work? This is a new-ish feature of `.to_zarr`:
chunkmanager_store_kwargs : dict, optional
Additional keyword arguments passed on to the `ChunkManager.store` method used to store
chunked arrays. For example for a dask array additional kwargs will be passed eventually to
:py:func:`dask.array.store()`. Experimental API that should not be relied upon.
I think plausibly xarray should let this level of customization work by allowing folks to pass args through to the underlying library, even if it doesn't support it natively.
What happened:
We create `xarray.Dataset` instances using `xr.open_zarr(store)` with custom `chunkstore` instances. These will lazily fetch data chunks for data variables from the Sentinel Hub API. For the coordinate variables `lon`, `lat`, `time` we use "static" store entries: uncompressed, bytified numpy arrays.

Since xarray 0.16.2 and Zarr 2.6.1 this approach doesn't work anymore. When we write datasets opened from such a store using `xr.to_zarr(dst_store)`, e.g. with a `dst_store=s3fs.S3Map()`, we get encoding errors. E.g. for a coordinate array `lon` we get an error from botocore. (Full traceback is below.)

It seems that our static numpy arrays won't be encoded at all, because they are uncompressed. If we use a compressor, it works again. (That's our current workaround.)
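Besides adding a compressor, another conceivable workaround is to wrap the destination store so that any buffer-like value is converted to bytes before it is written. The sketch below is an assumption, not something from xarray or Zarr: `BytesEnsuringStore` is a hypothetical name, a plain `dict` stands in for the real destination store, and an `array.array` stands in for an unencoded numpy chunk. Whether such a wrapper is accepted by a given Zarr or xarray version is untested here.

```python
from array import array
from collections.abc import MutableMapping


class BytesEnsuringStore(MutableMapping):
    """Hypothetical wrapper that converts buffer-like values to bytes on write."""

    def __init__(self, inner):
        self._inner = inner  # any mutable mapping, e.g. a dict

    def __setitem__(self, key, value):
        if not isinstance(value, bytes):
            # memoryview accepts any object exposing the buffer protocol.
            value = memoryview(value).tobytes()
        self._inner[key] = value

    def __getitem__(self, key):
        return self._inner[key]

    def __delitem__(self, key):
        del self._inner[key]

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


store = BytesEnsuringStore({})
store["lon/0"] = array("d", [10.0, 10.5, 11.0])  # unencoded, buffer-like chunk
store["lon/.zarray"] = b"{}"                     # already bytes, stored as-is
```

The conversion happens at the store boundary, so it does not depend on whether a compressor or filter ran upstream.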
What you expected to happen:
Before data is written into a Zarr chunk store, it must be encoded from numpy arrays to bytes. This does not seem to happen if uncompressed data is written, that is, if the Zarr encoding's `compressor` and `filters` are both None.

Minimal Complete Verifiable Example:
A minimal, self-contained example is the entire test module test_reprod_27.py of the xcube Sentinel Hub plugin `xcube-sh`.

Original issue in the Sentinel Hub xcube plugin is xcube-sh #27.
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 18:58:29) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: de_DE.cp1252
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: 3.3.3
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.1
conda: None
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.1

Traceback: