Closed eric-czech closed 3 years ago
The short answer is that you should delete encoding['chunks'] here for now. This operation is totally safe, but Xarray is mistakenly trying to re-use the chunks from the source dataset when writing the data to disk.
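A minimal sketch of that workaround, not taken from the original report: the dataset, the variable name "x", the chunk shape (5, 2), and the helper drop_chunk_encoding are all made up for illustration. The stale encoding is set by hand here to stand in for what would normally be inherited from a zarr source.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset standing in for one that was opened from zarr.
ds = xr.Dataset({"x": (("a", "b"), np.arange(20).reshape(10, 2))})

# Simulate the chunk encoding that would be carried over from the source store
# (assumed shape; the real value depends on how the source was written).
ds.variables["x"].encoding["chunks"] = (5, 2)

# Select one row from each of the two assumed source chunks.
subset = ds.isel(a=[0, 5])


def drop_chunk_encoding(dataset):
    """Remove any 'chunks' entry from the encoding of every variable (illustrative helper)."""
    for var in dataset.variables.values():
        var.encoding.pop("chunks", None)
    return dataset


subset = drop_chunk_encoding(subset)

# With the stale encoding gone, the write would look like this
# (the output path is a placeholder, so the call is left commented out):
# subset.to_zarr("out.zarr", mode="w")
```

Using pop with a default means the helper is harmless on variables that never had a chunk encoding in the first place.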
The long answer is that we really should fix this in Xarray. See https://github.com/pydata/xarray/issues/5219 for discussion.
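To make the failure mode concrete, here is a small pure-Python illustration. It is not Xarray's actual check, and the chunk sizes are assumed: suppose the source was stored with 5-row zarr chunks, and the selected result has two dask chunks of one row each. If the old 5-row zarr chunk size is re-used for the output, both dask chunks fall into the same zarr chunk, which is the overlap the error message warns about.

```python
def zarr_chunks_touched(dask_chunks, zarr_chunk_size):
    """For each dask chunk along one axis, return the set of zarr chunk
    indices that its rows would be written into."""
    touched = []
    start = 0
    for size in dask_chunks:
        stop = start + size
        first = start // zarr_chunk_size        # zarr chunk holding the first row
        last = (stop - 1) // zarr_chunk_size    # zarr chunk holding the last row
        touched.append(set(range(first, last + 1)))
        start = stop
    return touched


# Assumed numbers: dask chunks of 1 row each, stale zarr chunk size of 5 rows.
stale = zarr_chunks_touched(dask_chunks=(1, 1), zarr_chunk_size=5)

# Same dask chunks, but with zarr chunks matching the dask chunks.
fresh = zarr_chunks_touched(dask_chunks=(1, 1), zarr_chunk_size=1)

print(stale)  # both dask chunks map to zarr chunk 0
print(fresh)  # each dask chunk maps to its own zarr chunk
```

In the stale case two parallel writers would target the same zarr chunk, while in the fresh case each writer owns its own chunk.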
Thanks @shoyer, good to know!
I'll close this now that it's linked to #5219. Thanks @eric-czech & @shoyer
Would it be possible to get an explanation on how this situation results in a zarr chunk overlapping multiple dask chunks?
The code below generates an array with 2 chunks, selects one row from each chunk, and then writes the resulting two-row array back to zarr. I don't see how it's possible in this case for one zarr chunk to correspond to different dask chunks. There are clearly two resulting dask chunks, two input zarr chunks, and a correspondence between them that should be 1 to 1. What does this error message really mean, then?
Also, what is the difference between "deleting or modifying encoding['chunks']" and "specify safe_chunks=False"? That wasn't clear to me in https://github.com/pydata/xarray/issues/5056.
Lastly and most importantly, can data be corrupted when using parallel zarr writes and just deleting encoding['chunks'] in these situations?
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-16-cloud-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.18.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.1
conda: None
pytest: 6.2.4
IPython: 7.23.1
sphinx: None