Description
Writing multiple Datasets to disk and then opening with a different chunk size doesn't work as expected. Only the first Dataset has the requested chunk size; the remainder all keep the same chunk size as on disk.
What I Did
Here is a simple reproducer:
import xarray as xr
import dask
import dask.array as da
from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr

# Build five identical Datasets, each chunked into pieces of 1000 rows.
D = []
for i in range(5):
    tmp = da.random.random(size=12000, chunks=1000)
    dv = {'DATA': ('r', tmp)}
    D.append(xr.Dataset(data_vars=dv))

# Write all five Datasets to a single zarr store.
dask.compute(xds_to_zarr(D, 'test.zarr', columns='ALL'))

# Read them back, requesting chunks of 2000 rows.
xds = xds_from_zarr('test.zarr', chunks={'r': 2000})
print(xds)
which results in:
[<xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(2000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>]
Only the first Dataset honours chunks={'r': 2000}; the other four come back with the on-disk chunking of 1000. This is easy enough to work around by rechunking (sketch below), but I still think this is a bug.
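For reference, the workaround is a minimal sketch using plain xarray's Dataset.chunk after reading, nothing daskms-specific:

# Workaround: open with the on-disk chunking, then rechunk each Dataset explicitly.
xds = [ds.chunk({'r': 2000}) for ds in xds_from_zarr('test.zarr')]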