ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

xds_from_zarr only respects requested chunk size on first Dataset #181

Closed landmanbester closed 2 years ago

landmanbester commented 2 years ago

Description

Writing multiple Datasets to disk and then opening them with a different chunk size doesn't work as expected. Only the first Dataset has the requested chunk size; the remaining Datasets all keep the chunk size they have on disk.

What I Did

Here is a simple reproducer:

import dask
import dask.array as da
import xarray as xr
from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr

# Build five Datasets, each holding a 12000-element DATA variable in chunks of 1000.
D = []
for i in range(5):
    tmp = da.random.random(size=12000, chunks=1000)
    dv = {
        'DATA': ('r', tmp)
    }
    D.append(xr.Dataset(data_vars=dv))

# Write all five Datasets to a single zarr store.
dask.compute(xds_to_zarr(D, 'test.zarr', columns='ALL'))

# Read them back, requesting chunks of 2000 along 'r'.
xds = xds_from_zarr('test.zarr', chunks={'r': 2000})

print(xds)

which results in:

[<xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(2000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>]

This is easy enough to work around by rechunking, but I still think it is a bug.
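
For reference, a minimal sketch of the rechunking workaround. It assumes the Datasets are ordinary dask-backed `xarray.Dataset` objects, so `Dataset.chunk` applies. `rechunk_all` is a hypothetical helper, not part of dask-ms, and the in-memory list below only stands in for what `xds_from_zarr` returned above.

```python
import dask.array as da
import xarray as xr


def rechunk_all(datasets, chunks):
    """Apply the same chunking to every Dataset in a list (hypothetical helper)."""
    return [ds.chunk(chunks) for ds in datasets]


# Stand-in for the list returned by xds_from_zarr in the report above:
# only the first Dataset carries the requested chunk size of 2000.
datasets = [
    xr.Dataset({"DATA": ("r", da.zeros(12000, chunks=c))})
    for c in (2000, 1000, 1000, 1000, 1000)
]

# Rechunk every Dataset along 'r' and collect the resulting chunk sizes.
rechunked = rechunk_all(datasets, {"r": 2000})
chunk_sizes = [ds.DATA.data.chunksize for ds in rechunked]
print(chunk_sizes)
```

Rechunking after the read adds a rechunk step to each dask graph, so it is a stopgap and not a replacement for the reader honouring the `chunks` argument.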

JSKenyon commented 2 years ago

This should be fixed in #182, if you want to try it out @landmanbester.

landmanbester commented 2 years ago

That was quick, thanks. Let me give it a go.

landmanbester commented 2 years ago

Confirmed, it's working. Thanks.