google / xarray-beam

Distributed Xarray with Apache Beam
https://xarray-beam.readthedocs.io
Apache License 2.0

Require using make_template() if providing a template to ChunksToZarr? #59

Closed shoyer closed 1 year ago

shoyer commented 1 year ago

Currently we support passing an xarray.Dataset full of chunked dask.array objects as the template argument to ChunksToZarr.

This is convenient in simple cases, but it makes it easy to write pipelines that are very slow to set up if you pass in a chunked Dataset with many small chunks (e.g., the default output of xarray.open_zarr()).

The breaking change here would be to require that the template argument be created via make_template(), by checking that each dask.array in the supplied Dataset consists of only a single chunk. We would also make zarr_chunks required when supplying a template, because it makes no sense to copy chunks from a template created with make_template().
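A minimal sketch of what such a check might look like, not the actual xarray-beam implementation. The function name check_single_chunk_template is hypothetical. It assumes only that the template exposes a variables mapping and that each variable has a chunks attribute that is None for non-dask data or a tuple of per-dimension chunk sizes, as xarray objects do. The SimpleNamespace objects are stand-ins so the example runs without xarray or dask installed.

```python
from types import SimpleNamespace


def check_single_chunk_template(template):
    """Raise ValueError if any dask-backed variable has more than one chunk.

    Hypothetical helper: relies only on ``template.variables`` (a mapping of
    name -> variable) and each variable's ``chunks`` attribute.
    """
    for name, variable in template.variables.items():
        chunks = variable.chunks
        if chunks is None:
            continue  # not backed by a dask array, nothing to check
        # chunks holds one tuple of chunk sizes per dimension; a single-chunk
        # array has exactly one entry in every one of those tuples.
        if any(len(sizes) > 1 for sizes in chunks):
            raise ValueError(
                f"variable {name!r} has multiple chunks {chunks}; "
                "create the template with make_template() instead"
            )


# Stand-ins for datasets (assumption: real code would pass an xarray.Dataset).
single = SimpleNamespace(
    variables={"temperature": SimpleNamespace(chunks=((10,), (20,)))}
)
many = SimpleNamespace(
    variables={"temperature": SimpleNamespace(chunks=((5, 5), (20,)))}
)

check_single_chunk_template(single)  # passes silently
try:
    check_single_chunk_template(many)
except ValueError as error:
    print(error)
```

The check only inspects chunk metadata, so it stays cheap even for a template with many small chunks.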

shoyer commented 1 year ago

As an alternative, we could perhaps call make_template() internally inside ChunksToZarr.

shoyer commented 1 year ago

> As an alternative, we could instead perhaps use make_template() internally inside ChunksToZarr.

I implemented this in https://github.com/google/xarray-beam/pull/62