pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License

Rechunking arrays with non-uniform chunks #58

Open tomwhite opened 3 years ago

tomwhite commented 3 years ago

Rechunker currently assumes that the input array has uniform chunks (except for the last chunk). This fits well with Zarr inputs, which must have uniform chunk sizes.

There are, however, some cases where chunks are not uniform, such as Dask arrays that have had filtering applied (producing chunks of different sizes) or concatenated Zarr arrays (as discussed in https://github.com/dask/dask/issues/6745). Should these cases be supported in rechunker?
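To make the distinction concrete, here is a minimal sketch of a check for the "uniform except for the last chunk" assumption, written against the tuple-of-tuples chunk layout that Dask exposes as `.chunks`. The helper names `is_uniform_axis` and `has_uniform_chunks` are hypothetical and not part of rechunker or Dask, and the two example layouts are made up for illustration.

```python
from typing import Sequence


def is_uniform_axis(sizes: Sequence[int]) -> bool:
    """Return True if every chunk but the last has the same size
    and the last chunk is no larger than the others."""
    if len(sizes) <= 1:
        return True
    first = sizes[0]
    return all(s == first for s in sizes[:-1]) and 0 < sizes[-1] <= first


def has_uniform_chunks(chunks: Sequence[Sequence[int]]) -> bool:
    """Check every axis of a Dask-style chunks tuple (hypothetical helper)."""
    return all(is_uniform_axis(axis) for axis in chunks)


# Zarr-like layout: 10 rows in chunks of 4, with a short final chunk.
zarr_like = ((4, 4, 2), (5, 5))

# Illustrative layout of the kind filtering or concatenation might produce.
filtered = ((3, 4, 1), (5, 5))

print(has_uniform_chunks(zarr_like))
print(has_uniform_chunks(filtered))
```

A check along these lines could be used to decide whether an input needs special handling before it is passed on, though how rechunker should react to a non-uniform input is exactly the open question here.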

rabernat commented 3 years ago

Good question. For the original use case (Zarr-to-Zarr rechunking), this is not an issue. It only comes up with Dask inputs.

I see three possible options: