google / xarray-beam

Distributed Xarray with Apache Beam
https://xarray-beam.readthedocs.io
Apache License 2.0

Dimension order produced by Rechunk is opaque and not controllable; mismatch can cause errors from subsequent ChunksToZarr. #94

Open mjwillson opened 4 months ago

mjwillson commented 4 months ago

In a pipeline where Rechunk is followed by ChunksToZarr, the dimension order of the variables output by Rechunk may not match that of the template you pass to ChunksToZarr, which results in errors like:

ValueError: variable 'geopotential_quantiles' already exists with different dimension names ('hour', 'dayofyear', 'level', 'latitude', 'longitude', 'quantile') != ('level', 'hour', 'dayofyear', 'latitude', 'longitude', 'quantile'), but changing variable dimensions is not supported by to_zarr().

As far as I can tell, Rechunk doesn't allow you to control the output dimension order (at least not on a per-variable basis, which may be necessary to match a given template). An alternative would be to transpose the output template to match whatever Rechunk is going to produce, but it's hard to know in advance what that order will be.

As another way around this, it'd be nice if ChunksToZarr could just do the transpose rather than complain if it finds this kind of dimension mismatch (same dimensions in a different order).
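
In the meantime, one possible workaround is to transpose each chunk to the template's per-variable dimension order in a step between Rechunk and ChunksToZarr. The following is only a minimal sketch using plain xarray: `transpose_like` is a hypothetical helper, not part of xarray-beam, and the variable name and dimension sizes are made up for illustration.

```python
import numpy as np
import xarray as xr


def transpose_like(chunk: xr.Dataset, template: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: reorder each data variable's dims to match `template`.

    Only handles the case from this issue (same dimensions, different order);
    anything else is reported as an error rather than silently changed.
    """
    reordered = {}
    for name, var in chunk.data_vars.items():
        target_dims = template[name].dims
        if set(var.dims) != set(target_dims):
            raise ValueError(
                f"variable {name!r} has dims {var.dims}, "
                f"which is not a permutation of {target_dims}"
            )
        # DataArray.transpose reorders the axes without changing the values.
        reordered[name] = var.transpose(*target_dims)
    return chunk.assign(reordered)


# Made-up example: the chunk has (hour, level), the template wants (level, hour).
template = xr.Dataset({"geopotential_quantiles": (("level", "hour"), np.zeros((2, 3)))})
chunk = xr.Dataset(
    {"geopotential_quantiles": (("hour", "level"), np.arange(6).reshape(3, 2))}
)

fixed = transpose_like(chunk, template)
print(fixed["geopotential_quantiles"].dims)
```

Since xarray-beam pipelines pass around (key, dataset) pairs, a helper like this could presumably be applied with something like `beam.MapTuple(lambda key, chunk: (key, transpose_like(chunk, template)))` just before ChunksToZarr, though I haven't verified that against every Rechunk configuration. Only `template[name].dims` is read, so a lazy template should not need to be computed.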