pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License

NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data #96

Closed: ghiggi closed this issue 3 years ago

ghiggi commented 3 years ago

I am reporting a workaround for the error NotImplementedError: Can not use auto rechunking with object dtype. We are unable to estimate the size in bytes of object data, which occurs when a coordinate of an xr.Dataset (here, feature) holds strings stored with object dtype.

Example xr.Dataset

<xarray.Dataset>
Dimensions:  (feature: 9, node: 49152, time: 122712)
Coordinates:
  * feature  (feature) object 'z500' 'z850' 'z1000' ... 'q500' 'q850' 'q1000'
    lat      (node) float64 dask.array<chunksize=(49152,), meta=np.ndarray>
    lon      (node) float64 dask.array<chunksize=(49152,), meta=np.ndarray>
  * time     (time) datetime64[ns] 2005-01-01 ... 2018-12-31T23:00:00
Dimensions without coordinates: node
Data variables:
    data     (time, node, feature) float64 dask.array<chunksize=(72, 49152, 1), meta=np.ndarray>

Solution: ds['feature'] = ds['feature'].astype(str)
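A minimal sketch of the workaround on a tiny stand-in dataset, for anyone who wants to check it before touching real data. The sizes, values and the name ds_fixed are made up for illustration; only the astype(str) line comes from the fix above. Casting gives the coordinate a fixed-width unicode dtype, so its size in bytes can be estimated.

```python
import numpy as np
import xarray as xr

# Hypothetical small dataset mirroring the layout above; the string
# coordinate is deliberately built with object dtype to mimic the problem.
features = np.array(["z500", "z850", "z1000"], dtype=object)
ds = xr.Dataset(
    data_vars={"data": (("time", "node", "feature"), np.zeros((4, 2, 3)))},
    coords={"feature": ("feature", features)},
)
print("before:", ds["feature"].dtype)

# The workaround: cast the string coordinate to a fixed-width unicode dtype.
ds_fixed = ds.copy()
ds_fixed["feature"] = ds_fixed["feature"].astype(str)
print("after:", ds_fixed["feature"].dtype)
```

The labels themselves are unchanged, so selections such as ds_fixed.sel(feature="z500") should behave as before.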

I hope others find this before spending hours hacking zarr store/group attributes and metadata.