pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Round tripping Zarr datasets with scaling #9780

Open whatnick opened 1 week ago

whatnick commented 1 week ago

What happened?

Data was loaded into xarray via the OpenDataCube dc.load method. The resulting dataset was persisted to Zarr using:

xx.time.encoding['units'] = "seconds since 1970-01-01 00:00:00"
xx.time.attrs = {}
xx.to_zarr('test.zarr', mode='w', consolidated=True)

When this Zarr store is loaded back using:

ds_z = xr.open_dataset('test.zarr', engine='zarr', consolidated=True)

The round-tripped dataset has the scaling parameters saved in each band's encoding, e.g. ds_z.red.encoding:

ds_z.red.encoding
{'chunks': (1, 610, 522),
 'preferred_chunks': {'time': 1, 'y': 610, 'x': 522},
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'scale_factor': 0.0001,
 'add_offset': -0.1,
 'dtype': dtype('uint16'),
 'coordinates': 'spatial_ref'}

The values returned do not have the scaling auto-applied.
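
For reference, the CF conventions define unpacking of packed data as `decoded = stored * scale_factor + add_offset`, and this is the transformation that xarray's `mask_and_scale` decoding is documented to follow. The snippet below is only a sketch of that arithmetic using the `scale_factor` and `add_offset` reported above; the sample stored values and the helper names `decode` and `encode` are made up for illustration and are not taken from the dataset or from xarray.

```python
# Encoding values copied from ds_z.red.encoding in this report.
SCALE_FACTOR = 0.0001
ADD_OFFSET = -0.1


def decode(stored: int) -> float:
    """CF-style unpacking: stored integer -> physical value."""
    return stored * SCALE_FACTOR + ADD_OFFSET


def encode(value: float) -> int:
    """CF-style packing: physical value -> stored integer (rounded)."""
    return round((value - ADD_OFFSET) / SCALE_FACTOR)


# Hypothetical stored uint16 samples, NOT taken from the actual dataset.
samples = [0, 1000, 11000]
decoded = [decode(s) for s in samples]
print(decoded)
```

Comparing a few values of `ds_z.red` against this formula is one way to check whether the values read back are the packed integers or the unpacked physical values.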

What did you expect to happen?

Scaling is applied automatically to DataArrays when round-tripping to and from Zarr via xarray.

Minimal Complete Verifiable Example

import xarray as xr

# ds is the Dataset returned by the OpenDataCube dc.load call (not shown here)
ds.time.encoding['units'] = "seconds since 1970-01-01 00:00:00"
ds.time.attrs = {}
ds.to_zarr('test.zarr', mode='w', consolidated=True)

ds_z = xr.open_dataset('test.zarr', engine='zarr', consolidated=True)

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
python-bits: 64
OS: Linux
OS-release: 5.10.227-219.884.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.10.10
libnetcdf: 4.9.2

xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.7.2
pydap: 3.5
h5netcdf: 1.4.0
h5py: 3.12.1
zarr: 2.18.3
cftime: 1.6.4.post1
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.4.2
dask: 2024.7.1
distributed: 2024.7.1
matplotlib: 3.8.4
cartopy: 0.24.1
seaborn: 0.13.2
numbagg: 0.8.2
fsspec: 2024.10.0
cupy: None
pint: 0.24.4
sparse: 0.15.4
flox: 0.9.14
numpy_groupies: 0.11.2
setuptools: 75.3.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: None

welcome[bot] commented 1 week ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

kmuehlbauer commented 1 week ago

@whatnick Thanks for the report. It would help here if you could provide an MCVE.