pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Dataset.encode_cf function #4412

Open: eric-czech opened this issue 4 years ago

eric-czech commented 4 years ago

I would like to be able to apply CF encoding to an existing DataArray (or multiple in a Dataset) and then store the encoded forms elsewhere. Is this already possible?

More specifically, I would like to encode a large array of 32-bit floats as 8-bit ints and then write them to a Zarr store using rechunker.
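For the float32-to-int8 case, CF packing stores packed = round((value - add_offset) / scale_factor), and readers recover value ≈ packed * scale_factor + add_offset. The numpy-only sketch here shows one common way to pick those two attributes. The helper names, the mapping of the data range onto [-127, 127], and the choice of -128 as _FillValue are assumptions for illustration; the thread does not specify them.

```python
import numpy as np


def int8_packing_params(data):
    """Hypothetical helper: choose CF scale_factor/add_offset for int8.

    Maps [nanmin, nanmax] onto [-127, 127] so -128 stays free for _FillValue.
    """
    vmin = float(np.nanmin(data))
    vmax = float(np.nanmax(data))
    scale_factor = (vmax - vmin) / 254.0 if vmax > vmin else 1.0
    add_offset = (vmax + vmin) / 2.0
    return scale_factor, add_offset


def pack_int8(data, scale_factor, add_offset, fill_value=-128):
    """CF packing: packed = round((data - add_offset) / scale_factor)."""
    packed = np.round((data - add_offset) / scale_factor)
    packed = np.clip(packed, -127, 127)  # guard against rounding just past the range
    packed = np.where(np.isnan(data), fill_value, packed)  # NaN -> fill value
    return packed.astype(np.int8)


def unpack_int8(packed, scale_factor, add_offset, fill_value=-128):
    """CF unpacking: data = packed * scale_factor + add_offset, fill -> NaN."""
    data = packed.astype(np.float32) * scale_factor + add_offset
    return np.where(packed == fill_value, np.nan, data).astype(np.float32)


values = np.linspace(-5.0, 5.0, 11, dtype=np.float32)
values[3] = np.nan
sf, off = int8_packing_params(values)
packed = pack_int8(values, sf, off)
restored = unpack_int8(packed, sf, off)
print(packed)
print(np.nanmax(np.abs(restored - values)))  # at most about scale_factor / 2
```

These are the same scale_factor, add_offset and _FillValue attributes that xarray reads from a variable's encoding when writing, so in practice you would place the computed values there rather than packing by hand.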

What I'm really after is https://github.com/pangeo-data/rechunker/issues/45 (Xarray support in rechunker), but in the meantime I'm looking for whatever existing Xarray functionality would make this possible.

dcherian commented 4 years ago

Not at the moment.

I think we should add an xr.encode_cf that wraps conventions.cf_encoder (this may already have come up in the "flexible backends" discussions). This would parallel xr.decode_cf.

https://github.com/pydata/xarray/blob/66259d1853b85590bfbf6640fdfb868843812312/xarray/conventions.py#L740-L793

It'll also need to wrap this logic: https://github.com/pydata/xarray/blob/66259d1853b85590bfbf6640fdfb868843812312/xarray/backends/api.py#L1113-L1127

For simple use cases, you could write a small wrapper around conventions.cf_encoder that takes a Dataset and returns a Dataset, and it should work just fine (see conventions.decode_cf, which does the same in the decoding direction).
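A minimal sketch of such a wrapper, modeled on the way conventions.decode_cf rebuilds a Dataset from decoded variables. The name encode_cf is hypothetical (no public function by that name existed at the time of this thread), the code relies on the internal xarray.conventions.cf_encoder, which may change between versions, and it does not reproduce the backend-specific checks in api.py linked above.

```python
import numpy as np
import xarray as xr
from xarray import conventions


def encode_cf(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: CF-encode every variable and return a new Dataset.

    Relies on the internal conventions.cf_encoder, which takes a mapping of
    name -> Variable plus global attrs and returns encoded copies of both.
    """
    variables, attrs = conventions.cf_encoder(dict(ds.variables), dict(ds.attrs))
    encoded = xr.Dataset(variables, attrs=attrs)
    # Dataset() only treats dimension-named variables as coordinates, so
    # restore coordinate status for any other names that were coordinates.
    return encoded.set_coords([name for name in ds.coords if name in encoded.variables])


# Usage: request int8 packing of a float32 variable through its encoding.
ds = xr.Dataset(
    {"t": ("x", np.array([0.0, 0.25, 0.5, 1.0], dtype="float32"))},
    coords={"x": [0, 1, 2, 3]},
)
ds["t"].encoding.update(
    {"dtype": "int8", "scale_factor": 0.01, "add_offset": 0.0, "_FillValue": -128}
)
encoded = encode_cf(ds)
print(encoded["t"].dtype)
print(encoded["t"].attrs)
```

After encoding, the packing parameters move from the variable's encoding into its attrs, which is what a reader needs to decode the int8 values later.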

eric-czech commented 4 years ago

Ok thanks @dcherian! I'll try that (feel free to close this).

dcherian commented 1 year ago

Related request for to_zarr(..., encode_cf=False): https://github.com/pydata/xarray/issues/5405

This came up in the discussion today.

cc @tom-white @kmuehlbauer