ikding opened this issue 1 month ago.
That does look confusing & frustrating.
A workaround would be .drop_encoding(). (When I'm using xarray for my own work, I generally always do this, but I'm not sure that's necessarily correct...)
@pydata/xarray do we have a view / plan for how to handle encoding? We've discussed removing it in the past. I think it's one of those issues that experienced people can work around but that trips up new users, which means it's generally underweighted when the core team and contributors (who are more experienced than average) prioritize issues.
Is there a case for dropping encoding by default on read?
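A minimal sketch of the .drop_encoding() workaround, assuming xarray 2023.09 or later (where DataArray.drop_encoding() exists). The two rasters, their band names, and the "<U5" encoding value are hypothetical stand-ins; the stale encoding is set by hand to mimic an array that was previously read from zarr. With the encoding dropped after the concat, a later to_zarr() call should infer the string width from the data rather than reuse a stale dtype.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for rgb_raster and long_rgb_raster from the issue.
rgb_raster = xr.DataArray(
    np.zeros(3), dims="band", coords={"band": ["red", "green", "blue"]}, name="raster"
)
long_rgb_raster = xr.DataArray(
    np.zeros(1), dims="band", coords={"band": ["near_infrared"]}, name="raster"
)

# Mimic a leftover encoding from an earlier read (assumed value).
rgb_raster["band"].encoding["dtype"] = np.dtype("<U5")

# Workaround: drop all encoding (data variable and coordinates) after the concat.
combined = xr.concat([rgb_raster, long_rgb_raster], dim="band").drop_encoding()

# Writing requires the zarr package, so it is left commented out here:
# combined.to_zarr("combined.zarr", mode="w")

# The same idea applied on read:
# ds = xr.open_zarr("combined.zarr").drop_encoding()
```

Dropping everything is blunt: it also discards chunking and compression settings, which is part of why it is unclear whether this should be the default.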
What happened?
I would like to concatenate two dataarrays or two datasets, save them to zarr, read them into xarray again at a later time, and have all the coordinates intact.
However, when I concatenate two dataarrays or datasets along a string-based coordinate, and the first dataarray in the concatenation has a coordinate with a shorter maximum string length than the second, saving the concatenated dataset to zarr can truncate the strings.
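The truncation itself can be reproduced with plain NumPy. This is a sketch of the suspected mechanism, not xarray's actual write path: it assumes the writer casts the concatenated coordinate to the fixed-width string dtype inherited from the first input (as a leftover encoding["dtype"] would), and the band names are hypothetical.

```python
import numpy as np

# Hypothetical coordinate values for the two inputs.
short_bands = np.array(["red", "green", "blue"])  # dtype <U5
long_bands = np.array(["near_infrared"])          # dtype <U13

# Concatenation itself widens the dtype, so nothing is lost yet.
combined = np.concatenate([short_bands, long_bands])  # dtype <U13

# Assumption: the write step reuses the first input's fixed-width dtype.
# astype() uses unsafe casting by default, so longer strings are cut silently.
written = combined.astype(short_bands.dtype)
print(written)  # ['red' 'green' 'blue' 'near_']

# With the longer array first, the inherited dtype is wide enough.
combined_rev = np.concatenate([long_bands, short_bands])
written_rev = combined_rev.astype(long_bands.dtype)
print(written_rev)  # ['near_infrared' 'red' 'green' 'blue']
```

This also mirrors the order dependence reported in this issue: only the width recorded for the first input matters.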
This problem only arises when the string coordinate has an encoding["dtype"] attribute.
What did you expect to happen?
I expect the xarray dataset that was read from the zarr to have the string coordinate the same as the input dataset that I wrote in, with no truncation.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
The order of the dataarrays in xr.concat matters. If you run xr.concat([rgb_raster, long_rgb_raster], dim="band"), the truncation happens; if you run xr.concat([long_rgb_raster, rgb_raster], dim="band"), it does not.
I attempted to unwrap the call chain that runs when a dataset or dataarray is saved to zarr:
1. In the xarray.DataArray.to_zarr() function, the xarray.DataArray is implicitly converted to an xarray.Dataset before it is saved to zarr.
2. When xarray.DataArray.to_dataset() is called directly with no additional argument, it calls xarray.DataArray._to_dataset_whole().
3. In xarray.DataArray._to_dataset_whole(), the coordinates (._coords) of the dataarray are implicitly copied to the variables of the dataset.
So whatever was in our dataset variables came from the ._coords private attribute of the dataarray.
It looks like a similar issue was handled in 2014 (https://github.com/pydata/xarray/issues/217); this bug appears to be a corner case where the encoding attribute of a particular coordinate was left intact during concatenation.
Environment