Open shoyer opened 3 years ago
I think dropping the encoding on the first operation is the right thing to do; otherwise, reloading might cause surprising issues. Consider this:
In [4]: encoding = {
   ...:     "add_offset": 267.39366454179356,
   ...:     "scale_factor": 0.0006500423894110363,
   ...:     "dtype": np.dtype("int16"),
   ...:     "_FillValue": -32767,
   ...: }
   ...: ds = xr.Dataset({"arr": ("x", [270, 280, 290], {}, encoding)})
   ...: ds
Out[4]:
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    arr      (x) int64 270 280 290
In [5]: ds.arr[:] = [3, 4, 5]
   ...: ds.to_netcdf("abc.nc")
   ...: with xr.open_dataset("abc.nc").load() as loaded:
   ...:     display(loaded)
   ...:     display(loaded.arr)
   ...:
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    arr      (x) float32 258.6 259.6 260.6

<xarray.DataArray 'arr' (x: 3)>
array([258.60706, 259.6068 , 260.60724], dtype=float32)
Dimensions without coordinates: x
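
For what it's worth, the reloaded values look consistent with the stale scale/offset encoding being applied to the new data and the packed integers overflowing int16. The sketch below redoes that arithmetic in plain Python. The wraparound step is an assumption about what happened on write (it is not taken from xarray's source), and the helper names are hypothetical.

```python
# Same encoding parameters as in the session above.
ADD_OFFSET = 267.39366454179356
SCALE_FACTOR = 0.0006500423894110363


def pack_int16_wrapping(value):
    """Apply CF-style packing, then wrap the result into the int16 range.

    Assumption: an out-of-range packed value wraps around like a
    two's-complement 16-bit integer.
    """
    packed = round((value - ADD_OFFSET) / SCALE_FACTOR)
    return (packed + 2**15) % 2**16 - 2**15


def unpack(packed):
    """Invert the packing: packed * scale_factor + add_offset."""
    return packed * SCALE_FACTOR + ADD_OFFSET


# Range of values this encoding can represent (-32767 is the fill value).
lowest, highest = unpack(-32766), unpack(32767)
print(f"representable range: {lowest:.2f} to {highest:.2f}")

decoded = [unpack(pack_int16_wrapping(v)) for v in [3, 4, 5]]
for original, value in zip([3, 4, 5], decoded):
    print(original, "->", round(value, 3))
```

Under that assumption the values 3, 4, 5 fall far outside the representable range and come back close to the 258.6, 259.6, 260.6 shown above.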
I tend to do `ds["var"].encoding = {}` before saving. See also https://github.com/pydata/xarray/discussions/5407
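
A minimal sketch of that workaround applied to every variable in a dataset. The helper name `clear_encoding` is hypothetical, and the encoding values here are made up for illustration; only the `.encoding = {}` assignment comes from the comment above.

```python
import xarray as xr


def clear_encoding(dataset):
    """Reset the encoding of every variable (data variables and coordinates) in place."""
    for variable in dataset.variables.values():
        variable.encoding = {}
    return dataset


ds = xr.Dataset({"arr": ("x", [270, 280, 290])})
# Illustrative stale encoding, e.g. left over from a file the data was read from.
ds["arr"].encoding = {"scale_factor": 0.1, "add_offset": 250.0, "dtype": "int16"}
ds["arr"][:] = [3, 4, 5]

clear_encoding(ds)
print(ds["arr"].encoding)
```

After this, writing the dataset falls back to the default encoding for the in-memory dtype rather than the stale packing parameters.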
The `encoding` property on `Variable` has always been an awkward part of Xarray's API, and an example of poor separation of concerns. It adds conceptual overhead to all uses of `xarray.Variable`, but exists only for the (somewhat niche) benefit of Xarray's backend IO functionality. This is particularly problematic if we consider the possible separation of `xarray.Variable` into a separate package to remove the pandas dependency (https://github.com/pydata/xarray/issues/3981).

I think a cleaner way to handle `encoding` would be to move it from `Variable` onto array objects, specifically duck array objects that Xarray creates when loading data from disk. As long as these duck arrays don't "propagate" themselves under array operations but rather turn into raw numpy arrays (or whatever is wrapped), this would automatically resolve all issues around propagating `encoding` attributes (e.g., https://github.com/pydata/xarray/pull/5065, https://github.com/pydata/xarray/issues/1614). And users who don't care about `encoding` because they don't use Xarray's IO functionality would never need to think about it.
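
To make the idea concrete, here is a rough sketch of such a wrapper using only NumPy. `EncodedArray` is a hypothetical name, not an existing Xarray class, and a real implementation would need much more (indexing, lazy loading, and so on). The point is only that the wrapper implements `__array__` but not `__array_ufunc__`, so NumPy operations return plain arrays and the encoding is dropped on the first operation.

```python
import numpy as np


class EncodedArray:
    """Hypothetical wrapper pairing on-disk encoding with loaded data."""

    def __init__(self, data, encoding):
        self._data = np.asarray(data)
        self.encoding = dict(encoding)

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

    def __array__(self, dtype=None, copy=None):
        # Hand NumPy the wrapped data; the wrapper itself is not propagated.
        array = np.asarray(self._data, dtype=dtype)
        return array.copy() if copy else array


loaded = EncodedArray([270.0, 280.0, 290.0], {"dtype": "int16", "scale_factor": 0.1})
result = np.add(loaded, 1.0)

print(type(result).__name__)        # a plain NumPy array
print(hasattr(result, "encoding"))  # the encoding did not follow the data
print(loaded.encoding)              # still available on the untouched wrapper
```

With this design, the encoding survives only as long as the data is passed around untouched, which is the behaviour argued for above.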