Open agstephens opened 3 years ago
The problem with the added time dimension for bounds variables can be avoided by passing the parameter decode_coords="all":
ds = xarray.open_mfdataset("/path/to/files/*.nc", decode_coords="all")
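To illustrate what decode_coords="all" changes, here is a minimal sketch on a small in-memory dataset instead of real files. The variable names (tas, lat_bnds) and values are made up for illustration. With the default decoding the bounds variable stays a data variable; with "all" it becomes a coordinate, which is presumably why it is no longer concatenated along time by open_mfdataset.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a single undecoded file: "lat" points to its
# bounds variable through the CF "bounds" attribute.
lat_edges = np.array([[-90.0, 0.0], [0.0, 90.0]])
raw = xr.Dataset(
    data_vars={
        "tas": (("lat",), np.array([280.0, 290.0])),
        "lat_bnds": (("lat", "bnds"), lat_edges),
    },
    coords={"lat": ("lat", np.array([-45.0, 45.0]), {"bounds": "lat_bnds"})},
)

# Default decoding only follows the "coordinates" attribute.
default = xr.decode_cf(raw)
# "all" additionally follows attributes such as "bounds" and "grid_mapping".
promoted = xr.decode_cf(raw, decode_coords="all")

print("default data_vars: ", sorted(default.data_vars))
print("promoted data_vars:", sorted(promoted.data_vars))
print("lat_bnds is a coordinate:", "lat_bnds" in promoted.coords)
```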
However, there is another problem related to xarray.open_mfdataset:
The encoding dictionary gets lost somewhere during the merging operation of the datasets of the respective files (https://github.com/pydata/xarray/issues/2436).
This causes problems, for example, when cf-xarray tries to detect coordinates or bounds, and apparently also with the time axis encoding (as seen in the linked issue). I managed to avoid at least the cf-xarray problems with bounds and coordinates detection by applying xarray's decode functionality only after the datasets have been read in (which, however, leaves the unnecessary time dimension in place):
ds = xarray.open_mfdataset("/path/to/files/*.nc")
ds = xarray.decode_cf(ds, decode_coords="all")
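For the lost encoding, one possible workaround (not something from the linked issue, just a sketch) is to open one of the files on its own and copy its encoding back onto the merged dataset. The helper restore_encoding below is hypothetical, and the example simulates the loss on an in-memory dataset rather than calling open_mfdataset. It assumes the encoding of a single file is representative of all files.

```python
import numpy as np
import xarray as xr


def restore_encoding(merged, reference):
    """Return a copy of `merged` with encoding entries filled in from `reference`.

    Hypothetical helper: entries already present on `merged` take precedence.
    """
    out = merged.copy()
    # Dataset-level encoding (e.g. unlimited_dims).
    out.encoding = {**reference.encoding, **merged.encoding}
    # Variable-level encoding (e.g. dtype, _FillValue, time units/calendar).
    for name, var in out.variables.items():
        if name in reference.variables:
            var.encoding = {**reference.variables[name].encoding, **var.encoding}
    return out


# Stand-in for a dataset opened from a single file, with encoding intact.
reference = xr.Dataset(
    data_vars={"tas": (("time",), np.array([280.0, 281.0]))},
    coords={"time": ("time", np.array([0, 1]))},
)
reference.encoding = {"unlimited_dims": {"time"}}
reference.variables["tas"].encoding = {"dtype": "float32", "_FillValue": 1e20}
reference.variables["time"].encoding = {
    "units": "days since 1850-01-01",
    "calendar": "noleap",
}

# Simulate the merged dataset whose encoding got lost along the way.
merged = reference.copy()
merged.encoding = {}
for var in merged.variables.values():
    var.encoding = {}

restored = restore_encoding(merged, reference)
print(restored.variables["time"].encoding)
```

With real data, reference would be something like xarray.open_dataset on the first file of the list, opened with the same decoding options as the merged dataset.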
DKRZ are loading CMIP6 into Zarr. Here are some of their experiences with xarray.open_mfdataset:
One problem with the following line:
Xarray does not interpret the bounds keyword, so the corresponding lat and lon bounds are listed as data variables. That might not cause any problems by itself, but on top of that, xarray adds a time dimension to those variables:
DKRZ used:
from the xarray tutorial, so that there is no time dimension anymore for the bnds. They had not included use_cftime, which might cause other problems, as I saw when reconverting it to netCDF.
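If a dataset has already been merged and the bounds picked up a time dimension, it can also be removed after the fact. The following is only a sketch: the helper drop_time_from_bounds is hypothetical, the dataset is a small in-memory stand-in, and it assumes the bounds really are identical at every time step, so the first step can be kept. Bounds of variables that themselves vary in time (such as time_bnds) are left alone.

```python
import numpy as np
import xarray as xr


def drop_time_from_bounds(ds, time_dim="time"):
    """Return a copy of `ds` where bounds of time-independent variables
    no longer carry `time_dim`.

    Hypothetical helper; assumes such bounds are constant over time.
    """
    out = ds.copy()
    for name, var in ds.variables.items():
        # The link to the bounds may sit in attrs or, after decoding, in encoding.
        bounds_name = var.attrs.get("bounds") or var.encoding.get("bounds")
        if bounds_name is None or bounds_name not in ds.variables:
            continue
        if time_dim in var.dims:
            # The parent varies in time (e.g. the time axis), so do its bounds.
            continue
        bounds = ds.variables[bounds_name]
        if time_dim in bounds.dims:
            # Keep the first time step; integer indexing drops the dimension.
            out[bounds_name] = bounds.isel({time_dim: 0})
    return out


# Stand-in for a merged dataset where lat_bnds was broadcast along time.
time = np.arange(3)
lat_edges = np.array([[-90.0, 0.0], [0.0, 90.0]])
ds = xr.Dataset(
    data_vars={
        "tas": (("time", "lat"), np.zeros((3, 2))),
        "lat_bnds": (
            ("time", "lat", "bnds"),
            np.broadcast_to(lat_edges, (3, 2, 2)).copy(),
        ),
        "time_bnds": (("time", "bnds"), np.stack([time, time + 1], axis=1)),
    },
    coords={
        "time": ("time", time, {"bounds": "time_bnds"}),
        "lat": ("lat", np.array([-45.0, 45.0]), {"bounds": "lat_bnds"}),
    },
)

fixed = drop_time_from_bounds(ds)
print(fixed["lat_bnds"].dims, fixed["time_bnds"].dims)
```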