roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/

inconsistent bounds (lat_bnds etc) after subset operation #224

Closed cehbrecht closed 2 years ago

cehbrecht commented 2 years ago

Description

The CDS team reported issues with inconsistent bounds (lat_bnds, ...) after using the subset operation:

The cdo sinfo command shows warnings on the subset output netcdf file:

Warning (cdf_set_var): Inconsistent variable definition for time_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lat_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lon_bnds!

CDS users reported that some tools, such as Panoply, have problems with these netcdf files.

What I Did

I have prepared a notebook to reproduce this issue: https://nbviewer.org/github/roocs/rooki/blob/master/notebooks/tests/test-c3s-cmip6-subset.ipynb

It runs the subset operation on a rook test instance with the latest clisops version 0.9.0.

It shows the cdo sinfo and ncdump -h outputs for the original cmip6 netcdf file, which look fine.

The subset output of the same netcdf file has the following issues:

$ cdo sinfo cmip6_subset.nc
Warning (cdf_set_var): Inconsistent variable definition for time_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lat_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lon_bnds!

bounds have unnecessary coordinate "height"

double time_bnds(time, bnds) ;
    time_bnds:coordinates = "height" ;
double lat_bnds(lat, bnds) ;
    lat_bnds:coordinates = "height" ;
double lon_bnds(lon, bnds) ;
    lon_bnds:coordinates = "height" ;

unnecessary FillValue for height not removed

double height ;
    height:_FillValue = NaN ;
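
The same symptoms could also be checked from Python instead of reading the ncdump -h output by eye. The following is only a minimal sketch: find_suspect_encoding is a hypothetical helper name (not part of clisops), the "_bnds" suffix is an assumption about how bounds variables are named, and it assumes the written file is reopened with xarray, which as far as I understand keeps the decoded "coordinates" and "_FillValue" entries in each variable's encoding.

```python
def find_suspect_encoding(ds, bounds_suffix="_bnds"):
    """Report coordinate/bounds variables carrying _FillValue or coordinates.

    `ds` is expected to behave like an xarray.Dataset: it needs `ds.coords`
    (iterable of names) and `ds.variables` (mapping of name to an object
    with an `encoding` dict). The function name and the suffix default are
    illustrative assumptions, not clisops API.
    """
    coord_names = set(ds.coords)
    report = {}
    for name, var in ds.variables.items():
        is_bounds = str(name).endswith(bounds_suffix)
        if not (is_bounds or name in coord_names):
            continue  # skip ordinary data variables such as the main variable
        problems = []
        # Coordinates and bounds are not expected to carry a fill value.
        if var.encoding.get("_FillValue") is not None:
            problems.append("_FillValue")
        # Bounds variables are not expected to list auxiliary coordinates.
        if is_bounds and var.encoding.get("coordinates"):
            problems.append("coordinates")
        if problems:
            report[name] = problems
    return report


# Usage with xarray (assumed to be installed), on the written subset file:
#   import xarray as xr
#   ds = xr.open_dataset("cmip6_subset.nc")
#   print(find_suspect_encoding(ds))
```

An empty result would only mean that none of these two symptoms were found in the encoding; it does not replace checking the file with cdo sinfo.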

cehbrecht commented 2 years ago

This issue is also related to #198. The FillValue issue was already (partially) fixed in our 0.9.0 release by PR #204.

cehbrecht commented 2 years ago

The unnecessary coordinate at the bounds variables, like:

double lat_bnds(lat, bnds) ;
    lat_bnds:coordinates = "height" ;

... was already reported to xarray by @ellesmith88 : https://github.com/pydata/xarray/issues/5510

A workaround to get rid of these coordinates is provided in xarray: https://github.com/pydata/xarray/pull/5514

For example like this:

ds.lat_bnds.encoding["coordinates"] = None

cehbrecht commented 2 years ago

Workaround?

In the test notebook above I'm applying all mentioned workarounds on the xarray dataset:

ds.time.encoding["_FillValue"] = None
ds.lon.encoding["_FillValue"] = None
ds.lat.encoding["_FillValue"] = None
ds.height.encoding["_FillValue"] = None

ds.lat_bnds.encoding["_FillValue"] = None
ds.lat_bnds.encoding["coordinates"] = None

ds.lon_bnds.encoding["_FillValue"] = None
ds.lon_bnds.encoding["coordinates"] = None

ds.time_bnds.encoding["_FillValue"] = None
ds.time_bnds.encoding["coordinates"] = None

Then I write the dataset as netcdf file:

ds.to_netcdf("/tmp/out.nc")

Both cdo sinfo and ncdump -h seem to be happy with the new netcdf file.
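
The per-variable assignments above could be collapsed into a loop. This is only a sketch of the same workaround, not the code from PR #204 or #225: strip_redundant_encoding is a hypothetical name, and treating every variable whose name ends in "_bnds" as a bounds variable is an assumption that fits this dataset but may not hold in general.

```python
def strip_redundant_encoding(ds, bounds_suffix="_bnds"):
    """Clear _FillValue on coordinates/bounds and coordinates on bounds.

    `ds` is expected to behave like an xarray.Dataset: it needs `ds.coords`
    (iterable of names) and `ds.variables` (mapping of name to an object
    with a mutable `encoding` dict). Hypothetical helper, not clisops API.
    """
    coord_names = set(ds.coords)
    for name, var in ds.variables.items():
        is_bounds = str(name).endswith(bounds_suffix)
        if name in coord_names or is_bounds:
            # None asks xarray not to write a _FillValue attribute.
            var.encoding["_FillValue"] = None
        if is_bounds:
            # None asks xarray not to write a coordinates attribute.
            var.encoding["coordinates"] = None
    # Other data variables (e.g. the main variable) are left untouched.
    return ds


# Usage with xarray (assumed to be installed):
#   ds = strip_redundant_encoding(ds)
#   ds.to_netcdf("/tmp/out.nc")
```

The main data variable is skipped on purpose, since it may legitimately need a fill value for missing data.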

cehbrecht commented 2 years ago

@Zeitsperre @sol1105 @agstephens thoughts?

cehbrecht commented 2 years ago

The workaround can be added like this (from PR #204): https://github.com/roocs/clisops/blob/d7a339addf436af0c57a4f2b3e305077ce07fa11/clisops/ops/base_operation.py#L70-L87

agstephens commented 2 years ago

@cehbrecht: It looks like using the example code above (from PR #204) is the best place to clean up the dataset. Hopefully, it will only involve adding a few extra lines of code.

cehbrecht commented 2 years ago

Fixed in clisops by #225. It now also works in daops and rook.