jdldeauna opened 1 year ago
I just wanted to chime in and say that this could be very helpful for the broader community. Since I found the workaround at some point in the past, I have interacted with many folks who experienced this issue.
Also many thanks for raising this issue @jdldeauna!
Hi @jdldeauna,
This problem is caused by the data having been published on a grid which still contains the grid halo (rows or columns of duplicated grid cells), which is required during the model run. The halo cells are often not properly defined: they collapse to lines or points and share only the cell centers with their originals, which then leads to this ValueError in ESMF (see #109).
This problem exists for quite a few CMIP ocean models. What you can usually do (besides using ignore_degenerate=True) is to remove the grid halo before creating remapping weights with xesmf or other tools. Beware that not removing the grid halo can lead to incorrect results when using the conservative remapping method. Also beware that data defined not on the grid cell centers but on the vertices / edges requires special treatment.
Here is an example with CDOs (Climate Data Operators): https://wiki.mpimet.mpg.de/doku.php?id=analysis:postprocessing_mpiesm:regridding (here, GR15 corresponds to MPI-ESM1-2-LR MPIOM; TP04 to MPI-ESM1-2-HR MPIOM)
And here an example for xarray, for MPI-ESM1-2-LR/HR (see also #109 or this notebook):
import xarray as xr

# Define paths to the data
path_mpiesmlr = "./tos_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc"
path_mpiesmhr = "./tos_Omon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_201001-201412.nc"

# Read the data with halo
ds_mpiesmlr_halo = xr.open_dataset(path_mpiesmlr).isel(time=0)
ds_mpiesmhr_halo = xr.open_dataset(path_mpiesmhr).isel(time=0)

# Read the data without halo
# (LR: omit first and last column; HR: omit first and last column and first 2 rows)
ds_mpiesmlr = xr.open_dataset(path_mpiesmlr).isel(time=0).isel(i=slice(1, 255))
ds_mpiesmhr = xr.open_dataset(path_mpiesmhr).isel(time=0).isel(i=slice(1, 801), j=slice(2, 404))
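To see why this slicing removes the problem, here is a minimal synthetic sketch (a toy 4x6 grid, invented for illustration, not real model output): the duplicated halo columns disappear after trimming with isel.

```python
import numpy as np
import xarray as xr

# Toy grid (not a real model grid): first and last columns form a one-cell
# cyclic halo duplicating interior columns, as in tripolar ocean grids.
lon = np.tile(np.arange(6, dtype=float), (4, 1))
lat = np.tile(np.arange(4, dtype=float)[:, None], (1, 6))
lon[:, 0] = lon[:, -2]   # halo column 0 duplicates column -2
lat[:, 0] = lat[:, -2]
lon[:, -1] = lon[:, 1]   # halo column -1 duplicates column 1
lat[:, -1] = lat[:, 1]

ds = xr.Dataset(coords={"latitude": (("j", "i"), lat),
                        "longitude": (("j", "i"), lon)})

# Trim the halo as in the snippet above: drop the first and last column
ds_no_halo = ds.isel(i=slice(1, -1))

def n_duplicates(d):
    """Count repeated (lat, lon) cell-center pairs."""
    pairs = list(zip(d["latitude"].values.ravel(),
                     d["longitude"].values.ravel()))
    return len(pairs) - len(set(pairs))

print(n_duplicates(ds), n_duplicates(ds_no_halo))  # 8 duplicates with halo, 0 without
```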
Thank you for the comprehensive reply @sol1105! Maybe the error message from xesmf could be improved to help users figure out these possible workarounds? Just referring to the log file is confusing, especially for users who are not too familiar with how xesmf works internally.
This is a really helpful explanation @sol1105. I was not actually aware of this. I have two thoughts:
@jbusecke Please do submit a PR for this, we should be able to give it a quick review.
Happy to work on this next week. Could you assign me to this? Thx
@jbusecke
Do you by any chance know if it is possible to identify whether halo cells are included via the cf-metadata or in some other automated way for many different models?
There is no way to infer that information from the metadata. To see whether duplicate cells are present, one would have to look into the cell coordinates, for example:
(from #109)
import numpy as np

# 2D latitude and longitude arrays - create an array of (lat, lon) tuples
latlon_halo = np.array(list(zip(ds["latitude"].values.ravel(),
                                ds["longitude"].values.ravel())),
                       dtype="double,double").reshape(ds["longitude"].values.shape)

# Use numpy.unique to identify unique columns
latlon_no_halo, indices = np.unique(latlon_halo, axis=1, return_index=True)
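As a quick sanity check of that pattern, here is the same approach applied to a small synthetic grid (coordinate values invented for illustration): the duplicated column is collapsed by numpy.unique.

```python
import numpy as np

# Synthetic 3x5 grid: the last column duplicates the first
lat = np.array([[0., 0., 0., 0., 0.],
                [1., 1., 1., 1., 1.],
                [2., 2., 2., 2., 2.]])
lon = np.array([[10., 20., 30., 40., 10.],
                [10., 20., 30., 40., 10.],
                [10., 20., 30., 40., 10.]])

# Pack (lat, lon) pairs into a structured array, as in the snippet above
latlon_halo = np.array(list(zip(lat.ravel(), lon.ravel())),
                       dtype="double,double").reshape(lon.shape)

# Unique columns: the duplicated column is dropped,
# leaving 4 of the original 5 columns
latlon_unique, indices = np.unique(latlon_halo, axis=1, return_index=True)
```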
Alternatively, one could remap an array of ones with the generated weights and check whether any values > 1 appear in the resulting array, which would hint at duplicate or at least overlapping cells. If so, adaptive masking can be used when remapping the actual data, which renormalizes the results and thus gets rid of double contributions. If one wants to renormalize only contributions > 1 (i.e. duplicated cells) and not < 1 (i.e. cells with contributions from masked or out-of-domain cells), one can remap using ..., skipna=True, na_thres=0.). Maybe this information could be given to the user if duplicated cells are found.
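The "remap an array of ones" check and the renormalization idea can be sketched with a hypothetical dense weight matrix (W is invented for illustration; real remapping weights come from ESMF and are sparse):

```python
import numpy as np

# Hypothetical 3x4 weight matrix: destination cell d receives
# sum_s W[d, s] * data[s]. Source cells 2 and 3 duplicate each other,
# so destination cell 1 counts their contribution twice.
W = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.5],
              [0.0, 0.5, 0.5, 0.0]])

# Remap a field of ones: any value > 1 hints at duplicate/overlapping cells
ones_remapped = W @ np.ones(4)        # [1.0, 1.5, 1.0] -> cell 1 is over-counted

# Adaptive masking renormalizes by the total contribution each cell received
data = np.ones(4)
result = (W @ data) / ones_remapped   # back to [1.0, 1.0, 1.0]
```

With real data, the division removes the double contribution from duplicated source cells while leaving correctly weighted cells unchanged.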
I would like to include such a 'trimming' step (as you show for the MPI model above) in the CMIP6 specific preprocessing over at xMIP to solve this issue for a broader user base.
I did not find a way so far to perform this trimming in an automated way. The problem is that the halo cells are often improperly defined (with collapsing or wrong bounds), so one has to select "the right" cell to remove or mask. So probably the best solution would be to notify users about duplicated cells in their array and point out their options: either use adaptive masking or manually remove / mask the cells.
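Such a notification could look roughly like the following sketch; warn_on_duplicate_cells is a hypothetical helper name, not part of xesmf:

```python
import warnings
import numpy as np

def warn_on_duplicate_cells(lat, lon):
    """Hypothetical helper (not part of xesmf): warn when any (lat, lon)
    cell-center pair occurs more than once and suggest the workarounds."""
    pairs = np.stack([np.asarray(lat).ravel(), np.asarray(lon).ravel()], axis=1)
    # Number of repeated pairs = total pairs minus unique pairs
    n_dup = pairs.shape[0] - np.unique(pairs, axis=0).shape[0]
    if n_dup:
        warnings.warn(
            f"{n_dup} duplicated grid cells found; consider removing the grid "
            "halo before computing weights, or remap with adaptive masking "
            "(skipna=True, na_thres=0.)."
        )
    return n_dup
```

A check like this could run at weight-generation time, pointing users to the halo-trimming and adaptive-masking options discussed above.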
Thanks for that explanation @sol1105!
Hi! I was using the bilinear method and kept encountering a ValueError. Fortunately @jbusecke guided me to a solution:
I was wondering if the error message could be edited to reflect that adding the kwarg ignore_degenerate=True resolves it? It's difficult to work out how to resolve the issue with the existing ValueError message. Thanks!