Closed jasonb5 closed 3 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (
472c04b
) to head (fc17e7a
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hey @lee1043, at our past meeting on 3/13, Jason mentioned having an unmapped_to_nan=True
option for the Regrid2 API. This enables us to have consistency with the xESMF API and keeps plots looking correct when data has missing values. As a reminder, this option applies to output data. For input data, this PR updates the Regrid2 API to use fill_value to align with CDAT's Regrid2 behavior.
Do you agree with this idea? Thanks.
@tomvothecoder I agree, thanks!
Hi @jasonb5, just checking to see if you can get this done by next Tues (4/2)? I would like to release v0.7.0 mid-next week.
I'm currently debugging #615 to see if I can get anywhere.
I confirm this resolves #615.
It looks like the current version of the code in this PR could introduce unexpected behavior (guess it's a bug?) when there are NaN values in the original field and regrid using regrid2.
import os
import xcdat as xc
from pcmdi_metrics.utils import apply_landmask
input_data = os.path.join(
"/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/ts/gn/v20200511",
"ts_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")
target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0)
ds = xc.open_dataset(input_data)
# Apply land mask
ds_masked = ds.copy()
da_masked = apply_landmask(ds, "ts", keep_over="ocean")
ds_masked["ts"] = da_masked
ds_masked["ts"].isel(time=0).plot()
# regrid
ds_masked_regrid = ds_masked.regridder.horizontal("ts", target_grid, tool="regrid2")
ds_masked_regrid["ts"].isel(time=0).plot()
ds (raw data):
ds_masked:
ds_masked_regrid (from code in main branch) -- what I expect
ds_masked_regrid (from this branch) -- what I get
@jasonb5 @tomvothecoder do you have any idea?
For the above, it looks like 1e20 is filled for grid with NaN then do the regridding, which results 1e20 over the land and ~1e19 along the coastlines. If I plot using the below command, I get the 3rd plot in the above comment instead of the last one.
ds_masked_regrid["ts"].where(ds_masked_regrid["ts"] < 1e16).isel(time=0).plot()
Hi @jasonb5, just checking to see if you can get this done by next Tues (4/2)? I would like to release v0.7.0 mid-next week.
I'm currently debugging #615 to see if I can get anywhere.
@jasonb5 any updates? Also #615 is fixed. Jiwoo opened PR #634 to get that one in separately if this PR is going to take longer.
We can push back v0.7.0 another week if needed.
@tomvothecoder @lee1043 I'll have time today to finish this up.
Thanks for the update @jasonb5!
@lee1043 Fixed the issue, after regridding the missing values weren't replaced with Nan's again which resulted in the bad plot. There is still potentially an issue with coastline values ~1e19 so you still need ds_masked_regrid["ts"].where(ds_masked_regrid["ts"] < 1e16).isel(time=0).plot()
to clean up the plot. I'm looking into this, might be a separate issue with regrid2.
@jasonb5 thank you very much for following up and for the update. I test with the latest code in this PR, which has changed 1e20 to nan. However, as you mentioned, there still are issues with coastlines that still concerns me. I think that indicates some of the original information is missed by averaging with 1e20, occurring along the coastlines in this case. Having ~1e19 numbers cause substantial impact on the statistical numbers when conducting model evaluation. I can exclude them by taking < 1e16, but still, that make the coastline grids as blank, which influences to the statistics. Is there any chance to make it those 1e20 or nan to be completely excluded before the regrinding process starts, so they won't affect to the regridded output at all?
ds_masked_regrid (from this branch) -- what I get
By applying ds_masked_regrid["ts"].where(ds_masked_regrid["ts"] < 1e16).isel(time=0).plot()
@lee1043 I'm investigating that issue how, I'll probably update this PR to just include a few fixes.
@jasonb5 thank you for following up. I opened PR #635 for my proposal on this matter. Could you please take a look? Please feel free to discard or merge it to this PR if that looks okay to you.
@lee1043 I added another fix, your example still displays the same issue.
The following example works:
import os
import xcdat as xc
import numpy as np
from pcmdi_metrics.utils import apply_landmask, create_land_sea_mask
input_data = os.path.join(
"/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/ts/gn/v20200511",
"ts_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")
target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0)
ds = xc.open_dataset(input_data)
# Apply land mask
ds_masked = ds.copy()
mask = create_land_sea_mask(ds_masked)
# invert mask to keep ocean
ds_masked["mask"] = np.absolute(mask - 1)
# ds_masked["psl"].isel(time=0).plot()
# regrid
ds_masked_regrid = ds_masked.regridder.horizontal("ts", target_grid, tool="regrid2")
ds_masked_regrid["ts"].isel(time=0).plot()
I'm still looking into why your example still fails.
@lee1043 I added another fix, your example still displays the same issue.
The following example works:
import os import xcdat as xc import numpy as np from pcmdi_metrics.utils import apply_landmask, create_land_sea_mask input_data = os.path.join( "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/ts/gn/v20200511", "ts_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc") target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0) ds = xc.open_dataset(input_data) # Apply land mask ds_masked = ds.copy() mask = create_land_sea_mask(ds_masked) # invert mask to keep ocean ds_masked["mask"] = np.absolute(mask - 1) # ds_masked["psl"].isel(time=0).plot() # regrid ds_masked_regrid = ds_masked.regridder.horizontal("ts", target_grid, tool="regrid2") ds_masked_regrid["ts"].isel(time=0).plot()
I'm still looking into why your example still fails.
Hi @jasonb5, thanks for following up. But I don't think the code in this comment is working. This is what I get which still has coastline issue. Seeing the result plot below, on top of colorbar color range indicates 1e19, which appears along coastline.
It becomes more clear when plotting where >1e10
ds_masked_regrid["ts"].where(ds_masked_regrid["ts"]>1e10).isel(time=0).plot()
I think the key issue here is that 1e20 is being used when interpolating.
In this PR, logic of the code is checking if there is attributes for FillValue or missing value, and if so, 1e20 is injected instead of NaN in the input variable (here). This input variable is passed to regrid2 (here), which include 1e20 when actually interpolating values.
Instead, I think inclusion of 1e20 when interpolating can be prevented by replacing 1e20 or any missing value if exists (nan_replace
in the code) to actual NaN (here), before calling the regridding. I proposed this solution in #635. Could you please take a look and let me know if there is any concern?
@lee1043 Merged your change, and added a new one. Both pre-applied mask and passing mask as a variable should both work now.
Also note that how the mask is applied by cdms2 and passing the mask as a variable to xcdat differs from pcmdi_metrics apply_landmask
function. The prior allows more data through, while the latter is tighter around the coastlines (not sure if this is a problem).
import os
import xcdat as xc
import numpy as np
import xarray as xr
import cdms2
import matplotlib.pyplot as plt
from pcmdi_metrics.utils import apply_landmask, create_land_sea_mask
fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(10,10))
input_data = "ts_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc"
target_grid = xc.create_uniform_grid(-90, 89, 1.0, 0, 359, 1.0)
cdms2_grid = cdms2.createUniformGrid(-90, 180, 1.0, 0, 360, 1.0)
ds = xc.open_dataset(input_data)
cdms2_ds = cdms2.open(input_data)
cdms2_var = cdms2_ds("ts")
mask = create_land_sea_mask(ds)
# cdms2 automatically inverts the mask
cdms2_mask = mask.values
cdms2_regrid = cdms2_var.regrid(cdms2_grid, mask=cdms2_mask, regridTool="regrid2")
cdms2_regrid = xr.DataArray(cdms2_regrid.data, dims=("time", "lat", "lon"))
cdms2_regrid = cdms2_regrid.where(cdms2_regrid!=1e20)
cdms2_regrid_t0 = cdms2_regrid.isel(time=0)
cdms2_regrid_t0.plot(ax=axs[0,0])
(cdms2_regrid_t0 - cdms2_regrid_t0).plot(ax=axs[0,1])
ds1 = ds.copy()
ds1_masked = apply_landmask(ds1, "ts", keep_over="ocean")
ds1["ts"] = ds1_masked
ds1_regrid = ds1.regridder.horizontal("ts", target_grid, tool="regrid2")
ds1_regrid_t0 = ds1_regrid["ts"].isel(time=0)
ds1_regrid_t0.plot(ax=axs[1,0])
# replace nan with 1e20 to show where data is missing
(ds1_regrid_t0.fillna(1e20) - cdms2_regrid_t0.fillna(1e20)).plot(ax=axs[1,1])
ds2 = ds.copy()
# need to manually invert the mask for ocean
ds2["mask"] = np.absolute(mask - 1)
ds2_regrid = ds2.regridder.horizontal("ts", target_grid, tool="regrid2")
ds2_regrid_t0 = ds2_regrid["ts"].isel(time=0)
ds2_regrid_t0.plot(ax=axs[2,0])
# replace nan with 1e20 to show where data is missing
(ds2_regrid_t0.fillna(1e20) - cdms2_regrid_t0.fillna(1e20)).plot(ax=axs[2,1])
plt.tight_layout()
plt.draw()
Hi @jasonb5, thank you for the update. I checked the updated code in this PR works well for my example code above.
Also note that how the mask is applied by cdms2 and passing the mask as a variable to xcdat differs from pcmdi_metrics
apply_landmask
function. The prior allows more data through, while the latter is tighter around the coastlines (not sure if this is a problem).
Thank you for pointing out. It is kind of intended behavior that in PMP I made the apply landmask function to be more strict when selecting either land or ocean grid. Default criteria for land is land fraction >= 0.8 while for the ocean < 0.2. I might have to rethink on this more to make it more consistent to what CDAT does, to keep minimal impact of xcdat conversion in PMP results.
@lee1043 Sounds good, just wanted to make a note.
@tomvothecoder I think this is good to merge, and I'll tackle #625 in the next release.
@jasonb5 Great! I'll do a quick final review and then merge.
I'm going to add the codecov token to the repo and GitHub Actions build to hopefully fix the codecov upload problem and prevent/limit future occurrences.
https://community.codecov.com/t/upload-issues-unable-to-locate-build-via-github-actions-api/3954
Description
_FillValue
,missing_value
, or default to 1e20Todo
Checklist
If applicable: