Open blimlim opened 1 year ago
To regrid to a coarser grid, I recommend you use the conservative_normed
method, passing the masks for both input and output grids. without mask defined, land values (NaN or zeros) can bleed into the regridded field.
Hi @raphaeldussin - I've just bumped into this issue thread. Thanks to @blimlim for posting and your answer.
WRT the above are their any good, clear examples on using the conservative_normed
method? My experience is that it's possible some xESMF
users are not as careful and thoughtful as @blimlim and just press ahead with bilinear
and use the possibly problematic result? Worst case is errors in coastal cells are not picked up and used as good data?
One challenge might be understanding how to easily generate the "output" mask for the coarser grid of choice. Are there any examples we can point people to specifically on that?
The answer probably is "just go RTF manual" but if there are any specific pages / examples you could point to that would be helpful?
indeed the docs is a good starting point ;) https://xesmf.readthedocs.io/en/latest/notebooks/Masking.html
Also, conservative normed does a better job conserving statistics across resolutions.
Thanks @raphaeldussin and @Thomas-Moore-Creative for your help with this!
Just hoping to confirm, to use the conservative normed regridding, will I need to also supply the corners of the grid cells? I'll try to obtain these.
I'll be using these files as ancillaries for an atmosphere model, and so am thinking it will be important for them to be periodic in longitude. I've read that the periodic parameter is forced to be False when using the conservative normed method. Would you know if this is still correct?
yes to both. there is an issue #28 opened about periodicity and conservative that is yet to be investigated. That might be something that might interest you @blimlim, probably not much code changes necessary here.
@blimlim If your data is on a regular grid (1D lat and 1D lon), then cf-xarray
can guess the bounds for you. It is a dependency of xESMF
so it could even be done automatically under-the-hood.
As for the periodicity issue, from my experience, as long as your bounds are periodic, the result should respect this periodicity. I mean, the west bounds of the westmost cell should be the exact same as the east bound of the eastmost cell. Dependending on the exact coordinates, there is a probability that guessed bounds will not be perfect, in which case model-provided bounds would be the best alternative.
Thanks @raphaeldussin and @aulemahal for your help! I do have access to the grid bounds, so I'll have a go at the conservative normed regridding. At the moment I'm new to the field so still getting my head around everything, but if in the future the periodicity question hasn't yet been looked at, that would be interesting to investigate!
Hey, as a first-time user I was just in the same situation as @blimlim (needing to coarsely regrid SST data containing NaNs), and I feel like I spent a lot of time going down the wrong path before I found a simple and clean way of doing "the right thing".
Like @blimlim, I also started on https://github.com/JiaweiZhuang/xESMF/issues/22. That seems to rank high in search engine results, so I think if we could agree on some official advice, it would be great to crosspost there to ensure people discover the answer soon.
I also spent a lot of time trying to understand the [comparison of regridding algorithms notebook](https://github.com/pangeo-data/xESMF/blob/master/doc/notebooks/Compare_algorithms.ipynb which I think was not so helpful to me since the notebook doesn't address NaN
values, and it fails to provide an example that distinguishes conservative
and conservative_normed
.
Some time later I discovered the masking notebook, and that focuses on this notion of a binary mask
that should be added as a data variable based on the location of the NaN
values of the input data. This is pretty messy, requiring some awkward boilerplate to set up by hand. After going through the steps, I was dismayed that the result was representing NaN
values with 0
.
Then hidden away near the end of the masking notebook I noticed this lovely "Adaptive masking" section. You simply add skipna=True
to your regridder, and it fills in as many pixels as possible. In case you want to have a threshold for the amount of missing data, adjust na_thres
(the default 1.0
means fill in everything possible, while 0.0
propagates all NaN
s).
Thus I believe the standard recommendation should be:
To decrease resolution,
regridder = xesmf.Regridder(ds, ds_coarse_template, method="conservative")
ds_coarse = regridder(
ds,
skipna=True,
na_thres=1.0, # lowering this results in more NaNs in ds_coarse
)
And conservative
should be preferred over bilinear
when decreasing resolution because bilinear
will throw away many of the input points. On the other hand, conservative
considers all the overlapping input points and thus produces a more stable result with less noise.
To increase resolution,
regridder = xesmf.Regridder(ds, ds_fine_template, method="bilinear")
ds_fine = regridder(
ds,
skipna=True,
na_thres=1.0, # lowering this results in more NaNs in ds_fine
)
And bilinear
is preferable to conservative
for increasing resolution because it produces a smoother interpolation.
As far as I can tell, adaptive masking makes conservative_normed
method obsolete since it corresponds to na_thres=1.0
. Furthermore, it seems to me like the whole manual masking process described in the first ¾ of the masking notebook is obsolete.
My advice above so far ignores the patch
method, my impression being that it's an advanced method for producing extra-smooth interpolation for when bilinear
isn't good enough. I also ignore nearest_s2d
and nearest_d2s
which seem too primitive for most use cases.
Is my understanding at least somewhat correct? If so, would it be helpful to update the documentation to make this more modern advice more prominent? (And perhaps merging the masking and comparison notebooks?) Thanks!
Hi everyone!
I have been regridding some global ocean data and was hoping for some clarification on how xESMF handles masks and missing values, as I've noticed some behaviour I don't quite understand. I've read through #22 of the old repository but didn't see this behaviour appear there.
I have data covering the globe on a lon/lat grid, which I want to interpolate to a coarser grid using bilinear interpolation. I only want to use the ocean points, so have tried a few different methods of masking out the land:
1. Supplying an input binary mask to the input grid in the xe.Regridder call: This results in the following:
(Figure 1)
2. Setting all land values to nan, and regridding with no masks supplied to xe.Regridder From what I understand, any output grid points which are next to an nan on the input data will be set to nan. This results in
(Figure 2)
Looking closer at the above two results, the first outputs values at more points, specifically along the western coasts. The difference between the two results is shown below:
(Figure 3)
I was wondering if anyone knows how these extra values are calculated? In #22 of the old repository, it sounds like providing an input mask is equivalent to setting the inputs under the mask to 0, and then regridding. However, if I do this manually without a mask, I get a different result again:
3. Setting land points to 0, and regridding with no masks supplied to xe.Regridder
(Figure 4)
This results in soft edges at the boundaries, which makes sense, however this is quite different to the first plot.
The main thing I'm hoping to find out is how the extra values in Figure 1 compared to Figure 2 are calculated. If they are a result of an interpolation with 0 at the missing points, it might be best for me not not use them in case the values are artificially low.
Let me know if there's any data/code or more info I can provide that would be helpful. I tried to set up a simple artificial example but wasn't able to get the same behaviour with it (methods 1 and 2 gave the same results when I tried using some simple constructed data).
Thank you for your help with this!