Open maresb opened 2 months ago
Hi Ben,
Thanks for the feedback!
If xESMF works for you there is no reason to move over. However;
then this package could be for you. Note that your regridding has to be from one rectilinear grid to another, and cannot cross between different coordinate reference systems.
Notebooks comparing xarray-regrid to CDO and xESMF are available in the docs, for example: https://xarray-regrid.readthedocs.io/en/latest/notebooks/benchmarks/benchmarking_conservative.html. That should show you how the xESMF and xarray-regrid methods differ (they're quite similar).
If you do try xarray-regrid on your workflow it would be great to hear how it runs better/worse than xESMF (CPU time as well as memory use).
Thanks so much @BSchilperoort for the prompt response!!!
Your points are indeed pretty compelling. I'm not sure exactly when, but I'll probably give this a try at some point, and I'll make sure to report back. Thanks again!
Adding my 2c on advantages this package offers:

1. xesmf always has to generate a large sparse array of weights in serial, which scales like the number of grid points, and is a killer for small jobs. 30s to generate weights on a 1/4deg grid, only to regrid a small array in a few ms is a bummer. Since xarray-regrid limits to rectilinear grids where we can separate each dimension, this step usually feels near instantaneous across the different methods.
2. xesmf has gotten better but is still a hassle due to the ESMF dependency. You need conda.

The obvious limitation is non-rectilinear grids, where the flexibility of ESMF is hard to beat.
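To illustrate why separating the dimensions makes weight generation cheap, here is a minimal NumPy sketch. It is not xarray-regrid's or xESMF's actual implementation: the helper `overlap_weights` and the small grids are hypothetical, and the weights use plain cell widths, ignoring spherical area weighting. It shows that on rectilinear grids two small 1-D weight matrices give the same result as one large 2-D weight matrix.

```python
import numpy as np


def overlap_weights(src_edges, dst_edges):
    """1-D conservative weights: rows are destination cells, columns are source cells."""
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    dst_lo, dst_hi = dst_edges[:-1], dst_edges[1:]
    # Overlap length between every (destination, source) pair of intervals.
    overlap = (
        np.minimum(dst_hi[:, None], src_hi[None, :])
        - np.maximum(dst_lo[:, None], src_lo[None, :])
    )
    overlap = np.clip(overlap, 0.0, None)
    # Normalise each row so a destination cell is a weighted mean of source cells.
    return overlap / overlap.sum(axis=1, keepdims=True)


# Hypothetical grids: 5-degree source cells, 30-degree target cells.
src_lat_edges = np.linspace(-90, 90, 37)     # 36 source latitude cells
src_lon_edges = np.linspace(-180, 180, 73)   # 72 source longitude cells
dst_lat_edges = np.linspace(-90, 90, 7)      # 6 target latitude cells
dst_lon_edges = np.linspace(-180, 180, 13)   # 12 target longitude cells

w_lat = overlap_weights(src_lat_edges, dst_lat_edges)  # shape (6, 36)
w_lon = overlap_weights(src_lon_edges, dst_lon_edges)  # shape (12, 72)

rng = np.random.default_rng(0)
data = rng.random((36, 72))

# Separable approach: apply one small matrix per dimension.
regridded = w_lat @ data @ w_lon.T  # shape (6, 12)

# Monolithic approach: one weight matrix over the flattened 2-D grid.
w_full = np.kron(w_lat, w_lon)  # shape (72, 2592)
regridded_full = (w_full @ data.ravel()).reshape(6, 12)

print("separable weight entries:", w_lat.size + w_lon.size)
print("full 2-D weight entries: ", w_full.size)
print("results agree:", np.allclose(regridded, regridded_full))
```

The stored weights grow with the sum of the dimension sizes in the separable case, but with their product in the monolithic case. For a 21600x43200 source grid that is tens of thousands of source cells handled per dimension versus roughly 933 million source cells that a 2-D weight matrix would need to reference.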
A nice example for point 1: trying to regrid a large fixed land surface dataset. Here's the 30 arc second ETOPO geoid, which is 21600x43200:
import xarray as xr
import xarray_regrid
ds = xr.open_dataset(
    "https://www.ngdc.noaa.gov/thredds/dodsC/global/ETOPO2022/30s/30s_geoid_netcdf/ETOPO_2022_v1_30s_N90W180_geoid.nc",
    chunks={},
)
ds = ds.rename(lon="longitude", lat="latitude").drop_vars("crs")
bounds = dict(south=-90, north=90, west=-180, east=180)
target = xarray_regrid.Grid(
    resolution_lat=1,
    resolution_lon=1,
    **bounds,
).create_regridding_dataset()
%timeit ds.regrid.conservative(target);
257 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This is basically an intractable problem for xesmf. I tried using their chunked parallel weight generation scheme and it still ran for 20 minutes then crashed.
To be clear, this benchmarks the weight generation and graph creation, correct? Does it compute smoothly too?
Then I have to actually download the file :laughing:. But yes, I'll try that.
ETA 1hr, NCEI server having a bad day I guess. I used xr.ones_like to shortcut.
With -1 chunks, regridding takes about 9s and uses ~15GB of memory. With 1000x1000 chunks, 4s and ~2GB of memory. Pretty good since the data itself is 3.5GB.
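As a quick sanity check on those numbers, the arithmetic can be sketched in plain Python. The grid size comes from the thread; the 4-byte value size is an assumption (32-bit floats), chosen because it reproduces the quoted ~3.5GB.

```python
import math

n_lat, n_lon = 21600, 43200  # ETOPO 30 arc-second grid, as stated above
bytes_per_value = 4          # assumption: 32-bit floats

# Size of the whole array if held in memory at once.
total_bytes = n_lat * n_lon * bytes_per_value
total_gib = total_bytes / 2**30

# 1000x1000 chunking: number of chunks and size of a full chunk.
chunk = 1000
n_chunks = math.ceil(n_lat / chunk) * math.ceil(n_lon / chunk)
chunk_mb = chunk * chunk * bytes_per_value / 1e6

print(f"full array: {total_bytes:,} bytes ({total_gib:.2f} GiB)")
print(f"1000x1000 chunking: {n_chunks} chunks of at most {chunk_mb:.0f} MB each")
```

Under that assumption each chunk is only a few MB, so peak memory in the chunked run depends mostly on how many chunks are in flight at once rather than on the full array size.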
Hey, this popped up on my GitHub feed and it looks interesting.
I'm already using xESMF which seems to have been around for much longer. I'm wondering:
Generalizing my personal request to an actionable feature request, it would be helpful if the docs compared xarray-regrid with existing regridders.
Thanks so much for publishing this project!