xarray-contrib / xarray-regrid

Regridding utility for xarray
https://xarray-regrid.readthedocs.org/
Apache License 2.0

Comparison with xESMF #48

Open maresb opened 1 month ago

maresb commented 1 month ago

Hey, this popped up on my GitHub feed and it looks interesting.

I'm already using xESMF which seems to have been around for much longer. I'm wondering:

  1. Is there any reason for me to prefer xarray-regrid to xESMF?
  2. If so, how can I migrate my xESMF code to xarray-regrid?

Generalizing my personal request to an actionable feature request, it would be helpful if the docs compared xarray-regrid with existing regridders.

Thanks so much for publishing this project!

BSchilperoort commented 1 month ago

Hi Ben,

Thanks for the feedback!

If xESMF works for you there is no reason to move over. However;

then this package could be for you. Note that your regridding has to be from rectilinear -> rectilinear, and not between different Coordinate Reference Systems.


Notebooks comparing xarray-regrid to CDO and xESMF are available in the docs. For example, see https://xarray-regrid.readthedocs.io/en/latest/notebooks/benchmarks/benchmarking_conservative.html which should show you how the xESMF and xarray-regrid methods differ (they're quite similar).

If you do try xarray-regrid on your workflow, it would be great to hear whether it runs better or worse than xESMF (CPU time as well as memory use).

maresb commented 1 month ago

Thanks so much @BSchilperoort for the prompt response!!!

Your points are indeed pretty compelling. I'm not sure exactly when, but I'll probably give this a try at some point, and I'll make sure to report back. Thanks again!

slevang commented 3 weeks ago

Adding my 2c on the advantages this package offers:

  1. xESMF always has to generate a large sparse array of weights in serial, which scales with the number of grid points and is a killer for small jobs. Spending 30s generating weights on a 1/4deg grid, only to regrid a small array in a few ms, is a bummer. Since xarray-regrid is limited to rectilinear grids, where each dimension can be handled separately, this step usually feels near-instantaneous across the different methods.
  2. Packaging for xESMF has gotten better but is still a hassle due to the ESMF dependency. You need conda.
  3. Everything here is built on the modern pangeo stack, so it is easily modifiable and extensible.
  4. Even ignoring the weight generation bottleneck for small data, performance on large and/or chunked datasets ranges from on par to 10x+ faster across different benchmarks after recent enhancements.

The obvious limitation is non-rectilinear grids, where the flexibility of ESMF is hard to beat.
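To illustrate point 1, here is a minimal NumPy sketch of why rectilinear grids let the weights separate by dimension. It is a planar toy (no spherical area weighting), and `overlap_weights` is a hypothetical helper written for this illustration, not the xarray-regrid or xESMF implementation. It builds one small 1D overlap-weight matrix per axis and checks that applying them axis by axis gives the same result as one big weight matrix over all grid points.

```python
import numpy as np


def overlap_weights(src_edges, dst_edges):
    """1D conservative weights: fraction of each destination cell
    covered by each source cell. Hypothetical helper for illustration."""
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    dst_lo, dst_hi = dst_edges[:-1], dst_edges[1:]
    # Pairwise overlap length between every (destination, source) interval.
    overlap = (
        np.minimum(dst_hi[:, None], src_hi[None, :])
        - np.maximum(dst_lo[:, None], src_lo[None, :])
    )
    overlap = np.clip(overlap, 0.0, None)
    # Normalise so that each destination cell's weights sum to 1.
    return overlap / overlap.sum(axis=1, keepdims=True)


# Toy region: 0.25 degree source cells, 1 degree target cells.
src_lat_edges = np.linspace(-10, 10, 81)   # 80 source cells
src_lon_edges = np.linspace(0, 40, 161)    # 160 source cells
dst_lat_edges = np.linspace(-10, 10, 21)   # 20 target cells
dst_lon_edges = np.linspace(0, 40, 41)     # 40 target cells

w_lat = overlap_weights(src_lat_edges, dst_lat_edges)  # shape (20, 80)
w_lon = overlap_weights(src_lon_edges, dst_lon_edges)  # shape (40, 160)

rng = np.random.default_rng(0)
data = rng.random((80, 160))  # (lat, lon)

# Separable approach: one small matrix product per dimension.
separable = w_lat @ data @ w_lon.T  # shape (20, 40)

# Single-matrix approach: one weight per (target point, source point) pair.
w_full = np.kron(w_lat, w_lon)  # shape (800, 12800)
full = (w_full @ data.ravel()).reshape(20, 40)

print("results match:", np.allclose(separable, full))
print("separable weight entries:", w_lat.size + w_lon.size)
print("full-matrix weight entries:", w_full.size)
```

The full matrix is built densely here only to keep the comparison simple; a real implementation would store it sparsely, but it still has to be generated over every pair of overlapping cells in 2D rather than per axis.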

slevang commented 3 weeks ago

A nice example for point 1: trying to regrid a large fixed land surface dataset. Here's the 30 arc second ETOPO geoid, which is 21600x43200:

import xarray as xr
import xarray_regrid

ds = xr.open_dataset(
    "https://www.ngdc.noaa.gov/thredds/dodsC/global/ETOPO2022/30s/30s_geoid_netcdf/ETOPO_2022_v1_30s_N90W180_geoid.nc",
    chunks={},
)

ds = ds.rename(lon="longitude", lat="latitude").drop_vars("crs")

bounds = dict(south=-90, north=90, west=-180, east=180)

target = xarray_regrid.Grid(
    resolution_lat=1,
    resolution_lon=1,
    **bounds,
).create_regridding_dataset()

%timeit ds.regrid.conservative(target);
257 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is basically an intractable problem for xesmf. I tried using their chunked parallel weight generation scheme and it still ran for 20 minutes then crashed.

dcherian commented 3 weeks ago

To be clear, this benchmarks the weight generation and graph creation, correct? Does it compute smoothly too?

slevang commented 3 weeks ago

Then I have to actually download the file 😆. But yes, I'll try that.

slevang commented 3 weeks ago

The download ETA was 1hr (NCEI server having a bad day, I guess), so I used xr.ones_like as a shortcut.

With -1 chunks (a single chunk per dimension), regridding takes about 9s and uses ~15GB of memory. With 1000x1000 chunks, it takes 4s and ~2GB of memory. Pretty good since the data itself is 3.5GB.
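As a rough sanity check on those numbers, the arithmetic can be sketched as below. The 4 bytes per value (float32) is an assumption that is not stated in the thread; it is chosen because it reproduces the quoted ~3.5GB for a 21600x43200 array.

```python
import math

n_lat, n_lon = 21600, 43200
bytes_per_value = 4  # assumption: float32 data, not stated in the thread

# Total in-memory size of the full array.
n_points = n_lat * n_lon
size_gib = n_points * bytes_per_value / 2**30

# Number and size of 1000x1000 chunks (edge chunks are smaller).
chunk = 1000
n_chunks = math.ceil(n_lat / chunk) * math.ceil(n_lon / chunk)
chunk_mib = chunk * chunk * bytes_per_value / 2**20

print(f"{n_points:,} points, {size_gib:.2f} GiB in memory")
print(f"{n_chunks} chunks of at most {chunk_mib:.2f} MiB each")
```

Under that assumption, the chunked run's peak memory (~2GB) is below the size of the full array, while the single-chunk run (~15GB) is roughly four times the array size.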