pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
183 stars 32 forks source link

Parallel weight generation with Dask #290

Closed charlesgauthier-udm closed 10 months ago

charlesgauthier-udm commented 10 months ago

Implemented parallel weight generation using Dask and xarray's map_blocks. Here is a quick summary: User can pass parallel=True to the Regridder and the weights will be computed in parallel.

Key points:

Examples

Using dask to compute the weights allows for larger-than-memory dataset to be used. Using subsets of the Gridded Population of the World (gpw) and the CORDEX WRF in lambert conformal with a 0.22° resolution (y:281, x:297), we get the following examples:

Comparing serial vs. parallel, the overhead related to dask and map_blocks makes it slower for small datasets, but for bigger datasets we can compare both:

Execution time and memory usage is highly dependent on chunk sizes and the number of cores available. However, by chunking the output dataset, the user can adjust it to a specific problem.

review-notebook-app[bot] commented 10 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

charlesgauthier-udm commented 10 months ago

@huard @aulemahal Looks like moving the para_regrid code outside of __init__ to its own method does not solve the issue of __init__ being too complex..

huard commented 10 months ago

I can live with that.