Closed charlesgauthier-udm closed 10 months ago
@huard @aulemahal Looks like moving the `para_regrid` code outside of `__init__` to its own method does not solve the issue of `__init__` being too complex. I can live with that.
Implemented parallel weight generation using Dask and xarray's `map_blocks`. Here is a quick summary:

User can pass `parallel=True` to the `Regridder` and the weights will be computed in parallel.

Key points:

- The weights are computed chunk-wise, following the chunks of the output `dataset` or `dataarray` given to `Regridder` (see the usage sketch after this list).
- There is some overhead related to `map_blocks` and dask, especially with the creation of a template for `map_blocks`, so for small grids serial weight generation is preferred. Therefore, the default is `parallel=False`.
- With `parallel=True`, an identical `Regridder` object to the serial case is returned. Could possibly add a `self.parallel` attribute in the `Regridder` to keep track of whether it was generated in parallel.
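A minimal usage sketch of the option described above; file names and chunk sizes are placeholders, not from the PR:

```python
import xarray as xr
import xesmf as xe

# Input grid: the CORDEX WRF subset at 0.22 deg (y: 281, x: 297); small
# enough to hold in memory. File names here are placeholders.
ds_in = xr.open_dataset("wrf_subset.nc")

# Output grid: a large GPW subset. Parallel weight generation follows the
# chunks of the output grid, so open it chunked (sizes are illustrative).
ds_out = xr.open_dataset("gpw_subset.nc", chunks={"lat": 1000, "lon": 1000})

# parallel=True computes the weights chunk-wise with dask/map_blocks.
# The resulting Regridder is identical to one generated serially.
regridder = xe.Regridder(ds_in, ds_out, "bilinear", parallel=True)

# Apply the regridder as usual.
out = regridder(ds_in)
```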
**Examples**

Using dask to compute the weights allows for larger-than-memory datasets to be used. Using subsets of the Gridded Population of the World (GPW) and the CORDEX WRF in Lambert conformal with a 0.22° resolution `(y:281, x:297)`, we get the following examples:

- WRF `(y:281, x:297)` --> GPW_subset `(lat:5000, lon:5000)`; `parallel=False`: memory overflows, `parallel=True`: `Regridder` created in ~86s on my 4-core machine.
- With `parallel=True` I can tackle even bigger datasets: WRF `(y:281, x:297)` --> GPW_subset `(lat:7000, lon:7000)`: `Regridder` created in ~2 min.

Comparing serial vs. parallel, the overhead related to dask and `map_blocks` makes it slower for small datasets, but for bigger datasets we can compare both:

- WRF `(y:281, x:297)` --> GPW_subset `(lat:5000, lon:4000)`; `parallel=False`: `Regridder` created in ~100s, `parallel=True`: `Regridder` created in ~50s. Roughly 2x faster.

Execution time and memory usage are highly dependent on chunk sizes and the number of cores available. However, by chunking the output dataset, the user can adjust these to a specific problem.
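For instance, continuing the sketch above, the chunking of the output grid is the main tuning knob; the sizes below are illustrative and problem-dependent:

```python
# Larger chunks mean fewer map_blocks tasks and less scheduling overhead;
# smaller chunks lower the peak memory per task. Tune to the machine at hand.
ds_out_coarse = ds_out.chunk({"lat": 2500, "lon": 2500})  # fewer, bigger tasks
ds_out_fine = ds_out.chunk({"lat": 500, "lon": 500})      # lighter on memory

# Rebuild the Regridder against the re-chunked output grid.
regridder = xe.Regridder(ds_in, ds_out_fine, "bilinear", parallel=True)
```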