jhardenberg / smmregrid

A compact regridder using sparse matrix multiplication
Apache License 2.0

Speed evaluation of the smmregrid tool #2

Open oloapinivad opened 1 year ago

oloapinivad commented 1 year ago

This issue keeps track of the speed tests that I have been running to find the optimal configuration for the regridder, based on #1.

The tests are based on files on different grids (curvilinear, gaussian, gaussian reduced, lonlat and unstructured) to cover all the possibilities, with 2D files, files with a mask (i.e. ocean files) and files with pressure levels. We also tested processing the entire xarray.Dataset versus working on a single xarray.DataArray. The writing of the NetCDF output is also assessed. All tests are run with conservative remapping.
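For readers unfamiliar with the approach, the idea of remapping through sparse weights can be sketched in a few lines. This is a toy illustration in plain Python, not the smmregrid implementation: the function name `apply_weights`, the triplet layout and the 1D grid are all assumptions made for the example.

```python
def apply_weights(weights, src, n_dst):
    """Apply sparse remapping weights to a flattened source field.

    `weights` is a list of (target_index, source_index, weight) triplets,
    i.e. the non-zero entries of the remapping matrix.
    """
    dst = [0.0] * n_dst
    for i_dst, i_src, w in weights:
        dst[i_dst] += w * src[i_src]
    return dst


# Hypothetical 1D case: 4 equal source cells remapped onto 2 equal target cells.
# Each target cell is covered half and half by two source cells.
weights = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 0.5), (1, 3, 0.5)]
source = [1.0, 3.0, 5.0, 7.0]

target = apply_weights(weights, source, n_dst=2)
print(target)  # [2.0, 6.0]
```

With equal cell sizes the mean of the field is the same before and after (4.0 in both cases), which is the property a conservative scheme is meant to preserve.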

The tests can be found in the playground notebook, and are based on multiple repetitions (usually 20 for each operation). https://github.com/jhardenberg/smmregrid/blob/devel/extend/playground.ipynb
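A repeated-timing harness of this kind could look roughly like the sketch below. It is not taken from the notebook: `time_operation` and `placeholder_workload` are hypothetical names, the workload is a stand-in for a regridding call, and averaging the samples is an assumption about how the repetitions are combined.

```python
import statistics
import timeit


def time_operation(func, repeats=20):
    """Return the mean wall-clock time in seconds of `func` over `repeats` single runs."""
    samples = timeit.repeat(func, number=1, repeat=repeats)
    return statistics.mean(samples)


def placeholder_workload():
    # Stand-in for a regridding call; NOT the smmregrid API.
    return sum(i * i for i in range(10_000))


mean_time = time_operation(placeholder_workload, repeats=20)
print(f"mean over 20 runs: {mean_time:.6f} s")
```

Dividing each result by the time of a reference tool on the same file would give relative numbers, with the reference equal to 1.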

oloapinivad commented 1 year ago

Commit b08b045f6fbb6cf63d439676b207e702d182181a establishes a good starting point:

| File | NVars | NRecords | CDO | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoMask) | SMM (Dataset+Write) | SMM (DataArray+Write) |
|---|---|---|---|---|---|---|---|---|
| onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.216799 | 0.0778903 | 0.0504366 | 0.997036 | 0.77708 |
| tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.226347 | 0.0900398 | 0.0631857 | 1.14144 | 0.958676 |
| 2t-era5.nc | 1 | (12, 73, 144) | 1 | 0.170659 | 0.094557 | 0.0610341 | 0.845712 | 0.765937 |
| tos-fesom.nc | 1 | (12, 126859) | 1 | 0.113976 | 0.0399258 | 0.0256824 | 0.755877 | 0.623671 |
| ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.398825 | 0.0677817 | 0.0441846 | 1.61761 | 1.35922 |
| mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.549228 | 0.0688021 | 0.0452126 | 1.73817 | 0.670783 |
| era5-mon.nc | 1 | (864, 721, 1440) | 1 | 0.825034 | 0.000651573 | 0.000439713 | 1.883 | 1.0852 |

A few points:

IMPORTANT: these numbers do not take into account the loading of the data.

oloapinivad commented 1 year ago

The numbers are less impressive once we take into account the loading of the data into memory, i.e. xarray's load():

| File | NVars | NRecords | CDO | CDO (NoLoad) | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoLoad) | SMM (DataArray+NoMask) | SMM (Dataset+Write) | SMM (DataArray+Write) |
|---|---|---|---|---|---|---|---|---|---|---|
| onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.85244 | 0.468714 | 0.36703 | 0.0898978 | 0.261572 | 0.451468 | 0.436607 |
| tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.976596 | 0.556708 | 0.498547 | 0.103215 | 0.35501 | 0.583348 | 0.561232 |
| 2t-era5.nc | 1 | (12, 73, 144) | 1 | 1.00737 | 0.295552 | 0.250174 | 0.0685831 | 0.163858 | 0.354942 | 0.341436 |
| tos-fesom.nc | 1 | (12, 126859) | 1 | 0.975193 | 0.354071 | 0.321254 | 0.052417 | 0.226885 | 0.370075 | 0.367297 |
| ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.886853 | 0.623172 | 0.597676 | 0.171645 | 0.476483 | 0.834928 | 0.849127 |
| mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.828504 | 0.903087 | 0.246191 | 0.0574815 | 0.172714 | 1.21837 | 0.3044 |
| era5-mon.nc | 1 | (864, 721, 1440) | 1 | 1.00124 | 0.68279 | 0.688481 | 0.338506 | 0.66355 | 0.696658 | 0.705815 |

We are still faster than CDO for a single DataArray, but the speedup is small.
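To make the size of the speedup concrete, the SMM (DataArray) column above can be turned into speedup factors. This sketch assumes the table values are times normalized so that CDO equals 1 (lower is faster), which is an inference from the CDO column rather than something stated in the thread.

```python
# SMM (DataArray) values copied from the table that includes data loading.
# Assumption: times are normalized to CDO = 1, so speedup = 1 / value.
smm_dataarray = {
    "onlytos-ipsl.nc": 0.36703,
    "tas-ecearth.nc": 0.498547,
    "2t-era5.nc": 0.250174,
    "tos-fesom.nc": 0.321254,
    "ua-ecearth.nc": 0.597676,
    "mix-cesm.nc": 0.246191,
    "era5-mon.nc": 0.688481,
}

speedups = {name: 1.0 / value for name, value in smm_dataarray.items()}

for name, factor in speedups.items():
    print(f"{name:16s} {factor:.2f}x")

slowest = min(speedups, key=speedups.get)
fastest = max(speedups, key=speedups.get)
```

Under that assumption the speedup over CDO ranges from about 1.5x (era5-mon.nc) to about 4x (mix-cesm.nc).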

oloapinivad commented 1 year ago

Conversely, we get very bad scaling when we use dask. It is unclear why.

| File | Workers | CDO | Dask Compute | Dask Load |
|---|---|---|---|---|
| onlytos-ipsl.nc | 0 | 1.33525 | 0.802709 | 0.80034 |
| tas-ecearth.nc | 0 | 1.31848 | 1.04944 | 0.976837 |
| onlytos-ipsl.nc | 1 | 1.36808 | 3.72327 | 2.29133 |
| tas-ecearth.nc | 1 | 1.36115 | 2.75875 | 2.80817 |
| onlytos-ipsl.nc | 2 | 1.3585 | 4.1103 | 2.21112 |
| tas-ecearth.nc | 2 | 1.40967 | 2.85582 | 2.78464 |
| onlytos-ipsl.nc | 8 | 1.356 | 7.77573 | 5.73662 |
| tas-ecearth.nc | 8 | 1.43555 | 3.41429 | 2.84304 |

oloapinivad commented 1 year ago

The last commit in #2 suggests significant improvements. Considering that we are not using dask yet, this can be considered a success.

| File | NVars | NRecords | CDO | CDO (NoLoad) | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoLoad) | SMM (Dataset+Write) | SMM (DataArray+Write) |
|---|---|---|---|---|---|---|---|---|---|
| onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.924445 | 0.377513 | 0.347206 | 0.078164 | 0.429603 | 0.431012 |
| tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.927699 | 0.410739 | 0.38098 | 0.0854462 | 0.468425 | 0.461594 |
| 2t-era5.nc | 1 | (12, 73, 144) | 1 | 0.924335 | 0.204548 | 0.160671 | 0.0439383 | 0.253002 | 0.248918 |
| tos-fesom.nc | 1 | (12, 126859) | 1 | 1.01328 | 0.332096 | 0.32686 | 0.048309 | 0.379621 | 0.372826 |
| ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.898331 | 0.508237 | 0.475537 | 0.156023 | 0.74989 | 0.735353 |
| mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.886593 | 0.652437 | 0.181319 | 0.0456369 | 0.92309 | 0.248312 |
| era5-mon.nc | 1 | (864, 721, 1440) | 1 | 0.990528 | 0.757665 | 0.794592 | 0.33693 | 0.822056 | 0.803797 |