oloapinivad opened this issue 1 year ago (status: Open)
Commit b08b045f6fbb6cf63d439676b207e702d182181a establishes a good starting point:
File | NVars | NRecords | CDO | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoMask) | SMM (Dataset+Write) | SMM (DataArray+Write) |
---|---|---|---|---|---|---|---|---|
onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.216799 | 0.0778903 | 0.0504366 | 0.997036 | 0.77708 |
tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.226347 | 0.0900398 | 0.0631857 | 1.14144 | 0.958676 |
2t-era5.nc | 1 | (12, 73, 144) | 1 | 0.170659 | 0.094557 | 0.0610341 | 0.845712 | 0.765937 |
tos-fesom.nc | 1 | (12, 126859) | 1 | 0.113976 | 0.0399258 | 0.0256824 | 0.755877 | 0.623671 |
ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.398825 | 0.0677817 | 0.0441846 | 1.61761 | 1.35922 |
mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.549228 | 0.0688021 | 0.0452126 | 1.73817 | 0.670783 |
era5-mon.nc | 1 | (864, 721, 1440) | 1 | 0.825034 | 0.000651573 | 0.000439713 | 1.883 | 1.0852 |
A few points:
IMPORTANT: these numbers do not take into account the loading of the data.
The numbers are less impressive once the loading of the data into memory (i.e. the xarray .load() call) is taken into account.
File | NVars | NRecords | CDO | CDO (NoLoad) | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoLoad) | SMM (DataArray+NoMask) | SMM (Dataset+Write) | SMM (DataArray+Write) |
---|---|---|---|---|---|---|---|---|---|---|
onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.85244 | 0.468714 | 0.36703 | 0.0898978 | 0.261572 | 0.451468 | 0.436607 |
tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.976596 | 0.556708 | 0.498547 | 0.103215 | 0.35501 | 0.583348 | 0.561232 |
2t-era5.nc | 1 | (12, 73, 144) | 1 | 1.00737 | 0.295552 | 0.250174 | 0.0685831 | 0.163858 | 0.354942 | 0.341436 |
tos-fesom.nc | 1 | (12, 126859) | 1 | 0.975193 | 0.354071 | 0.321254 | 0.052417 | 0.226885 | 0.370075 | 0.367297 |
ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.886853 | 0.623172 | 0.597676 | 0.171645 | 0.476483 | 0.834928 | 0.849127 |
mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.828504 | 0.903087 | 0.246191 | 0.0574815 | 0.172714 | 1.21837 | 0.3044 |
era5-mon.nc | 1 | (864, 721, 1440) | 1 | 1.00124 | 0.68279 | 0.688481 | 0.338506 | 0.66355 | 0.696658 | 0.705815 |
We are still faster than CDO for a single DataArray, but the speedup is small.
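The difference between the "load" and "NoLoad" columns comes down to what sits inside the timed region. A minimal sketch of that distinction, using only the standard library: `load_data` and `process` are hypothetical stand-ins for reading a file into memory and for the regridding step, not the functions used in the notebook, and taking the minimum over the repetitions is a choice made here, not necessarily what the notebook does.

```python
import timeit


def load_data(n=200_000):
    """Hypothetical stand-in for reading a file into memory."""
    return [float(i) for i in range(n)]


def process(data):
    """Hypothetical stand-in for the regridding step."""
    return sum(data) / len(data)


repeat = 20  # same order of magnitude as the repetitions quoted in the issue

# Loading included: every timed call loads the data and then processes it.
with_load = min(
    timeit.repeat(lambda: process(load_data()), number=1, repeat=repeat)
)

# Loading excluded: the data is loaded once, outside the timed region.
data = load_data()
without_load = min(
    timeit.repeat(lambda: process(data), number=1, repeat=repeat)
)

# Fraction of the "with load" time that is spent on loading alone.
load_share = 1 - without_load / with_load
print(f"with load: {with_load:.4f}s, without load: {without_load:.4f}s")
print(f"share of time spent loading: {load_share:.0%}")
```

When comparing two tools, the same choice has to be applied to both sides, otherwise the ratio mostly measures I/O rather than the computation.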
Conversely, we get very bad scaling when we use dask. It is unclear why.
File | Workers | CDO | Dask Compute | Dask Load |
---|---|---|---|---|
onlytos-ipsl.nc | 0 | 1.33525 | 0.802709 | 0.80034 |
tas-ecearth.nc | 0 | 1.31848 | 1.04944 | 0.976837 |
onlytos-ipsl.nc | 1 | 1.36808 | 3.72327 | 2.29133 |
tas-ecearth.nc | 1 | 1.36115 | 2.75875 | 2.80817 |
onlytos-ipsl.nc | 2 | 1.3585 | 4.1103 | 2.21112 |
tas-ecearth.nc | 2 | 1.40967 | 2.85582 | 2.78464 |
onlytos-ipsl.nc | 8 | 1.356 | 7.77573 | 5.73662 |
tas-ecearth.nc | 8 | 1.43555 | 3.41429 | 2.84304 |
The last commit in #2 suggests significant improvements. Considering that we are not using dask yet, this can be considered a success.
File | NVars | NRecords | CDO | CDO (NoLoad) | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoLoad) | SMM (Dataset+Write) | SMM (DataArray+Write) |
---|---|---|---|---|---|---|---|---|---|
onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.924445 | 0.377513 | 0.347206 | 0.078164 | 0.429603 | 0.431012 |
tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.927699 | 0.410739 | 0.38098 | 0.0854462 | 0.468425 | 0.461594 |
2t-era5.nc | 1 | (12, 73, 144) | 1 | 0.924335 | 0.204548 | 0.160671 | 0.0439383 | 0.253002 | 0.248918 |
tos-fesom.nc | 1 | (12, 126859) | 1 | 1.01328 | 0.332096 | 0.32686 | 0.048309 | 0.379621 | 0.372826 |
ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.898331 | 0.508237 | 0.475537 | 0.156023 | 0.74989 | 0.735353 |
mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.886593 | 0.652437 | 0.181319 | 0.0456369 | 0.92309 | 0.248312 |
era5-mon.nc | 1 | (864, 721, 1440) | 1 | 0.990528 | 0.757665 | 0.794592 | 0.33693 | 0.822056 | 0.803797 |
This issue keeps track of the speed tests that I have been running to find the optimal configuration for the regridder based on #1.
The tests are based on files on different grids (curvilinear, gaussian, gaussian reduced, lonlat and unstructured) to cover all the possibilities, with 2D files, files with mask (i.e. ocean files) and files with pressure levels. We also tested the access of the entire xarray.Dataset versus working on the single xarray.DataArray. The writing of the NetCDF output is also assessed. All tests are run with conservative remapping.
The tests can be found in the playground notebook, and are based on multiple repetitions (usually 20 for each operation): https://github.com/jhardenberg/smmregrid/blob/devel/extend/playground.ipynb
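For readers who want to reproduce this kind of table, here is a minimal sketch of a repeated-timing harness. It is not the notebook's code: `benchmark`, `relative_to`, `slow_operation` and `fast_operation` are hypothetical names, the two operations are placeholders for a CDO call and a regridder call, and expressing each timing as a fraction of a reference is an assumption based on the CDO column reading 1 in the tables.

```python
import statistics
import time


def benchmark(func, repeat=20):
    """Return the mean wall-clock time in seconds of `func` over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def relative_to(reference, timings):
    """Express every timing as a fraction of the reference timing."""
    base = timings[reference]
    return {name: value / base for name, value in timings.items()}


# Hypothetical stand-ins for the operations being compared.
def slow_operation():
    sum(i * i for i in range(200_000))


def fast_operation():
    sum(i * i for i in range(20_000))


timings = {
    "reference": benchmark(slow_operation),
    "candidate": benchmark(fast_operation),
}
ratios = relative_to("reference", timings)
print(ratios)  # the reference entry is 1.0 by construction
```

Values below 1 then mean the candidate is faster than the reference, which matches how the tables in this issue appear to be read.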