[Refactor] Improve the performance of temporal group averaging

tomvothecoder commented 2 months ago

Description

Closes #688

TODO:

[x] Identify performance bottlenecks
1. Generating labeled time coordinates (i.e., assigning groups), adding them as auxiliary coordinates on the existing time dimension, and then performing the Xarray groupby is extremely slow (refer to comment). The cause is unclear and appears to be an Xarray issue; need to ask on the Xarray forum. -- Fix: replace the time coordinates with the labeled time coordinates directly for grouping, rather than adding them as auxiliary coordinates on the time dimension.
2. In _get_weights(), loading time lengths into memory is slow. -- Fix: cast to "timedelta64[ns]", then to float64.
3. In _get_weights(), validating that the weights for each group sum to 1 is slow. -- Fix: remove this unnecessary assertion.
[ ] ~~Identify performance optimizations -- I don't think this is necessary right now~~
1. Xarray groupby with vs. without flox package
2. Try with Dask chunking
[x] Make sure unit tests still pass
[x] Measure performance difference between this branch and main
[x] Perform regression testing between this branch and main on the same dataset
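
Bottlenecks 2 and 3 can be illustrated with plain NumPy. This is a minimal sketch, not the xCDAT implementation of _get_weights(): the monthly time bounds, the quarterly grouping, and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical monthly time bounds for 1985 (start and end of each month).
starts = np.arange("1985-01", "1986-01", dtype="datetime64[M]").astype("datetime64[ns]")
ends = np.arange("1985-02", "1986-02", dtype="datetime64[M]").astype("datetime64[ns]")

# Bottleneck 2: cast the interval lengths to timedelta64[ns], then to float64.
time_lengths = (ends - starts).astype("timedelta64[ns]").astype(np.float64)

# Hypothetical group labels: four quarters of three months each.
groups = np.repeat(np.arange(4), 3)

# Weight of each time step = its length divided by the total length of its group.
group_totals = np.bincount(groups, weights=time_lengths)
weights = time_lengths / group_totals[groups]

# Bottleneck 3: the kind of per-group check the PR removes. The weights are
# normalized per group, so each sum is 1 up to floating-point error.
group_sums = np.bincount(groups, weights=weights)
print(group_sums)
```

Because the normalization itself guarantees the sums, the check adds cost without catching anything the computation could get wrong.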

Checklist

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] My changes generate no new warnings
[ ] Any dependent changes have been merged and published in downstream modules

If applicable:

[ ] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass with my changes (locally and CI/CD build)
[x] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (584fcce) to head (6459c1b). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main      #689   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           15        15
  Lines         1544      1546    +2
=========================================
+ Hits          1544      1546    +2
```


tomvothecoder commented 2 months ago

Hi @chengzhuzhang, this PR is ready for review.

After refactoring, I cut the runtimes (user CPU time) as follows:

Annual climatology: 33s -> 5.85s
Annual departures: 1min 9s -> 11.6s
Monthly group averages: 33.5s -> 5.59s
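
For a rough sense of scale, the speedup factors can be computed from the figures above. The snippet below is only arithmetic on those numbers (the user CPU times reported in the benchmarking script); the dictionary and variable names are illustrative.

```python
# User CPU times in seconds, copied from the list above: (main, this branch).
timings = {
    "Annual climatology": (33.0, 5.85),
    "Annual departures": (69.0, 11.6),
    "Monthly group averages": (33.5, 5.59),
}

# Speedup factor = time on main / time on this branch.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

All three operations come out at roughly 5.6x to 6x faster.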

I also performed a regression test using the same e3sm_diags dataset between main and this branch and produced identical results. The GH Actions build also passes.

Benchmarking Script

# %%
import xcdat as xc

# Open the dataset with xCDAT.
file_path = "/global/cfs/cdirs/e3sm/e3sm_diags/postprocessed_e3sm_v2_data_for_e3sm_diags/20221103.v2.LR.amip.NGD_v3atm.chrysalis/arm-diags-data/PRECT_sgpc1_198501_201412.nc"
ds = xc.open_dataset(file_path)

# Label appended to the output filenames (e.g., "dev" or "main").
branch = "dev"
# %%
# 1. Calculate annual climatology
# -------------------------------
ds_annual_cycle = ds.temporal.climatology("PRECT", "month", keep_weights=True)
ds_annual_cycle.to_netcdf(f"temporal_climatology_{branch}.nc")
"""
main
--------------------------
CPU times: user 33 s, sys: 2.41 s, total: 35.4 s
Wall time: 35.4 s

refactor/688-temp-api-perf
--------------------------
CPU times: user 5.85 s, sys: 2.88 s, total: 8.72 s
Wall time: 8.78 s
"""

# %%
# 2. Calculate annual departures
# ------------------------------
ds_annual_cycle_anom = ds.temporal.departures("PRECT", "month", keep_weights=True)
ds_annual_cycle_anom.to_netcdf(f"temporal_departures_{branch}.nc")
"""
main
--------------------------
CPU times: user 1min 9s, sys: 4.8 s, total: 1min 14s
Wall time: 1min 14s

refactor/688-temp-api-perf
--------------------------
CPU times: user 11.6 s, sys: 4.32 s, total: 15.9 s
Wall time: 15.9 s
"""

# %%
# 3. Calculate monthly group averages
# -----------------------------------
ds_monthly_avg = ds.temporal.group_average("PRECT", "month", keep_weights=True)
ds_monthly_avg.to_netcdf(f"temporal_group_average_{branch}.nc")

"""
main
--------------------------
CPU times: user 33.5 s, sys: 2.27 s, total: 35.8 s
Wall time: 35.9 s

refactor/688-temp-api-perf
--------------------------
CPU times: user 5.59 s, sys: 2.06 s, total: 7.65 s
Wall time: 7.65 s
"""
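
The timings recorded in the docstrings look like the output of IPython's %%time cell magic (an assumption; the script does not show how they were collected). Outside a notebook, comparable numbers could be gathered with the standard library. The `timed` helper below is hypothetical and not part of xCDAT, and the workload is a stand-in for a call such as ds.temporal.climatology(...).

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure wall-clock and process CPU time for the enclosed block."""
    result = {}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    try:
        yield result
    finally:
        result["wall"] = time.perf_counter() - wall_start
        result["cpu"] = time.process_time() - cpu_start
        print(f"{label}: CPU {result['cpu']:.2f} s, wall {result['wall']:.2f} s")


# Stand-in workload; replace with the operation being benchmarked.
with timed("sum of squares") as t:
    total = sum(i * i for i in range(100_000))
```

Note that time.process_time() reports user and system CPU time combined, whereas %%time lists them separately.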

Regression testing script

import glob

import xarray as xr

# Get the filepaths for the dev and main branches
dev_filepaths = sorted(glob.glob("qa/issue-688/dev/*.nc"))
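main_filepaths = sorted(glob.glob("qa/issue-688/main/*.nc"))

# zip() stops at the shorter list, so make sure no file goes uncompared.
assert len(dev_filepaths) == len(main_filepaths), "File counts differ"

for fp, mp in zip(dev_filepaths, main_filepaths):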
    print(f"Comparing {fp} and {mp}")
    # Load the datasets
    dev_ds = xr.open_dataset(fp)
    main_ds = xr.open_dataset(mp)

    # Compare the datasets
    try:
        xr.testing.assert_identical(dev_ds, main_ds)
    except AssertionError as e:
        print(f"Datasets are not identical: {e}")
    else:
        print("Datasets are identical")
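
Pairing two sorted lists by position assumes both directories contain the same set of files. A sketch of pairing by name instead is shown below; it assumes the files follow the `..._{branch}.nc` naming used in the benchmarking script, and `pair_by_stem` is a hypothetical helper, not part of xCDAT or Xarray.

```python
from pathlib import Path


def pair_by_stem(dev_paths, main_paths):
    """Pair files whose names match once the branch suffix is dropped."""

    def key(path, suffix):
        stem = Path(path).stem
        return stem[: -len(suffix)] if stem.endswith(suffix) else stem

    dev = {key(p, "_dev"): p for p in dev_paths}
    main = {key(p, "_main"): p for p in main_paths}

    # Files present on both sides, and names present on only one side.
    pairs = [(dev[k], main[k]) for k in sorted(set(dev) & set(main))]
    unmatched = sorted(set(dev) ^ set(main))
    return pairs, unmatched


pairs, unmatched = pair_by_stem(
    [
        "qa/issue-688/dev/temporal_climatology_dev.nc",
        "qa/issue-688/dev/temporal_departures_dev.nc",
    ],
    ["qa/issue-688/main/temporal_climatology_main.nc"],
)
print(pairs)
print(unmatched)
```

Any name reported in `unmatched` is a file that would otherwise be skipped or compared against the wrong counterpart.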

Next steps

1. I will investigate the differences you pointed out between the xCDAT and e3sm_diags climatology functions separately from this PR (see the related e3sm_diags discussion post).
2. Open a GH issue on the Xarray repo about the large performance hit when grouping with auxiliary time coordinates.
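
A minimal reproducer for such an issue might start from something like the sketch below. It uses synthetic data and integer month labels (xCDAT's actual labeled time coordinates differ), and it only shows that the two grouping approaches give the same values; timing them on a large dataset is left out.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time series spanning 10 years.
time = pd.date_range("2000-01-01", periods=120, freq="MS")
da = xr.DataArray(
    np.arange(time.size, dtype="float64"),
    coords={"time": time},
    dims="time",
    name="ts",
)

# Group labels: the month number of each time step.
months = da["time"].dt.month.values

# Approach A: add the labels as an auxiliary coordinate on the time dimension.
aux = da.assign_coords(month=("time", months))
mean_aux = aux.groupby("month").mean()

# Approach B: replace the time coordinate with the labels and group on it.
replaced = da.assign_coords(time=("time", months))
mean_replaced = replaced.groupby("time").mean()

# Both approaches should produce the same group means.
np.testing.assert_allclose(mean_aux.values, mean_replaced.values)
```

Approach B corresponds to the change described in this PR; approach A is the pattern that was observed to be slow.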
