Closed tomvothecoder closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (584fcce) to head (6459c1b). Report is 1 commit behind head on main.
Hi @chengzhuzhang, this PR is ready for review.
After refactoring, I managed to cut down the runtimes as follows:
I also performed a regression test using the same e3sm_diags dataset on `main`
and this branch, which produced identical results. The GH Actions build also passes.
# %%
import xarray as xr
import xcdat as xc
### 1. Using temporal.climatology from xcdat
file_path = "/global/cfs/cdirs/e3sm/e3sm_diags/postprocessed_e3sm_v2_data_for_e3sm_diags/20221103.v2.LR.amip.NGD_v3atm.chrysalis/arm-diags-data/PRECT_sgpc1_198501_201412.nc"
ds = xc.open_dataset(file_path)
branch = "dev"
# %%
# 1. Calculate the annual cycle (monthly climatology)
# -------------------------------
ds_annual_cycle = ds.temporal.climatology("PRECT", "month", keep_weights=True)
ds_annual_cycle.to_netcdf(f"temporal_climatology_{branch}.nc")
"""
main
--------------------------
CPU times: user 33 s, sys: 2.41 s, total: 35.4 s
Wall time: 35.4 s
refactor/688-temp-api-perf
--------------------------
CPU times: user 5.85 s, sys: 2.88 s, total: 8.72 s
Wall time: 8.78 s
"""
# %%
# 2. Calculate annual departures
# ------------------------------
ds_annual_cycle_anom = ds.temporal.departures("PRECT", "month", keep_weights=True)
ds_annual_cycle_anom.to_netcdf(f"temporal_departures_{branch}.nc")
"""
main
--------------------------
CPU times: user 1min 9s, sys: 4.8 s, total: 1min 14s
Wall time: 1min 14s
refactor/688-temp-api-perf
--------------------------
CPU times: user 11.6 s, sys: 4.32 s, total: 15.9 s
Wall time: 15.9 s
"""
# %%
# 3. Calculate monthly group averages
# -----------------------------------
ds_monthly_avg = ds.temporal.group_average("PRECT", "month", keep_weights=True)
ds_monthly_avg.to_netcdf(f"temporal_group_average_{branch}.nc")
"""
main
--------------------------
CPU times: user 33.5 s, sys: 2.27 s, total: 35.8 s
Wall time: 35.9 s
refactor/688-temp-api-perf
--------------------------
CPU times: user 5.59 s, sys: 2.06 s, total: 7.65 s
Wall time: 7.65 s
"""
import glob
import xarray as xr
# Get the filepaths for the dev and main branches
dev_filepaths = sorted(glob.glob("qa/issue-688/dev/*.nc"))
main_filepaths = sorted(glob.glob("qa/issue-688/main/*.nc"))
for fp, mp in zip(dev_filepaths, main_filepaths):
    print(f"Comparing {fp} and {mp}")

    # Load the datasets
    dev_ds = xr.open_dataset(fp)
    main_ds = xr.open_dataset(mp)

    # Compare the datasets
    try:
        xr.testing.assert_identical(dev_ds, main_ds)
    except AssertionError as e:
        print(f"Datasets are not identical: {e}")
    else:
        print("Datasets are identical")
Description

TODO:

- In `_get_weights()`, loading time lengths into memory is slow (lines) -- replace with casting to `"timedelta64[ns]"`, then to `float64`
- In `_get_weights()`, the validation that checks the sums of weights for each group add up to 1 is slow (lines) -- remove this unnecessary assertion
- Identify performance optimizations -- I don't think this is necessary right now
- Compare `groupby` with vs. without the `flox` package on `main`
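The casting approach in the first TODO item can be sketched as follows (hypothetical bounds array; this is not xcdat's actual `_get_weights()` code):

```python
import numpy as np

# Hypothetical monthly time bounds: (start, end) per time step.
bounds = np.array(
    [
        ["2000-01-01", "2000-02-01"],
        ["2000-02-01", "2000-03-01"],
    ],
    dtype="datetime64[ns]",
)

# Time lengths stay as timedelta64[ns], then cast to float64 for math,
# avoiding a slow load of datetime objects into memory.
lengths = (bounds[:, 1] - bounds[:, 0]).astype("timedelta64[ns]")
weights = lengths.astype("float64")
weights /= weights.sum()
```

With these two months (31 and 29 days, since 2000 is a leap year) the weights are 31/60 and 29/60.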
Checklist
If applicable: