ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[SRW-AQM] Port SRW-AQM to Derecho #1090

Closed chan-hoo closed 1 month ago

chan-hoo commented 1 month ago

DESCRIPTION OF CHANGES:

Type of change

TESTS CONDUCTED:

ISSUE:

Fixes Issue #1038

CHECKLIST

MichaelLueken commented 1 month ago

The config.aqm.yaml sample warm start configuration was also successfully tested on Derecho:

       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202311100000               make_grid                     4665210           SUCCEEDED                   0         1          31.0
202311100000               make_orog                     4665226           SUCCEEDED                   0         1         190.0
202311100000          make_sfc_climo                     4665242           SUCCEEDED                   0         1          67.0
202311100000           nexus_gfs_sfc                     4665211           SUCCEEDED                   0         1          14.0
202311100000       nexus_emission_00                     4665228           SUCCEEDED                   0         1         658.0
202311100000       nexus_emission_01                     4665227           SUCCEEDED                   0         1         664.0
202311100000       nexus_emission_02                     4665229           SUCCEEDED                   0         1         777.0
202311100000        nexus_post_split                     4665349           SUCCEEDED                   0         1          99.0
202311100000           fire_emission                     4665212           SUCCEEDED                   0         1          16.0
202311100000            point_source                     4665230           SUCCEEDED                   0         1         197.0
202311100000             aqm_ics_ext                     4665282           SUCCEEDED                   0         1         135.0
202311100000                aqm_lbcs                     4665351           SUCCEEDED                   0         1          52.0
202311100000           get_extrn_ics                     4665215           SUCCEEDED                   0         1          18.0
202311100000          get_extrn_lbcs                     4665214           SUCCEEDED                   0         1          17.0
202311100000         make_ics_mem000                     4665258           SUCCEEDED                   0         1         108.0
202311100000        make_lbcs_mem000                     4665259           SUCCEEDED                   0         1         245.0
202311100000         run_fcst_mem000                     4665371           SUCCEEDED                   0         1        2447.0
202311100000    run_post_mem000_f000                     4665468           SUCCEEDED                   0         1          29.0
202311100000    run_post_mem000_f001                     4665469           SUCCEEDED                   0         1          27.0
202311100000    run_post_mem000_f002                     4665500           SUCCEEDED                   0         1          30.0
202311100000    run_post_mem000_f003                     4665501           SUCCEEDED                   0         1          30.0
202311100000    run_post_mem000_f004                     4665547           SUCCEEDED                   0         1          35.0
202311100000    run_post_mem000_f005                     4665550           SUCCEEDED                   0         1          34.0
202311100000    run_post_mem000_f006                     4665591           SUCCEEDED                   0         1          41.0
202311100000    run_post_mem000_f007                     4665592           SUCCEEDED                   0         1          34.0
202311100000    run_post_mem000_f008                     4665621           SUCCEEDED                   0         1          29.0
202311100000    run_post_mem000_f009                     4665622           SUCCEEDED                   0         1          35.0
202311100000    run_post_mem000_f010                     4665630           SUCCEEDED                   0         1          29.0
202311100000    run_post_mem000_f011                     4665631           SUCCEEDED                   0         1          31.0
202311100000    run_post_mem000_f012                     4665639           SUCCEEDED                   0         1          33.0
202311100000    run_post_mem000_f013                     4665640           SUCCEEDED                   0         1          33.0
202311100000    run_post_mem000_f014                     4665670           SUCCEEDED                   0         1          34.0
202311100000    run_post_mem000_f015                     4665671           SUCCEEDED                   0         1          33.0
202311100000    run_post_mem000_f016                     4665676           SUCCEEDED                   0         1          33.0
202311100000    run_post_mem000_f017                     4665677           SUCCEEDED                   0         1          33.0
202311100000    run_post_mem000_f018                     4665684           SUCCEEDED                   0         1          31.0
202311100000    run_post_mem000_f019                     4665685           SUCCEEDED                   0         1          33.0
202311100000    run_post_mem000_f020                     4665694           SUCCEEDED                   0         1          32.0
202311100000    run_post_mem000_f021                     4665695           SUCCEEDED                   0         1          32.0
202311100000    run_post_mem000_f022                     4665701           SUCCEEDED                   0         1          23.0
202311100000    run_post_mem000_f023                     4665702           SUCCEEDED                   0         1          23.0
202311100000    run_post_mem000_f024                     4665703           SUCCEEDED                   0         1          23.0
================================================================================================================================
202311110000           nexus_gfs_sfc                     4665213           SUCCEEDED                   0         1          17.0
202311110000       nexus_emission_00                     4665232           SUCCEEDED                   0         1         657.0
202311110000       nexus_emission_01                     4665234           SUCCEEDED                   0         1         667.0
202311110000       nexus_emission_02                     4665233           SUCCEEDED                   0         1         797.0
202311110000        nexus_post_split                     4665352           SUCCEEDED                   0         1          99.0
202311110000           fire_emission                     4665216           SUCCEEDED                   0         1          19.0
202311110000            point_source                     4665231           SUCCEEDED                   0         1         197.0
202311110000                 aqm_ics                     4665704           SUCCEEDED                   0         1         130.0
202311110000                aqm_lbcs                     4665350           SUCCEEDED                   0         1          51.0
202311110000           get_extrn_ics                     4665217           SUCCEEDED                   0         1          13.0
202311110000          get_extrn_lbcs                     4665218           SUCCEEDED                   0         1          13.0
202311110000         make_ics_mem000                     4665260           SUCCEEDED                   0         1         105.0
202311110000        make_lbcs_mem000                     4665261           SUCCEEDED                   0         1         232.0
202311110000         run_fcst_mem000                     4665708           SUCCEEDED                   0         1        2456.0
202311110000    run_post_mem000_f000                     4665727           SUCCEEDED                   0         1          34.0
202311110000    run_post_mem000_f001                     4665728           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f002                     4665746           SUCCEEDED                   0         1          34.0
202311110000    run_post_mem000_f003                     4665747           SUCCEEDED                   0         1          35.0
202311110000    run_post_mem000_f004                     4665781           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f005                     4665782           SUCCEEDED                   0         1          35.0
202311110000    run_post_mem000_f006                     4665791           SUCCEEDED                   0         1          36.0
202311110000    run_post_mem000_f007                     4665792           SUCCEEDED                   0         1          30.0
202311110000    run_post_mem000_f008                     4665805           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f009                     4665804           SUCCEEDED                   0         1          31.0
202311110000    run_post_mem000_f010                     4665810           SUCCEEDED                   0         1          30.0
202311110000    run_post_mem000_f011                     4665811           SUCCEEDED                   0         1          32.0
202311110000    run_post_mem000_f012                     4665834           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f013                     4665835           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f014                     4665848           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f015                     4665849           SUCCEEDED                   0         1          32.0
202311110000    run_post_mem000_f016                     4665853           SUCCEEDED                   0         1          34.0
202311110000    run_post_mem000_f017                     4665854           SUCCEEDED                   0         1          34.0
202311110000    run_post_mem000_f018                     4665863           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f019                     4665864           SUCCEEDED                   0         1          28.0
202311110000    run_post_mem000_f020                     4665880           SUCCEEDED                   0         1          40.0
202311110000    run_post_mem000_f021                     4665886           SUCCEEDED                   0         1          30.0
202311110000    run_post_mem000_f022                     4665887           SUCCEEDED                   0         1          29.0
202311110000    run_post_mem000_f023                     4665889           SUCCEEDED                   0         1          33.0
202311110000    run_post_mem000_f024                     4665888           SUCCEEDED                   0         1          32.0
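
For anyone who wants to check a status table like the one above automatically rather than by eye, here is a minimal Python sketch. It assumes the table is saved as plain text with the seven whitespace-separated columns shown above and that DURATION is in seconds; the function names `parse_rocotostat` and `summarize` are hypothetical helpers, not part of Rocoto or the SRW App. The sample uses three rows copied from the table.

```python
from collections import defaultdict

# Three rows copied from the table above, plus its header and separator lines.
SAMPLE = """\
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202311100000               make_grid                     4665210           SUCCEEDED                   0         1          31.0
202311100000         run_fcst_mem000                     4665371           SUCCEEDED                   0         1        2447.0
202311110000         run_fcst_mem000                     4665708           SUCCEEDED                   0         1        2456.0
"""


def _to_float(value):
    """Return value as a float, or None for placeholders such as '-'."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_rocotostat(text):
    """Parse status rows into dicts, skipping header and separator lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly 7 columns and start with a numeric cycle.
        if len(parts) != 7 or not parts[0].isdigit():
            continue
        cycle, task, jobid, state, exit_status, tries, duration = parts
        rows.append({
            "cycle": cycle,
            "task": task,
            "state": state,
            "exit_status": exit_status,
            "duration": _to_float(duration),
        })
    return rows


def summarize(rows):
    """Return (tasks that did not succeed, summed task duration per cycle)."""
    not_ok = [(r["cycle"], r["task"]) for r in rows
              if r["state"] != "SUCCEEDED" or r["exit_status"] != "0"]
    seconds = defaultdict(float)
    for r in rows:
        if r["duration"] is not None:
            seconds[r["cycle"]] += r["duration"]
    return not_ok, dict(seconds)


not_ok, seconds = summarize(parse_rocotostat(SAMPLE))
print("tasks needing attention:", not_ok)
print("summed task duration per cycle:", seconds)
```

Note that the summed duration is not the wall-clock time of the cycle, since many of the tasks run concurrently.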

chan-hoo commented 1 month ago

@RatkoVasic-NOAA, please review this PR when you are available :)

RatkoVasic-NOAA commented 1 month ago

Test passed for me on Derecho:

rvasic@derecho4:/glade/work/rvasic/1090/expt_dirs> cat WE2E_summary_20240604135733.txt
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240604121623                   COMPLETE            5096.66
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5096.66

Detailed summary of each experiment:

----------------------------------------------------------------------------------------------------
Detailed summary of experiment aqm_grid_AQM_NA13km_suite_GFS_v16_20240604121623
in directory /glade/work/rvasic/1090/expt_dirs/aqm_grid_AQM_NA13km_suite_GFS_v16
                                        | Status    | Walltime   | Core hours used
----------------------------------------------------------------------------------------------------
make_grid_202311100000                    SUCCEEDED          49.0           0.33
nexus_gfs_sfc_202311100000                SUCCEEDED          33.0           0.01
fire_emission_202311100000                SUCCEEDED          34.0           0.01
get_extrn_ics_202311100000                SUCCEEDED          30.0           0.01
get_extrn_lbcs_202311100000               SUCCEEDED          31.0           0.01
nexus_gfs_sfc_202311110000                SUCCEEDED          31.0           0.01
fire_emission_202311110000                SUCCEEDED          34.0           0.01
get_extrn_ics_202311110000                SUCCEEDED          30.0           0.01
get_extrn_lbcs_202311110000               SUCCEEDED          30.0           0.01
nexus_emission_00_202311100000            SUCCEEDED         724.0          51.48
nexus_emission_01_202311100000            SUCCEEDED         709.0          50.42
nexus_emission_02_202311100000            SUCCEEDED         832.0          59.16
nexus_emission_00_202311110000            SUCCEEDED         729.0          51.84
nexus_emission_01_202311110000            SUCCEEDED         715.0          50.84
nexus_emission_02_202311110000            SUCCEEDED         869.0          61.80
make_orog_202311100000                    SUCCEEDED         190.0           1.27
point_source_202311100000                 SUCCEEDED         197.0           0.05
point_source_202311110000                 SUCCEEDED         196.0           0.05
make_sfc_climo_202311100000               SUCCEEDED          95.0           1.27
make_ics_mem000_202311100000              SUCCEEDED         119.0           1.59
make_lbcs_mem000_202311100000             SUCCEEDED         269.0           3.59
make_ics_mem000_202311110000              SUCCEEDED         117.0           1.56
make_lbcs_mem000_202311110000             SUCCEEDED         279.0           3.72
aqm_lbcs_202311100000                     SUCCEEDED          58.0           0.39
aqm_lbcs_202311110000                     SUCCEEDED          64.0           0.43
nexus_post_split_202311100000             SUCCEEDED         121.0           0.03
nexus_post_split_202311110000             SUCCEEDED         117.0           0.03
run_fcst_mem000_202311100000              SUCCEEDED        2329.0        2318.65
run_post_mem000_f000_202311100000         SUCCEEDED          53.0           0.71
run_post_mem000_f001_202311100000         SUCCEEDED          48.0           0.64
run_post_mem000_f002_202311100000         SUCCEEDED          45.0           0.60
run_post_mem000_f003_202311100000         SUCCEEDED          29.0           0.39
run_post_mem000_f004_202311100000         SUCCEEDED          48.0           0.64
run_post_mem000_f005_202311100000         SUCCEEDED          47.0           0.63
run_post_mem000_f006_202311100000         SUCCEEDED          57.0           0.76
run_post_mem000_f007_202311100000         SUCCEEDED          56.0           0.75
run_post_mem000_f008_202311100000         SUCCEEDED          60.0           0.80
run_post_mem000_f009_202311100000         SUCCEEDED          49.0           0.65
run_post_mem000_f010_202311100000         SUCCEEDED          27.0           0.36
run_post_mem000_f011_202311100000         SUCCEEDED          48.0           0.64
run_post_mem000_f012_202311100000         SUCCEEDED          53.0           0.71
run_post_mem000_f013_202311100000         SUCCEEDED          50.0           0.67
run_post_mem000_f014_202311100000         SUCCEEDED          49.0           0.65
run_post_mem000_f015_202311100000         SUCCEEDED          42.0           0.56
run_post_mem000_f016_202311100000         SUCCEEDED          27.0           0.36
run_post_mem000_f017_202311100000         SUCCEEDED          52.0           0.69
run_post_mem000_f018_202311100000         SUCCEEDED          30.0           0.40
run_post_mem000_f019_202311100000         SUCCEEDED          24.0           0.32
run_post_mem000_f020_202311100000         SUCCEEDED          42.0           0.56
run_post_mem000_f021_202311100000         SUCCEEDED          37.0           0.49
aqm_ics_202311110000                      SUCCEEDED          94.0           0.03
run_post_mem000_f022_202311100000         SUCCEEDED          36.0           0.48
run_post_mem000_f023_202311100000         SUCCEEDED          36.0           0.48
run_post_mem000_f024_202311100000         SUCCEEDED          35.0           0.47
integration_test_mem000_202311100000      SUCCEEDED          22.0           0.15
run_fcst_mem000_202311110000              SUCCEEDED        2422.0        2411.24
run_post_mem000_f000_202311110000         SUCCEEDED          41.0           0.55
run_post_mem000_f001_202311110000         SUCCEEDED          27.0           0.36
run_post_mem000_f002_202311110000         SUCCEEDED          39.0           0.52
run_post_mem000_f003_202311110000         SUCCEEDED          35.0           0.47
run_post_mem000_f004_202311110000         SUCCEEDED          37.0           0.49
run_post_mem000_f005_202311110000         SUCCEEDED          25.0           0.33
run_post_mem000_f006_202311110000         SUCCEEDED          36.0           0.48
run_post_mem000_f007_202311110000         SUCCEEDED          35.0           0.47
run_post_mem000_f008_202311110000         SUCCEEDED          23.0           0.31
run_post_mem000_f009_202311110000         SUCCEEDED          35.0           0.47
run_post_mem000_f010_202311110000         SUCCEEDED          27.0           0.36
run_post_mem000_f011_202311110000         SUCCEEDED          39.0           0.52
run_post_mem000_f012_202311110000         SUCCEEDED          34.0           0.45
run_post_mem000_f013_202311110000         SUCCEEDED          37.0           0.49
run_post_mem000_f014_202311110000         SUCCEEDED          39.0           0.52
run_post_mem000_f015_202311110000         SUCCEEDED          37.0           0.49
run_post_mem000_f016_202311110000         SUCCEEDED          43.0           0.57
run_post_mem000_f017_202311110000         SUCCEEDED          49.0           0.65
run_post_mem000_f018_202311110000         SUCCEEDED          43.0           0.57
run_post_mem000_f019_202311110000         SUCCEEDED          44.0           0.59
run_post_mem000_f020_202311110000         SUCCEEDED          25.0           0.33
run_post_mem000_f021_202311110000         SUCCEEDED          39.0           0.52
run_post_mem000_f022_202311110000         SUCCEEDED          40.0           0.53
run_post_mem000_f023_202311110000         SUCCEEDED          40.0           0.53
run_post_mem000_f024_202311110000         SUCCEEDED          39.0           0.52
integration_test_mem000_202311110000      SUCCEEDED          24.0           0.16
----------------------------------------------------------------------------------------------------
Total                                     COMPLETE                       5096.66

Approving.

chan-hoo commented 1 month ago

Thank you, @RatkoVasic-NOAA !!! :) :)

MichaelLueken commented 1 month ago

The Jenkins tests successfully passed on Derecho, Gaea, Hera Intel, Hercules, and Orion.

As expected, the test phase timed out on Jet (ran longer than 8 hours). The Jenkins tests were manually run on Jet and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
community_20240605202421                                           COMPLETE              17.63
custom_ESGgrid_20240605202422                                      COMPLETE              27.66
custom_ESGgrid_Great_Lakes_snow_8km_20240605202423                 COMPLETE              21.03
custom_GFDLgrid_20240605202425                                     COMPLETE              12.20
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202406  COMPLETE               9.73
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              83.45
get_from_HPSS_ics_RAP_lbcs_RAP_20240605202428                      COMPLETE              16.41
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240605202429  COMPLETE             614.90
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              65.29
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.91
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             913.06
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1790.27

The get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS WE2E test failed twice due to issues pulling the necessary ICs from NOMADS. Using rocotorewind/rocotoboot this morning allowed the get_extrn_ics/lbcs tasks to complete successfully. Once the test passes, I will move forward with merging this work.
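
As a rough illustration of the rewind-and-boot step mentioned above, the Python sketch below only builds the command lines and prints them; it does not run anything. It assumes the usual Rocoto options (`-w` workflow XML, `-d` database file, `-c` cycle, `-t` task). The file names, the cycle value, and the helper name `rocoto_retry_commands` are placeholders for illustration, not values taken from this PR.

```python
import shlex


def rocoto_retry_commands(workflow_xml, database, cycle, tasks):
    """Build rocotorewind and rocotoboot command lines for each task.

    Assumes the -w/-d/-c/-t options; nothing is executed here.
    """
    common = ["-w", workflow_xml, "-d", database, "-c", cycle]
    commands = []
    for task in tasks:
        # Rewind first to clear the failed state, then force a resubmission.
        commands.append(["rocotorewind", *common, "-t", task])
        commands.append(["rocotoboot", *common, "-t", task])
    return commands


# Placeholder file names and cycle; substitute the experiment's own values.
for cmd in rocoto_retry_commands(
    "FV3LAM_wflow.xml",
    "FV3LAM_wflow.db",
    "YYYYMMDDHHMM",
    ["get_extrn_ics", "get_extrn_lbcs"],
):
    print(shlex.join(cmd))
```

Printing the commands before running them makes it easy to confirm the cycle and task names against the rocotostat output first.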

MichaelLueken commented 1 month ago

As expected, all Jenkins WE2E coverage tests successfully passed on Hera GNU:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km_20240606155557                     COMPLETE              63.40
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202406  COMPLETE              11.40
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240606155559              COMPLETE              18.53
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024060615  COMPLETE              71.18
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              26.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240606155  COMPLETE              32.13
long_fcst_20240606155602                                           COMPLETE             109.41
MET_verification_only_vx_20240606155603                            COMPLETE               0.29
MET_ensemble_verification_only_vx_time_lag_20240606155605          COMPLETE               9.97
2019_halloween_storm_20240606155608                                COMPLETE              82.13
2020_jan_cold_blast_20240606155609                                 COMPLETE              80.22
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             505.37
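
A summary in this layout can be cross-checked with a few lines of Python. The sketch below is a minimal example, assuming each data row ends with a status word followed by a core-hour figure, as in the table above; `parse_we2e_summary` is a hypothetical helper name, not part of the WE2E scripts. The sample rows are copied from the Hera GNU table, with the truncated experiment names left as they appear.

```python
# Rows copied from the Hera GNU summary above (names truncated as printed).
SAMPLE = """\
custom_ESGgrid_Central_Asia_3km_20240606155557                     COMPLETE              63.40
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202406  COMPLETE              11.40
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240606155559              COMPLETE              18.53
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024060615  COMPLETE              71.18
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              26.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240606155  COMPLETE              32.13
long_fcst_20240606155602                                           COMPLETE             109.41
MET_verification_only_vx_20240606155603                            COMPLETE               0.29
MET_ensemble_verification_only_vx_time_lag_20240606155605          COMPLETE               9.97
2019_halloween_storm_20240606155608                                COMPLETE              82.13
2020_jan_cold_blast_20240606155609                                 COMPLETE              80.22
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             505.37
"""


def parse_we2e_summary(text):
    """Return ({experiment: (status, core_hours)}, reported_total)."""
    experiments = {}
    reported_total = None
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, dashed separators, and anything too short to be a row.
        if len(parts) < 3 or line.startswith("-"):
            continue
        name, status = " ".join(parts[:-2]), parts[-2]
        try:
            hours = float(parts[-1])
        except ValueError:
            continue  # header line: last field is not a number
        if name == "Total":
            reported_total = hours
        else:
            experiments[name] = (status, hours)
    return experiments, reported_total


experiments, reported_total = parse_we2e_summary(SAMPLE)
computed = sum(hours for _, hours in experiments.values())
incomplete = [n for n, (status, _) in experiments.items() if status != "COMPLETE"]
print(f"experiments: {len(experiments)}, incomplete: {incomplete}")
print(f"computed total: {computed:.2f}, reported total: {reported_total}")
```

Comparing the computed and reported totals with a small tolerance (for example 0.005) avoids false alarms from floating-point rounding.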

Merging now.