MichaelLueken closed this 2 months ago.
With the new tags, it compiled and the single test on Hera passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024051020363 COMPLETE 20.63
----------------------------------------------------------------------------------------------------
Total COMPLETE 20.63
Approved.
Ran fundamental tests on AWS and they all passed. Approving.
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 150.83
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 17.46
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 68.85
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024051 COMPLETE 286.17
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240513134 COMPLETE 110.64
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024051313465 COMPLETE 109.92
----------------------------------------------------------------------------------------------------
Total COMPLETE 743.87
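The Total line in the summary above is the sum of the per-experiment core hours. Below is a minimal Python sketch that recomputes it from the rows. It assumes the whitespace-separated "name STATUS hours" layout shown in the log. total_core_hours is a hypothetical helper for illustration and is not the SRW App's own reporting script.

```python
from decimal import Decimal

# Per-experiment rows copied from the AWS summary above (names truncated as in the log).
SUMMARY_ROWS = """\
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 150.83
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 17.46
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 68.85
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024051 COMPLETE 286.17
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240513134 COMPLETE 110.64
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024051313465 COMPLETE 109.92
"""


def total_core_hours(rows: str) -> tuple[int, Decimal]:
    """Hypothetical helper: return (number of COMPLETE experiments, summed core hours)."""
    count = 0
    total = Decimal("0")
    for line in rows.splitlines():
        fields = line.split()
        # Expected layout: <experiment name> <status> <core hours>
        if len(fields) != 3 or fields[1] != "COMPLETE":
            continue
        count += 1
        total += Decimal(fields[2])  # Decimal avoids float round-off in the sum
    return count, total


if __name__ == "__main__":
    n, hours = total_core_hours(SUMMARY_ROWS)
    print(f"Total COMPLETE {hours} ({n} experiments)")
```

Running it prints "Total COMPLETE 743.87 (6 experiments)", matching the reported total. Decimal is used instead of float so the two-decimal values add up exactly.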
Hi @BruceKropp-Raytheon -
I just pushed a modification to the grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0 WE2E test configuration that should correct the failure occurring on Hercules. Specifically, depending on the node, the run_fcst job runs its time steps at either 1 second or 5 seconds per step. On a node that allows 1-second time steps, the run_fcst job completes successfully within 1 hour. On a node running 5-second time steps, however, the run_fcst job won't complete in 1 hour and fails for exceeding the walltime. Increasing the walltime from 1 hour to 2 hours allows the run_fcst job to pass regardless of which Hercules node it lands on.
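The walltime reasoning in the comment above comes down to simple arithmetic: total runtime is roughly the number of time steps times the seconds per step. Below is a minimal Python sketch, assuming a fixed per-step cost and ignoring setup and I/O overhead. The step count N_STEPS and the helper fits_walltime are hypothetical and are not taken from the actual WE2E test configuration.

```python
from datetime import timedelta


def fits_walltime(n_steps: int, sec_per_step: float, walltime: timedelta) -> bool:
    """Hypothetical helper: True if n_steps steps at sec_per_step seconds each finish within walltime."""
    return timedelta(seconds=n_steps * sec_per_step) <= walltime


# Hypothetical step count, for illustration only (not from the real test configuration).
N_STEPS = 1000

if __name__ == "__main__":
    for walltime_hours in (1, 2):
        walltime = timedelta(hours=walltime_hours)
        for sec_per_step in (1, 5):
            ok = fits_walltime(N_STEPS, sec_per_step, walltime)
            status = "fits" if ok else "exceeds walltime"
            print(f"walltime={walltime_hours} h, {sec_per_step} s/step -> {status}")
```

With these illustrative numbers, the 5 s/step case exceeds a 1-hour walltime (5000 s > 3600 s) but fits in 2 hours. Under this simple model, a 2-hour walltime covers the 5 s/step case only when the run needs no more than 7200 / 5 = 1440 steps, so the real step count determines the actual margin.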
Very nice, @MichaelLueken! I wonder whether this is also the case for Orion and Jet, as these occasionally time out before completing the single test.
@BruceKropp-Raytheon -
I believe the Hercules issue is very similar to the occasional issues on Orion.
Looking at the failure on Jet from last night's test (before maintenance began), the run_fcst job ran successfully to completion; the failure was in verification. The failed job landed on a bad node and hung until it hit the walltime, at which point it failed.
All Jenkins tests have successfully passed. Moving forward with merging this work now.
DESCRIPTION OF CHANGES:
This PR brings the weather model hash to 26cb9e6 (May 2) and UPP to 5faac75 (April 9).
Type of change
TESTS CONDUCTED:
Fundamental tests were run on all machines. Comprehensive tests were run on Gaea, Hera, Hercules, and Orion.
DEPENDENCIES:
None
DOCUMENTATION:
No documentation updates required
ISSUE:
None
CHECKLIST