ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[develop] Update WM and UPP hashes #1083

Closed MichaelLueken closed 2 months ago

MichaelLueken commented 2 months ago

DESCRIPTION OF CHANGES:

This PR brings the weather model hash to 26cb9e6 (May 2) and UPP to 5faac75 (April 9).

Type of change

TESTS CONDUCTED:

Fundamental tests were run on all machines. Comprehensive tests were run on Gaea, Hera, Hercules, and Orion.

DEPENDENCIES:

None

DOCUMENTATION:

No documentation updates required

ISSUE:

None

CHECKLIST

RatkoVasic-NOAA commented 2 months ago

With the new tags, it compiled and a single test on Hera passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024051020363  COMPLETE              20.63
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE              20.63

Approved.

EdwardSnyder-NOAA commented 2 months ago

Ran fundamental tests on AWS and they all passed. Approving.

Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE             150.83
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              17.46
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              68.85
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024051  COMPLETE             286.17
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240513134  COMPLETE             110.64
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024051313465  COMPLETE             109.92
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             743.87

MichaelLueken commented 2 months ago

Hi @BruceKropp-Raytheon -

I just pushed a modification to the grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0 WE2E test configuration that should correct the failure that is happening on Hercules. Specifically, depending on the node, the run_fcst job completes each time step in either about 1 second or about 5 seconds. On a node that completes a time step in 1 second, the run_fcst job finishes within 1 hour. On a node that takes 5 seconds per time step, the run_fcst job won't complete in 1 hour and fails for exceeding the walltime. Increasing the walltime from 1 hour to 2 hours allows the run_fcst job to pass regardless of which Hercules node it lands on.
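
The walltime reasoning above can be checked with a little arithmetic. This is a minimal sketch, not taken from the PR: the step count `N_STEPS` is hypothetical (the thread does not say how many time steps the forecast runs), the 1-second and 5-second figures are read as wall-clock cost per time step, and startup and I/O overhead are ignored.

```python
def fits_in_walltime(n_steps, seconds_per_step, walltime_hours):
    """Return True if n_steps at seconds_per_step finish within walltime_hours."""
    return n_steps * seconds_per_step <= walltime_hours * 3600


# Hypothetical value: the thread does not give the number of time steps.
N_STEPS = 1200

for seconds_per_step in (1, 5):
    for walltime_hours in (1, 2):
        ok = fits_in_walltime(N_STEPS, seconds_per_step, walltime_hours)
        verdict = "fits" if ok else "exceeds walltime"
        print(f"{seconds_per_step} s/step, {walltime_hours} h walltime: {verdict}")
```

With this assumed step count, only the slow node under a 1 hour walltime fails, which matches the behavior described for Hercules.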

BruceKropp-Raytheon commented 2 months ago

Very nice, @MichaelLueken! I wonder if this is also the case for Orion and Jet, as these occasionally time out before completing the single test.

MichaelLueken commented 2 months ago

@BruceKropp-Raytheon -

I believe that it is very similar to the occasional issues on Orion.

Looking at the failure on Jet from last night's test (before maintenance began), the run_fcst job successfully ran to completion. There was a failure in verification. The node that the failed job landed on was bad and the job hung until the walltime was hit, at which time it failed.

MichaelLueken commented 2 months ago

All Jenkins tests have successfully passed. Moving forward with merging this work now.