ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[release/public-v2.2.0] Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms #939

Closed mkavulich closed 1 year ago

mkavulich commented 1 year ago

Note: this is identical to #934 except it contains an additional fix for removing old crontab entries on Cheyenne and Derecho

DESCRIPTION OF CHANGES:

Creating an experiment with USE_CRON_TO_RELAUNCH=True is currently broken on Cheyenne and Derecho due to faulty Python logic. This PR fixes that issue.

I also took the opportunity to update the PR template to include the newly supported platforms (Derecho, Hercules, and Gaea C5).

Type of change

TESTS CONDUCTED:

Ran the WE2E fundamental tests with the option --launch=cron on three platforms. These tests previously failed on Cheyenne and Derecho; they now all succeed except for the grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 test on Cheyenne, which is a pre-existing failure (see Issue #933).

DEPENDENCIES:

None

DOCUMENTATION:

None

ISSUE:

Fixes #932

CHECKLIST

MichaelLueken commented 1 year ago

@mkavulich - Will you be including the additional fix for removing old crontab entries on Cheyenne and Derecho in a subsequent PR to develop? I ask because when I ran the grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 test on Derecho for PR #934, the job was successfully removed from the crontab once it completed. Was this an issue only for Cheyenne? Thanks.
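For readers following the thread, removing a finished experiment's line from the crontab can be sketched roughly as below. This is a minimal illustration, not the app's actual code: the function name `remove_crontab_entry` and the exact-match rule are assumptions made here for clarity.

```python
def remove_crontab_entry(crontab_contents: str, entry: str) -> str:
    """Return crontab text with every line matching `entry` removed.

    Hypothetical helper: the name and the exact-match comparison are
    assumptions for illustration, not taken from the SRW App source.
    """
    target = entry.strip()
    # Keep every line that is not the entry being removed
    kept = [line for line in crontab_contents.splitlines()
            if line.strip() != target]
    # A crontab file conventionally ends with a newline when non-empty
    return "\n".join(kept) + ("\n" if kept else "")
```

The returned text would then be written back with whichever crontab command is appropriate for the platform.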

MichaelLueken commented 1 year ago

Given that the Jenkins tests successfully passed for PR #934 and the only additional modifications were made for Cheyenne and Derecho, which aren't supported via Jenkins, I have completed running the WE2E coverage tests on Derecho and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km                                     COMPLETE              21.88
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              35.51
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE              42.35
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR           COMPLETE              26.94
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE              16.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR                COMPLETE              38.67
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE              22.65
pregen_grid_orog_sfc_climo                                         COMPLETE              13.40
specify_template_filenames                                         COMPLETE              13.75
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             231.86

I will now move forward with merging this work.

mkavulich commented 1 year ago

@MichaelLueken It turns out there was a bad assumption in the crontab script: Derecho does not suffer from the same problem as Cheyenne, where a different crontab command is needed when running from within the cron job. That is why this error was giving me confusing results, working on Derecho but not on Cheyenne. This PR is probably fine for the release branch (the crontab command on Derecho is actually the same as the full /usr/bin/crontab specified by the special logic), but I will include the correct fix in the develop branch eventually.
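The corrected behavior described in this comment could look roughly like the sketch below. It assumes, based only on the comment above, that Cheyenne alone needs the full /usr/bin/crontab path when invoked from a cron job and that every other platform can use the plain `crontab` command. The function name `select_crontab_cmd` and its parameters are hypothetical and may not match the actual script.

```python
def select_crontab_cmd(machine: str, called_from_cron: bool) -> str:
    """Pick the crontab executable for a platform (hypothetical helper).

    Assumption: only Cheyenne needs special handling, and only when the
    script is itself being run by cron. Derecho uses the default command.
    """
    if machine.upper() == "CHEYENNE" and called_from_cron:
        # Full path named in the discussion as the special-case command
        return "/usr/bin/crontab"
    # Default: rely on whatever `crontab` resolves to on the PATH
    return "crontab"
```

For example, `select_crontab_cmd("DERECHO", True)` falls through to the default, which matches the observation that Derecho does not share Cheyenne's problem.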