Closed MichaelLueken closed 8 months ago
@gspetro-NOAA - Thank you very much for reviewing these changes! I have committed your documentation suggestions, replaced the lowercase d with an uppercase D for ITASKS in config_defaults.yaml for consistency, and added logging prints to run_WE2E_tests.py to let users know that the values are being reset for GNU compilers. I'm hoping I'll be able to satisfactorily go through this morning's code review! Please let me know if you see any other changes that should be made or if you have any questions.
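For readers following along, the reset-and-log behavior described above might look roughly like the sketch below. This is not the code from run_WE2E_tests.py: the function name, the task_run_fcst/task_run_post section names, and the log wording are assumptions made for illustration only.

```python
import copy
import logging


def reset_2d_decomp_for_gnu(test_cfg, compiler):
    """Return a copy of test_cfg with ITASKS/NUMX forced to 1 for GNU builds.

    Hypothetical helper: the section names below are assumptions about
    where ITASKS and NUMX live in a test configuration.
    """
    cfg = copy.deepcopy(test_cfg)
    if compiler.lower() != "gnu":
        return cfg  # other compilers keep their 2D decomposition settings
    for section, key in (("task_run_fcst", "ITASKS"), ("task_run_post", "NUMX")):
        value = cfg.get(section, {}).get(key)
        if value is not None and value > 1:
            # Tell the user that the value is being reset, as the PR describes
            logging.info("Resetting %s from %s to 1 for GNU compilers", key, value)
            cfg[section][key] = 1
    return cfg
```

Returning a modified copy rather than editing the input in place keeps the original test configuration available for comparison.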
It looks like we need to add the line:
DOMAIN_PREGEN_BASEDIR: /contrib/EPIC/UFS_SRW_data/develop/FV3LAM_pregen
to the noaacloud machine file for the fundamental tests to run on the cloud. @EdwardSnyder-NOAA said that we did this for the release in PR #937 and should do that in develop, too. Without it, experiment generation failed. After I added that line, fundamental tests generally pass.
- grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta fails, but it is failing due to hitting the walltime for the post-processing tasks. It passes on GCP.
- grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0 fails due to hitting the walltime for the run_fcst task. It passes on AWS.
I can approve once the machine file change is made.
Thank you very much for running the fundamental tests on NOAA Cloud platforms, @gspetro-NOAA! I have added the DOMAIN_PREGEN_BASEDIR entry into ush/machine/noaacloud.yaml at ec44d97.
On Jet, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 WE2E coverage test initially failed. Using rocotorewind/rocotoboot allowed the test to pass:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240216161033 COMPLETE 20.37
custom_ESGgrid_20240216161035 COMPLETE 17.82
custom_ESGgrid_Great_Lakes_snow_8km_20240216161036 COMPLETE 58.08
custom_GFDLgrid_20240216161038 COMPLETE 9.41
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202402 COMPLETE 10.73
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20 COMPLETE 55.55
get_from_HPSS_ics_RAP_lbcs_RAP_20240216161043 COMPLETE 17.62
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240216161044 COMPLETE 338.41
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 45.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 8.29
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 523.51
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024 COMPLETE 10.59
----------------------------------------------------------------------------------------------------
Total COMPLETE 1116.31
This also shows that the code in this branch is ready to run on Rocky8, since the xjet partition is required to run the forecast job. One caveat is that the Jenkins pipeline for Jet will need to be set to one of the login nodes that has been transitioned to Rocky8, or else the run_fcst job will never run (you need to be logged into a Rocky8 login node in order to use the xjet partition).
Once the nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16 WE2E test completes successfully on Derecho, this work can be merged. Currently, the get_extrn_ics and get_extrn_lbcs tasks are failing to retrieve the necessary data from AWS within 3 hours (this has only been encountered since the Presidents' Day weekend). If these tasks continue to fail, then the memory for these tasks will need to be raised from 2G to a higher value.
The manually submitted Jenkins coverage WE2E tests on Derecho passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km_20240221070627 COMPLETE 30.98
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 49.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022107063 COMPLETE 54.23
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240221 COMPLETE 36.72
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 22.19
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022107064 COMPLETE 48.32
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 30.16
pregen_grid_orog_sfc_climo_20240221070649 COMPLETE 23.06
specify_template_filenames_20240221070651 COMPLETE 23.59
----------------------------------------------------------------------------------------------------
Total COMPLETE 318.63
Moving forward with merging this work now.
DESCRIPTION OF CHANGES:
With recent changes made to the UFS-WM (PR #1823), it is now possible to see in the log files that UPP 2d decomposition is being performed for inline post. To this end, the UFS-WM hash has been updated to 020e783 (October 27, 2023), the UPP hash was updated to fae617b (October 6, 2023), and the UFS_UTILS hash was updated to dc0e4a6 (November 6, 2023).
With the updated UPP hash, the old postxconfig-NT-fv3lam.txt file had been removed. Wen suggested using the postxconfig-NT-fv3lam_rrfs.txt file instead. Unfortunately, due to issues with the CRTM on Derecho, we are unable to move forward with this postxconfig file (this file contains assimilated radiances). The ufs-weather-model/tests/parm directory contains a copy of the old postxconfig-NT-fv3lam.txt file. To expedite moving forward with this update, I'm currently pointing to this copy of the postxconfig file. Once the issue with the CRTM on Derecho has been corrected, the SRW App repository can transition to the postxconfig-NT-fv3lam_rrfs.txt file.

Changes to enable 2d decomposition include:
- Adding itasks to the model_configure file (values greater than 1 enable 2d decomposition in inline post).
- Adding numx to the end of the &NAMPGB namelist options (values of numx greater than 1 enable 2d decomposition in offline post).
- Adding itasks to the list of variables to be added to the model_configure file.
- Setting ITASKS: 2 to enable inline post 2d decomposition.
- Setting NUMX: 2 to enable offline post 2d decomposition.

The ability to run comprehensive tests has been added back into the Jenkinsfile, as well as the ability to run the automated Jenkins tests on Derecho.
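To make the two switches in the list of changes concrete, here is a small sketch of how itasks and numx could be rendered into their respective files. It is an illustration under stated assumptions, not the SRW App's templating code: the helper names are hypothetical, and "KPO=47" stands in as a placeholder for whatever options already precede numx in &NAMPGB.

```python
def uses_2d_decomposition(value):
    """Values greater than 1 enable 2d decomposition (per the PR description)."""
    return value > 1


def model_configure_line(itasks):
    """Render the itasks entry for model_configure (hypothetical helper)."""
    return f"itasks: {itasks}"


def render_nampgb(existing_options, numx):
    """Append numx to the end of the &NAMPGB options and render the namelist.

    existing_options is a list of "NAME=value" strings; its contents here
    are placeholders, not the real option set.
    """
    options = [*existing_options, f"numx={numx}"]
    return "&NAMPGB\n " + ",".join(options) + "\n/"


if __name__ == "__main__":
    print(model_configure_line(2))      # inline post: itasks: 2
    print(render_nampgb(["KPO=47"], 2))  # offline post namelist with numx last
    print(uses_2d_decomposition(1))      # False: a value of 1 leaves it disabled
```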
Type of change
TESTS CONDUCTED:
(... run_post and run_fcst tasks fail on Hera GNU with double free or corruption errors). Disabling 2D decomposition for GNU compilers will allow the updated 2D decomposition WE2E tests to run on Hera GNU.
DEPENDENCIES:
None
DOCUMENTATION:
Documentation has been updated in the WE2E test configuration files, the ConfigWorkflow.rst chapter, and config_defaults.yaml.
ISSUE:
Resolves #480
CHECKLIST