Closed RatkoVasic-NOAA closed 5 months ago
The fundamental tests were also successfully run on Jet using CentOS:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 9.10
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 15.51
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 8.28
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 16.07
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022 COMPLETE 27.90
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240229203 COMPLETE 21.75
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022920365 COMPLETE 21.27
----------------------------------------------------------------------------------------------------
Total COMPLETE 119.88
Built the SRW App on Rocky 8 using the changes from this PR and ensured the changes worked by running this case: /lfs4/HFIP/hfv3gfs/Edward.Snyder/PR_1045/expt_dirs/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Fundamental tests ran successfully on Jet (xjet):
All 7 experiments finished
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 9.90
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 13.67
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 7.12
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 16.18
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024030 COMPLETE 30.38
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240301215 COMPLETE 22.14
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024030121531 COMPLETE 22.77
----------------------------------------------------------------------------------------------------
Total COMPLETE 122.16
Detailed summary written to /mnt/lfs4/HFIP/hfv3gfs/Natalie.Perlin/SRW/expt_dirs/WE2E_summary_20240301223112.txt
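The summary tables in this thread can be sanity-checked by re-adding the per-experiment core hours and comparing against the reported total. The following is a minimal sketch, assuming each data row has the form `name STATUS hours` as pasted above; `parse_summary`, `check_summary`, and the tolerance value are hypothetical helpers for illustration, not part of the SRW App.

```python
import re

# Matches rows such as "community_20240304152101 COMPLETE 21.59".
# Header and separator lines do not fit this three-token shape and are skipped.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+(?:\.\d+)?)$")

# Rows copied from the Jet (xjet) summary above.
SAMPLE = """\
Experiment name | Status | Core hours used
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 9.90
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 13.67
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 7.12
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 16.18
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024030 COMPLETE 30.38
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240301215 COMPLETE 22.14
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024030121531 COMPLETE 22.77
Total COMPLETE 122.16
"""


def parse_summary(text):
    """Return (rows, reported_total) parsed from a pasted summary table."""
    rows, reported_total = [], None
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator, or unrelated line
        name, status, hours = match.group(1), match.group(2), float(match.group(3))
        if name == "Total":
            reported_total = hours
        else:
            rows.append((name, status, hours))
    return rows, reported_total


def check_summary(text, tolerance=0.005):
    """Return (all_complete, computed_total, total_matches_reported)."""
    rows, reported_total = parse_summary(text)
    all_complete = all(status == "COMPLETE" for _, status, _ in rows)
    computed = round(sum(hours for _, _, hours in rows), 2)
    matches = reported_total is not None and abs(computed - reported_total) <= tolerance
    return all_complete, computed, matches


print(check_summary(SAMPLE))
```

The same check can be applied to any of the other tables by pasting their rows into `SAMPLE`.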
The Hera Jenkins tests failed due to the system coming down yesterday for maintenance. These tests have been requeued.
There was also a failure on Jet. The get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h test failed in make_lbcs with an OOM error. Using rocotorewind/rocotoboot allowed this test to pass:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240304152101 COMPLETE 21.59
custom_ESGgrid_20240304152102 COMPLETE 18.35
custom_ESGgrid_Great_Lakes_snow_8km_20240304152104 COMPLETE 13.40
custom_GFDLgrid_20240304152106 COMPLETE 9.45
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202403 COMPLETE 10.26
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20 COMPLETE 49.66
get_from_HPSS_ics_RAP_lbcs_RAP_20240304152110 COMPLETE 15.30
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240304152111 COMPLETE 222.35
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 43.97
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 9.64
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 533.34
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024 COMPLETE 10.62
----------------------------------------------------------------------------------------------------
Total COMPLETE 957.93
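For reference, the rewind-and-boot step mentioned above can be sketched as a pair of Rocoto commands. This is only an illustration, assuming the usual Rocoto options `-w` (workflow XML), `-d` (database), `-c` (cycle), and `-t` (task); the file names, cycle string, and task name below are placeholders, not the values used for this test. The helper only builds the command lines and does not run them.

```python
import shlex


def rocoto_retry_commands(workflow_xml, database, cycle, task):
    """Build (but do not run) the rewind and boot commands for one task."""
    common = ["-w", workflow_xml, "-d", database, "-c", cycle, "-t", task]
    return [["rocotorewind", *common], ["rocotoboot", *common]]


# Placeholder values for illustration only.
for argv in rocoto_retry_commands(
    "FV3LAM_wflow.xml", "FV3LAM_wflow.db", "202206011200", "make_lbcs"
):
    print(shlex.join(argv))
```

Rewinding clears the recorded state of the failed task so that the subsequent boot can resubmit it.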
Once the Hera tests complete, this PR can be merged.
The Hera Intel tests were run on Rocky8 and all tests passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km_20240308143348 COMPLETE 18.07
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024030 COMPLETE 6.05
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2 COMPLETE 766.89
get_from_HPSS_ics_HRRR_lbcs_RAP_20240308143351 COMPLETE 14.39
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 5.96
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 12.73
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240308143354 COMPLETE 10.19
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240 COMPLETE 6.22
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202403 COMPLETE 235.54
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240308 COMPLETE 313.52
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202403081 COMPLETE 328.98
pregen_grid_orog_sfc_climo_20240308143359 COMPLETE 7.09
----------------------------------------------------------------------------------------------------
Total COMPLETE 1725.63
@RatkoVasic-NOAA -
Unfortunately, while running the WE2E tests with Rocky8 on Hera GNU, the issue that you noted during the UFS apps and components coordination meeting showed up: all tests are failing because they use srun and cannot find libpmi.so.0 and libpmi2.so.0.
We will need to hope that the tests are able to run over the weekend on CentOS and no longer sit in the queue.
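A quick way to confirm whether a node can resolve these libraries is to ask the dynamic loader directly. The sketch below is an assumption-laden illustration rather than the diagnosis used in this PR: `can_load` and `missing_libraries` are hypothetical helpers, and the result depends on the environment (for example `LD_LIBRARY_PATH` and loaded modules) of the shell that runs it.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def missing_libraries(names=("libpmi.so.0", "libpmi2.so.0")):
    """Return the subset of names that the loader cannot open."""
    return [name for name in names if not can_load(name)]


print("Missing:", missing_libraries())
```

Running it on a compute node inside a job would be more representative than running it on a front end, since the two can differ.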
@RatkoVasic-NOAA -
Given that the Hera GNU tests have been sitting in the queue for days and cannot be run on Rocky8, the successful runs on Hera Intel and the rest of the platforms will be enough to get this work merged.
Since Rocky8 will be the default package of the nodes following today's update, I will go ahead and set the spack-stack path to point at the rocky8 location and change the ush/machine/jet.yaml file to use xJet for the forecast tasks. Once Jet is returned, Kris Booker and I will check to ensure that the Jet runner is using one of the Rocky8 front ends, then I will run the Jet tests one last time. Once complete, this PR will get merged.
The rerun of the Jenkins tests on Jet had one failure, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2. The run_fcst task was failing with:
FATAL from PE 1: compute_qs: saturation vapor pressure table overflow, nbad= 1
None of the changes made in this PR would cause this issue. Using rocotorewind/rocotoboot allowed the failed task to pass:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240312211355 COMPLETE 19.64
custom_ESGgrid_20240312211357 COMPLETE 18.79
custom_ESGgrid_Great_Lakes_snow_8km_20240312211358 COMPLETE 14.27
custom_GFDLgrid_20240312211400 COMPLETE 10.02
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202403 COMPLETE 11.20
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20 COMPLETE 57.17
get_from_HPSS_ics_RAP_lbcs_RAP_20240312211404 COMPLETE 17.22
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240312211405 COMPLETE 223.35
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 40.85
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 7.38
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 496.47
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024 COMPLETE 10.68
----------------------------------------------------------------------------------------------------
Total COMPLETE 927.04
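When triaging a failure like the run_fcst one quoted above, it can help to pull every FATAL line out of the task log before deciding whether a rewind and boot is appropriate. The following is a minimal sketch, assuming the log uses the `FATAL from PE <rank>: <message>` form shown in this thread; `find_fatal_messages` is a hypothetical helper and the sample text is just the single line quoted above.

```python
import re

# Assumes the "FATAL from PE <rank>: <message>" form quoted in this thread.
FATAL = re.compile(r"FATAL from PE\s+(\d+):\s*(.*)")


def find_fatal_messages(log_text):
    """Return (pe_rank, message) for each FATAL line found in the log text."""
    return [(int(m.group(1)), m.group(2).strip()) for m in FATAL.finditer(log_text)]


sample_log = (
    "FATAL from PE 1: compute_qs: saturation vapor pressure table overflow, nbad= 1\n"
)
for rank, message in find_fatal_messages(sample_log):
    print(f"PE {rank}: {message}")
```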
Moving forward with merging this PR now.
DESCRIPTION OF CHANGES:
Jet is switching from CentOS to Rocky OS.
ISSUE:
Solves issue #1044