Closed natalie-perlin closed 5 months ago
All the comprehensive tests pass on Gaea -
All 63 experiments finished
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
2020_CAD_20240227100652 COMPLETE 32.92
community_20240227100655 COMPLETE 42.97
custom_ESGgrid_20240227100657 COMPLETE 13.16
custom_ESGgrid_Central_Asia_3km_20240227100658 COMPLETE 32.93
custom_ESGgrid_IndianOcean_6km_20240227100700 COMPLETE 22.37
custom_ESGgrid_NewZealand_3km_20240227100702 COMPLETE 47.09
custom_ESGgrid_Peru_12km_20240227100703 COMPLETE 21.82
custom_ESGgrid_SF_1p1km_20240227100705 COMPLETE 166.96
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE_202 COMPLETE 9.32
custom_GFDLgrid_20240227100708 COMPLETE 8.34
deactivate_tasks_20240227100710 COMPLETE 0.85
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me COMPLETE 689.14
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240227100713 COMPLETE 21.15
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202 COMPLETE 14.99
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_ COMPLETE 241.83
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240 COMPLETE 129.31
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240227100721 COMPLETE 167.76
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240227100 COMPLETE 30.20
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 33.87
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024022710 COMPLETE 29.77
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202 COMPLETE 30.24
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 11.41
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 25.24
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 25.02
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024022710 COMPLETE 38.42
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240227100 COMPLETE 70.09
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202 COMPLETE 40.73
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240227100741 COMPLETE 18.57
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240 COMPLETE 11.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022710074 COMPLETE 45.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202402271 COMPLETE 36.33
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202402 COMPLETE 234.79
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 312.53
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240227 COMPLETE 323.93
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240227100 COMPLETE 356.07
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 363.14
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 30.57
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240227 COMPLETE 27.70
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 26.48
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 18.91
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022 COMPLETE 30.78
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 17.99
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2 COMPLETE 259.37
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202402271 COMPLETE 277.35
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20 COMPLETE 280.78
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240227100814 COMPLETE 75.78
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202402 COMPLETE 32.47
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022710081 COMPLETE 39.80
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240227100 COMPLETE 32.00
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16_202402271008 COMPLETE 51.82
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202 COMPLETE 16.51
MET_ensemble_verification_only_vx_20240227100826 COMPLETE 1.02
MET_ensemble_verification_winter_wx_20240227100830 COMPLETE 201.20
MET_verification_only_vx_20240227100833 COMPLETE 0.21
nco_20240227100837 COMPLETE 21.30
nco_ensemble_20240227100840 COMPLETE 100.75
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202 COMPLETE 30.83
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 25.31
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 314.11
@kbooker79 and @jkbk2004 -
The SRW App's .cicd/Jenkinsfile and .cicd/scripts/*.sh tests heavily utilize env.SRW_PLATFORM in Jenkins. To ensure that we don't encroach on Orion/Hercules, all workspaces require an additional dir ("${env.SRW_PLATFORM}") step in the stages. In the srw_build and srw_test scripts, SRW_PLATFORM is passed to tests/build.sh (where it is used to choose the build modulefile) and to tests/WE2E/setup_WE2E_tests.py (where it is used to choose the wflow modulefile and task modulefiles). Moving to gaea will require a rework of the entire Jenkinsfile and Jenkins test scripts, which goes beyond the scope of this PR.
@kbooker79 @jkbk2004 - please see @MichaelLueken's comment above.
As the SRW tests and Jenkins tests are currently written, the SRW platform name must match the name used in Jenkins, which is "gaeac5". The .cicd/Jenkinsfile and the .cicd/scripts/*.sh scripts depend on the SRW_PLATFORM entry, which is the Jenkins label.
As we do not have any other "gaea" platform, and there is no need to differentiate between Gaea C4 ("gaea") and Gaea C5 ("gaeac5") for Jenkins tests, could the platform label in Jenkins be changed to just "gaea"?
@natalie-perlin, I suppose we can do that, but we'll have to do some testing with the MRW (ufs-weather-model) pipelines to ensure that everything still works.
@natalie-perlin -
I'll try to make changes to the .cicd/scripts/srw_build.sh, wrapper_srw_ftest.sh, and srw_test.sh scripts to allow them to work with gaea. I'll let you know how this work turns out and then we can move forward from there.
@natalie-perlin -
I was able to have the SRW App Jenkins scripts set the Gaea C5 platform as gaea, allowing the current build, ftest, and test scripts to run on Gaea C5 using gaea modulefiles and entries. I have opened PR #10 in your fork with these significantly reduced changes. Once they have been merged, I will approve this PR.
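As a rough illustration of the approach described above (not the actual contents of PR #10), a Jenkins script could normalize the node label before passing it on to the build and test scripts. The function name normalize_platform and the variable platform are hypothetical; only SRW_PLATFORM, "gaeac5", and "gaea" come from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Jenkins node label to the platform name
# used by the SRW modulefiles. Only the gaeac5 -> gaea mapping is taken
# from this discussion; every other label passes through unchanged.
normalize_platform() {
  case "$1" in
    gaeac5) echo "gaea" ;;
    *)      echo "$1" ;;
  esac
}

# The Jenkins label can stay "gaeac5" while downstream scripts see "gaea".
SRW_PLATFORM="${SRW_PLATFORM:-gaeac5}"
platform="$(normalize_platform "${SRW_PLATFORM}")"
echo "Jenkins label: ${SRW_PLATFORM}; modulefile platform: ${platform}"
```

With a mapping like this, the Jenkins label itself would not need to change, so other pipelines that share the Gaea C5 runner would presumably be unaffected.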
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240227152426 COMPLETE 42.69
custom_ESGgrid_NewZealand_3km_20240227152428 COMPLETE 48.03
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 26.79
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240227152 COMPLETE 29.81
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024022715 COMPLETE 30.68
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 314.00
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022 COMPLETE 29.91
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20 COMPLETE 278.04
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202 COMPLETE 16.51
nco_ensemble_20240227152441 COMPLETE 96.21
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 312.42
----------------------------------------------------------------------------------------------------
Total COMPLETE 1225.09
The Jenkins tests have successfully passed on Derecho, Hera GNU, Hercules, and Orion.
On Jet, the custom_ESGgrid_Great_Lakes_snow_8km and get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf tests failed. After rerunning them with rocotorewind/rocotoboot, these tests have successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240228075903 COMPLETE 19.44
custom_ESGgrid_20240228075908 COMPLETE 19.78
custom_ESGgrid_Great_Lakes_snow_8km_20240228075912 COMPLETE 15.96
custom_GFDLgrid_20240228075917 COMPLETE 10.92
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202402 COMPLETE 13.67
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20 COMPLETE 61.07
get_from_HPSS_ics_RAP_lbcs_RAP_20240228075928 COMPLETE 18.88
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240228075930 COMPLETE 245.92
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 41.53
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 9.24
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 549.41
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024 COMPLETE 10.41
----------------------------------------------------------------------------------------------------
Total COMPLETE 1016.23
On Hera Intel, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 test failed. The test is being rerun using rocotorewind/rocotoboot. With no allocation on the machine currently, it will likely take all day for this test to successfully complete.
@kbooker79 was able to restart the Jenkins runner on Gaea C5 and the Jenkins tests have successfully cloned the external repositories and have moved onto the Build stage. I will let you know if there are any issues on the machine.
By utilizing the Rocky8 nodes on Hera, the rerun of the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 test successfully completed very quickly:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km_20240228130055 COMPLETE 17.47
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024022 COMPLETE 5.96
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2 COMPLETE 787.75
get_from_HPSS_ics_HRRR_lbcs_RAP_20240228130059 COMPLETE 14.00
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 7.74
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 13.19
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240228130103 COMPLETE 9.91
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240 COMPLETE 6.59
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202402 COMPLETE 232.70
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240228 COMPLETE 304.39
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202402281 COMPLETE 325.62
pregen_grid_orog_sfc_climo_20240228130110 COMPLETE 8.28
----------------------------------------------------------------------------------------------------
Total COMPLETE 1733.60
Once the Gaea tests complete, I will move forward with merging this work (the Build stage has successfully completed and the Functional Workflow Task Tests stage is now running).
The Gaea C5 tests successfully completed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240229103008 COMPLETE 43.17
custom_ESGgrid_NewZealand_3km_20240229103010 COMPLETE 47.87
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 26.67
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240229103 COMPLETE 28.79
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024022910 COMPLETE 29.17
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 315.80
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022 COMPLETE 30.42
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20 COMPLETE 277.45
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202 COMPLETE 16.48
nco_ensemble_20240229103023 COMPLETE 95.93
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 319.81
2020_CAPE_20240229103029 COMPLETE 36.08
----------------------------------------------------------------------------------------------------
Total COMPLETE 1267.64
While queuing the Gaea tests, the Hera tests were also queued. These tests were aborted.
Moving forward with merging this PR now.
DESCRIPTION OF CHANGES:
To resolve the library conflict for libstdc++.so.6, a specific library is preloaded at runtime, as specified in ./modulefiles/wflow_gaea.lua and ./modulefiles/tasks/gaea/python_srw.lua:
setenv("LD_PRELOAD", "/opt/cray/pe/gcc/12.2.0/snos/lib64/libstdc++.so.6")
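For anyone checking this outside the modulefiles, a shell-side sketch of the same idea is shown here. It is an illustration only and differs from the setenv above: it assumes you want to keep any existing LD_PRELOAD entries and to skip the preload when the library is not readable. The helper name preload_value and the variable GAEA_LIBSTDCXX are hypothetical; the library path is the one from the modulefiles.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the LD_PRELOAD value to use.
#   $1 = library to preload, $2 = current LD_PRELOAD value (may be empty)
# The library is placed first when it is readable; otherwise the current
# value is printed unchanged.
preload_value() {
  if [ -r "$1" ]; then
    # Append ":<current>" only when the current value is non-empty.
    echo "$1${2:+:$2}"
  else
    echo "$2"
  fi
}

# Path taken from the modulefiles in this PR.
GAEA_LIBSTDCXX="/opt/cray/pe/gcc/12.2.0/snos/lib64/libstdc++.so.6"
new_preload="$(preload_value "${GAEA_LIBSTDCXX}" "${LD_PRELOAD:-}")"
echo "LD_PRELOAD would be: ${new_preload:-<unset>}"
```

The sketch only prints the value; running export LD_PRELOAD="${new_preload}" afterwards would apply it to the current shell.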
Type of change
TESTS CONDUCTED:
Conducted fundamental tests on Gaea (C5); all pass.
DEPENDENCIES:
DOCUMENTATION:
ISSUE:
https://github.com/ufs-community/ufs-srweather-app/issues/991
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
ADDITIONAL NOTES:
A summary after running the fundamental test suite:
Comprehensive tests pass successfully; the log file WE2E_tests_20240227100902.yaml is attached as WE2E_tests_20240227100902.yaml.txt.