Closed natalie-perlin closed 1 year ago
Comprehensive tests are looking good. The tests that died were expected to fail (no HPSS access). Full summary log attached.
Took 3:26:06.635064; will no longer monitor.
All 73 experiments finished
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
2020_CAD COMPLETE 38.25
community COMPLETE 46.13
custom_ESGgrid COMPLETE 15.95
custom_ESGgrid_Central_Asia_3km COMPLETE 35.70
custom_ESGgrid_Great_Lakes_snow_8km DEAD 13.15
custom_ESGgrid_IndianOcean_6km COMPLETE 25.52
custom_ESGgrid_NewZealand_3km COMPLETE 51.68
custom_ESGgrid_Peru_12km COMPLETE 25.43
custom_ESGgrid_SF_1p1km COMPLETE 167.38
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE COMPLETE 11.58
custom_GFDLgrid COMPLETE 10.67
deactivate_tasks COMPLETE 1.01
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me COMPLETE 694.32
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200 DEAD 1.14
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200 DEAD 1.13
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018 DEAD 1.02
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h DEAD 1.14
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2 DEAD 3.48
get_from_HPSS_ics_GSMGFS_lbcs_GSMGFS DEAD 1.02
get_from_HPSS_ics_HRRR_lbcs_RAP DEAD 1.19
get_from_HPSS_ics_RAP_lbcs_RAP DEAD 1.18
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS COMPLETE 24.16
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 17.58
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 243.52
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 131.36
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 172.23
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP COMPLETE 32.53
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 36.26
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 32.18
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 32.36
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 12.03
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 27.34
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 26.67
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 38.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP COMPLETE 79.05
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 41.16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP COMPLETE 20.50
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2 COMPLETE 13.55
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 48.99
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta COMPLETE 38.41
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 243.81
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 323.80
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 334.41
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 348.09
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 348.45
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 33.56
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 31.57
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 28.72
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 20.34
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 35.89
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 19.21
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 264.82
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 279.25
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 281.13
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP COMPLETE 79.50
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0 COMPLETE 34.69
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 42.44
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0 COMPLETE 33.70
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 53.67
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot COMPLETE 18.88
long_fcst COMPLETE 76.94
MET_ensemble_verification_only_vx COMPLETE 1.19
MET_ensemble_verification_only_vx_time_lag DEAD 0.17
MET_verification_only_vx COMPLETE 0.27
nco COMPLETE 22.78
nco_ensemble COMPLETE 113.79
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 32.95
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 26.32
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 319.18
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR DEAD 1.53
pregen_grid_orog_sfc_climo COMPLETE 13.37
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS COMPLETE 14.48
specify_template_filenames COMPLETE 16.82
----------------------------------------------------------------------------------------------------
Total DEAD 5711.89
Detailed summary written to /lustre/f2/scratch/ncep/Natalie.Perlin/C5/SRW/expt_dirs/spack141_comprehensive/WE2E_summary_20231013005910.txt
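For anyone who wants to cross-check a summary like the one above without reading it line by line, here is a minimal Python sketch. It assumes each data row has the shape `name STATUS hours` as printed in this log; the helper names (`parse_summary`, `tally`, `unexpected_dead`) and the `get_from_HPSS` prefix filter are illustrative choices, not part of the WE2E tooling.

```python
import re
from collections import defaultdict

# Row shape assumed from the log above: "<experiment> <STATUS> <core hours>".
# Only the two status words that appear in this thread are recognized.
ROW = re.compile(r"^(\S+)\s+(COMPLETE|DEAD)\s+(\d+(?:\.\d+)?)$")


def parse_summary(text):
    """Return a list of (experiment, status, core_hours) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        # Skip the trailing "Total ..." line; it is not an experiment.
        if match and match.group(1) != "Total":
            rows.append((match.group(1), match.group(2), float(match.group(3))))
    return rows


def tally(rows):
    """Count experiments and sum core hours per status."""
    counts = defaultdict(int)
    hours = defaultdict(float)
    for _name, status, used in rows:
        counts[status] += 1
        hours[status] += used
    return dict(counts), dict(hours)


def unexpected_dead(rows, expected_prefixes=("get_from_HPSS",)):
    """List DEAD experiments whose names do not start with an expected prefix."""
    return [
        name
        for name, status, _used in rows
        if status == "DEAD" and not name.startswith(expected_prefixes)
    ]


if __name__ == "__main__":
    # Three rows copied from the summary above, for illustration only.
    sample = (
        "community COMPLETE 46.13\n"
        "custom_ESGgrid_Great_Lakes_snow_8km DEAD 13.15\n"
        "get_from_HPSS_ics_RAP_lbcs_RAP DEAD 1.18\n"
    )
    parsed = parse_summary(sample)
    print(tally(parsed))
    print(unexpected_dead(parsed))
```

The prefix filter only catches tests with HPSS in the name; other DEAD tests may still depend on HPSS data indirectly, so anything it lists should be checked by hand.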
One test failed on Orion:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km COMPLETE 157.84
deactivate_tasks COMPLETE 1.03
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me DEAD 43.47
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 256.88
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 138.20
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta COMPLETE 14.62
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 369.80
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 28.00
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 273.23
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0 COMPLETE 14.43
nco COMPLETE 7.74
2020_CAD COMPLETE 33.38
----------------------------------------------------------------------------------------------------
Total DEAD 1338.62
Rerunning the failed get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2mems test succeeded:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me COMPLETE 748.46
----------------------------------------------------------------------------------------------------
Total COMPLETE 748.46
Awaiting completion of tests on Gaea, Gaea C5, Hera, and Jet.
@MichaelLueken - is there anything else to be completed before merging?
@natalie-perlin - Tests are still running on Gaea and Gaea C5. Tests were hanging on the machine, but have moved into the Testing phase this morning. Once complete, this PR can be merged.
@natalie-perlin @ulmononian - The version of spack-stack used in this PR is different from the stack used in the weather model's PR #1784. Is there a plan to move the weather model to the version in this PR, or will the SRW need to move to use the weather model's version?
Gaea C5 WE2E coverage tests were manually run and all passed successfully:
---------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community COMPLETE 45.29
custom_ESGgrid_NewZealand_3km COMPLETE 50.97
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 28.77
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP COMPLETE 32.87
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 34.10
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 314.82
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 36.02
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 278.01
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot COMPLETE 18.73
nco_ensemble COMPLETE 113.63
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 312.32
----------------------------------------------------------------------------------------------------
Total COMPLETE 1265.53
Working through some issues that were experienced on Hera. Once the tests finish, I can move forward with this work.
@jkbk2004 would like to ensure that both @natalie-perlin and @ulmononian are on the same page with respect to the different versions of the spack-stacks that are used for SRW and the weather model.
i am not sure where the stack used in this PR came from. it was not an official installation by the spack-stack group as far as i can recall. @natalie-perlin is there a reason why this PR does not use /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-dev-20230717? i suppose the most obvious point of convergence would be when both move to spack-stack/1.5.0, but that timeline is not yet clear.
Thanks, @ulmononian!
@natalie-perlin - If this PR isn't using the official spack-stack from the spack-stack team, then I think we should hold off. We were originally planning on supporting hpc-stack for Derecho, Gaea C5, generic Linux, and MacOS. It's probably not a good idea to use an unsupported stack on Gaea C5, especially for a community release. We can hold off until we transfer to spack-stack/1.5.0. I'd like to check and see how everyone else feels about this. Thanks!
Hera GNU WE2E coverage tests have completed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km COMPLETE 26.79
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200 COMPLETE 11.75
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS COMPLETE 18.26
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 47.52
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 29.04
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0 COMPLETE 21.31
long_fcst COMPLETE 71.06
MET_verification_only_vx COMPLETE 0.25
MET_ensemble_verification_only_vx_time_lag COMPLETE 8.00
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 62.49
----------------------------------------------------------------------------------------------------
Total COMPLETE 296.47
Hera Intel WE2E tests have completed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km DEAD 7.22
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200 COMPLETE 6.46
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2 COMPLETE 763.00
get_from_HPSS_ics_HRRR_lbcs_RAP COMPLETE 14.16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 6.36
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 13.40
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP COMPLETE 10.30
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2 COMPLETE 6.68
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 232.64
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 COMPLETE 304.83
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 327.89
pregen_grid_orog_sfc_climo COMPLETE 7.24
----------------------------------------------------------------------------------------------------
Total DEAD 1700.18
Rerun of the custom_ESGgrid_Central_Asia_3km test passes:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km COMPLETE 26.87
----------------------------------------------------------------------------------------------------
Total COMPLETE 26.87
@ulmononian @MichaelLueken - this is a spack-stack version built with the same compiler version as the previous SRW version (intel-2023.1.0). Dom Heinzeller suggested installing this stack version in a standard EPIC location, /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-1.4.1/envs/unified-env-intel-2023.1.0, and keeping it alongside the previous one, which used intel-2022.0.2: /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-1.4.1/envs/unified-env
The latter is the stack that Ratko tested in the early stages of PR #913 (https://github.com/ufs-community/ufs-srweather-app/pull/913), and it did not work for SRW.
did anyone try /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-dev-20230717/unified-env (@natalie-perlin @RatkoVasic-NOAA)? this came as a bug fix for /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-1.4.1/envs/unified-env.
anyway, if the only difference between /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-dev-20230717/unified-env and /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-1.4.1/envs/unified-env-intel-2023.1.0 is the compiler version, it is probably ok to use it here (given that c5 is not added to the ufs-wm yet anyway).
The Gaea tests have successfully completed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community COMPLETE 32.69
custom_ESGgrid_NewZealand_3km COMPLETE 78.02
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 37.26
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP COMPLETE 35.92
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 41.59
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 417.23
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 45.24
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 388.27
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot COMPLETE 14.57
nco_ensemble COMPLETE 145.50
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 362.47
----------------------------------------------------------------------------------------------------
Total COMPLETE 1598.76
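The Gaea and Gaea C5 tables in this thread list the same coverage tests, so their core-hour figures can be lined up side by side. The sketch below is one hypothetical way to do that in Python; `core_hours` and `compare` are illustrative names, and the row shape `name STATUS hours` is assumed from the logs posted here. Only COMPLETE rows are compared, since a DEAD run reports partial usage.

```python
def core_hours(text):
    """Map experiment name -> core hours for COMPLETE rows of a summary table."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows shaped "<experiment> COMPLETE <hours>"; skip the Total line.
        if len(parts) == 3 and parts[1] == "COMPLETE" and parts[0] != "Total":
            table[parts[0]] = float(parts[2])
    return table


def compare(first, second):
    """Return {experiment: first_hours / second_hours} for tests present in both."""
    shared = sorted(first.keys() & second.keys())
    return {name: first[name] / second[name] for name in shared if second[name] > 0}


if __name__ == "__main__":
    # Two rows from each platform's table above, for illustration only.
    gaea = "community COMPLETE 32.69\nnco_ensemble COMPLETE 145.50\n"
    gaea_c5 = "community COMPLETE 45.29\nnco_ensemble COMPLETE 113.63\n"
    for name, ratio in compare(core_hours(gaea), core_hours(gaea_c5)).items():
        print(f"{name}: Gaea / Gaea C5 = {ratio:.2f}")
```

Core hours depend on node size and queue layout as well as on the software stack, so a ratio far from 1 is a prompt to look closer, not evidence of a problem by itself.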
@natalie-perlin and @ulmononian - The tests have completed on all platforms. It looks like @ulmononian is okay to move forward with merging this work for the release (since the only difference is the compiler version). I will go ahead and merge this PR now. Thank you both for the discussion in this PR!
DESCRIPTION OF CHANGES:
Integrate spack-stack/1.4.1 modules into the Gaea C5 platform. The stack is built with the intel-classic/2023.1.0 compiler and cray-mpich/8.1.25; the stack environment is /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-1.4.1/envs/unified-env-intel-2023.1.0/
Fundamental tests pass completely:
A detailed summary log, WE2E_summary_20231012211458.txt, is attached. In the comprehensive test suite, some failures do occur, in tasks such as get_obs_<xxxx>, get_extrn_ics, and get_extrn_lbcs; comprehensive tests are still running.
Files changed:
modulefiles/build_gaea-c5_intel.lua (unload modules darshan-runtime, cray-pmi)
modulefiles/wflow_gaea-c5.lua
modulefiles/tasks/gaea-c5/python_srw.lua (loads module darshan-runtime)
modulefiles/tasks/gaea-c5/run_vx.local.lua
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
ISSUE:
Resolves the issue
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@RatkoVasic-NOAA
WE2E_summary_20231012211458.txt