EdwardSnyder-NOAA opened this PR 1 month ago (status: open).
This PR passed on AWS using the Jenkins nightly job.
When I ran the comprehensive tests on Hera, one test failed:
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
2020_CAD_20240619182721 COMPLETE 52.22
2020_CAPE_20240619182722 COMPLETE 51.67
2019_hurricane_barry_20240619182723 COMPLETE 48.98
2019_halloween_storm_20240619182724 COMPLETE 51.43
2019_hurricane_lorenzo_20240619182724 COMPLETE 52.12
2019_memorial_day_heat_wave_20240619182725 COMPLETE 49.11
2020_denver_radiation_inversion_20240619182726 COMPLETE 50.98
2020_easter_storm_20240619182727 COMPLETE 51.28
2020_jan_cold_blast_20240619182727 COMPLETE 51.81
community_20240619182728 COMPLETE 19.92
custom_ESGgrid_20240619182729 COMPLETE 21.68
custom_ESGgrid_Central_Asia_3km_20240619182730 COMPLETE 43.42
custom_ESGgrid_Great_Lakes_snow_8km_20240619182730 COMPLETE 16.94
custom_ESGgrid_IndianOcean_6km_20240619182732 COMPLETE 25.14
custom_ESGgrid_NewZealand_3km_20240619182732 COMPLETE 101.14
custom_ESGgrid_Peru_12km_20240619182733 COMPLETE 31.04
custom_ESGgrid_SF_1p1km_20240619182734 COMPLETE 303.29
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE_202 COMPLETE 9.07
custom_GFDLgrid_20240619182735 COMPLETE 8.12
deactivate_tasks_20240619182736 COMPLETE 0.91
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me COMPLETE 1426.37
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024061 COMPLETE 6.71
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202406 COMPLETE 10.06
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202406 COMPLETE 10.25
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20 COMPLETE 79.25
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2 COMPLETE 1458.27
get_from_HPSS_ics_GSMGFS_lbcs_GSMGFS_20240619182741 COMPLETE 7.88
get_from_HPSS_ics_HRRR_lbcs_RAP_20240619182742 COMPLETE 15.97
get_from_HPSS_ics_RAP_lbcs_RAP_20240619182743 COMPLETE 17.64
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240619182744 DEAD 11.84
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202 COMPLETE 12.41
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_ COMPLETE 688.04
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240 COMPLETE 247.35
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240619182747 COMPLETE 461.60
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240619182 COMPLETE 39.87
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 50.09
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024061918 COMPLETE 48.01
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202 COMPLETE 46.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 6.06
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 13.73
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 14.04
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024061918 COMPLETE 15.57
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240619182 COMPLETE 41.58
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202 COMPLETE 18.82
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240619182758 COMPLETE 10.76
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240 COMPLETE 6.64
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061918280 COMPLETE 22.40
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202406191 COMPLETE 16.76
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202406 COMPLETE 437.73
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 623.11
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240619 COMPLETE 584.82
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240619182 COMPLETE 699.09
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 704.18
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 42.22
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240619 COMPLETE 42.12
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 42.62
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 9.98
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061 COMPLETE 39.94
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 10.96
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2 COMPLETE 524.45
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202406191 COMPLETE 917.47
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20 COMPLETE 922.26
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240619182813 COMPLETE 128.46
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202406 COMPLETE 47.98
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061918281 COMPLETE 27.68
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240619182 COMPLETE 24.85
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16_202406191828 COMPLETE 35.34
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202 COMPLETE 13.41
long_fcst_20240619182821 COMPLETE 98.76
MET_ensemble_verification_only_vx_20240619182822 COMPLETE 1.04
MET_ensemble_verification_only_vx_time_lag_20240619182826 COMPLETE 3.90
MET_ensemble_verification_winter_wx_20240619182831 COMPLETE 118.22
MET_verification_only_vx_20240619182832 COMPLETE 0.23
pregen_grid_orog_sfc_climo_20240619182836 COMPLETE 7.52
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS_20240619182840 COMPLETE 7.17
specify_template_filenames_20240619182841 COMPLETE 7.84
----------------------------------------------------------------------------------------------------
Total DEAD 11968.52
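When a summary like this is long, picking out the failures by eye is error-prone. The following is a minimal sketch, not part of the SRW App. It assumes each row has exactly the three whitespace-separated columns shown above: experiment name, status, and core hours. The function name `failed_experiments` is hypothetical. Long experiment names are truncated in the summary, so the sketch returns them exactly as printed.

```python
import re
from typing import List, Tuple

# One summary row: "<experiment name> <STATUS> <core hours>".
# The header row and the dashed separator lines do not match this pattern.
ROW = re.compile(r"^(?P<name>\S+)\s+(?P<status>[A-Z_]+)\s+(?P<hours>\d+(?:\.\d+)?)$")


def failed_experiments(summary: str) -> List[Tuple[str, str, float]]:
    """Return (name, status, core_hours) for every row whose status is not COMPLETE.

    The trailing "Total" row reports the overall status, so it is skipped.
    """
    failures = []
    for line in summary.splitlines():
        match = ROW.match(line.strip())
        if match is None or match.group("name") == "Total":
            continue
        if match.group("status") != "COMPLETE":
            failures.append(
                (match.group("name"), match.group("status"), float(match.group("hours")))
            )
    return failures


if __name__ == "__main__":
    example = (
        "community_20240619182728 COMPLETE 19.92\n"
        "get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240619182744 DEAD 11.84\n"
        "Total DEAD 11968.52\n"
    )
    # Prints [('get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240619182744', 'DEAD', 11.84)]
    print(failed_experiments(example))
```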
I reran the test several times, and it always failed in the forecast between forecast hours 05 and 06.
However, when I ran that test on its own:
./run_WE2E_tests.py -t get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS -m hera -a epic
the test passed. But when I looked in the run directories, the two runs used different cycle dates:
from the comprehensive run: get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS/2024061800
from the single-test run: get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS/2024061700
@MichaelLueken @EdwardSnyder-NOAA do you have an idea what is going on here?
@EdwardSnyder-NOAA -
I have gone ahead and temporarily added the DO_NOT_MERGE label to this PR. Please let me know when you have finished pushing changes, and I will remove the label, run a test on Hera GNU, and then move forward with final Jenkins testing. Thank you very much!
@EdwardSnyder-NOAA -
With the merging of @RatkoVasic-NOAA's PR #1093, the SRW App is now compiling and running without issue on Hera GNU.
Are there additional changes still required for this PR, or is it safe to remove the DO_NOT_MERGE label and run the Jenkins tests now?
DESCRIPTION OF CHANGES:
This PR adds logic to handle GCP's default conda environment, which conflicts with the SRW App's conda environment. It also fixes a Parallel Works naming-convention bug in the script.
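The PR text does not show how the conflicting environment is detected. The following is a minimal sketch of one possible check, not the PR's implementation. It relies on `CONDA_PREFIX` and `CONDA_DEFAULT_ENV`, which conda sets when an environment is activated. The SRW conda root path and the function name are hypothetical. In practice, the caller would then deactivate the reported environment in the shell before loading the SRW App's own environment.

```python
import os
from typing import Mapping, Optional


def conflicting_conda_env(environ: Mapping[str, str], srw_conda_root: str) -> Optional[str]:
    """Return the name of an active conda environment that is NOT under
    srw_conda_root (e.g. a cloud image's default env), or None if there is
    no active environment or it already belongs to the SRW App's conda."""
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None  # no conda environment is active
    srw_root = os.path.realpath(srw_conda_root)
    active = os.path.realpath(prefix)
    if os.path.commonpath([srw_root, active]) == srw_root:
        return None  # active env lives inside the SRW App's conda installation
    # Prefer the human-readable env name; fall back to its path.
    return environ.get("CONDA_DEFAULT_ENV", active)


if __name__ == "__main__":
    # "/opt/srw/conda" is a hypothetical location for the SRW App's conda.
    print(conflicting_conda_env(os.environ, "/opt/srw/conda"))
```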
It also addresses a known issue on PW instances where a Ruby warning prevents run_WE2E_tests.py from exiting gracefully. The workaround we use in our bootstrap for /contrib doesn't seem to work for the /lustre directory, which is why the warning is hardcoded into the monitor_jobs.py script.
The new spack-stack build on Azure is missing a GNU library, so the path to this library was added to the relevant run scripts, and the wflow noaacloud lua file was cleaned up.
Removed the log and error file settings from the qsub wrapper script so that qsub can generate these files with the job ID in their file names. Also fixed a typo in the wrapper script.
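The wrapper script itself is not shown here. The following is a minimal sketch of the idea, assuming a PBS-style qsub. When -o and -e are omitted, such schedulers typically write stdout and stderr to default files whose names include the job name and the numeric part of the job ID, for example `<job_name>.o<sequence_number>`. The function name is hypothetical. The sketch only handles the separated `-o PATH` form, not a joined form such as `-oPATH`.

```python
from typing import List, Sequence


def qsub_command(script: str, options: Sequence[str]) -> List[str]:
    """Build a qsub command line with any "-o PATH" / "-e PATH" pairs removed,
    so the scheduler falls back to its default (job-ID-based) file names."""
    cmd = ["qsub"]
    skip_next = False
    for opt in options:
        if skip_next:
            skip_next = False  # this is the file path that followed -o/-e
            continue
        if opt in ("-o", "-e"):
            skip_next = True  # drop the flag and, on the next pass, its value
            continue
        cmd.append(opt)
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    opts = ["-N", "run_fcst", "-o", "log.out", "-e", "log.err", "-l", "walltime=00:30:00"]
    print(qsub_command("job.sh", opts))
```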
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
DOCUMENTATION:
None.
ISSUE:
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@kbooker79, @BruceKropp-Raytheon