ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[develop] Upgrade SRW to spack-stack 1.6.0 from 1.5.1 #1093

Closed RatkoVasic-NOAA closed 3 weeks ago

RatkoVasic-NOAA commented 1 month ago

DESCRIPTION OF CHANGES:

As ufs-weather-model was upgraded to spack-stack 1.6.0, we are upgrading SRW as well.

Type of change

TESTS CONDUCTED:

ISSUE:

Issue #1092

CHECKLIST

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

CONTRIBUTORS (optional):

@natalie-perlin

RatkoVasic-NOAA commented 1 month ago

Fundamental tests:

HERA:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.87
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.14
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              15.01
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              36.00
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610202  COMPLETE              24.35
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061020231  COMPLETE              21.66
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             115.03

GAEA:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              18.83
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              13.41
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              29.85
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              38.00
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610224  COMPLETE              35.28
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061022410  COMPLETE              49.54
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             184.91

ORION:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              13.03
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               9.71
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              18.47
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              45.96
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610174  COMPLETE              31.41
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061017454  COMPLETE              24.67
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             143.25

HERCULES:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              13.30
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              11.07
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              32.29
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              36.33
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610152  COMPLETE              69.53
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061015212  COMPLETE              42.97
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             205.49

DERECHO:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              19.34
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              18.90
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              32.63
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              34.49
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610142  COMPLETE              33.22
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061014204  COMPLETE              48.32
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             186.90
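
The per-platform totals above can be cross-checked mechanically. The following is a minimal sketch, assuming the summary layout shown in these tables (experiment name, status, core hours, then a `Total` row); the helper names `parse_summary` and `check_total` are hypothetical and are not part of the SRW App. The sample data is the Hera table from this comment.

```python
import re

# Hera summary copied from the table above (experiment names are truncated in the source).
SUMMARY = """\
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.87
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.14
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              15.01
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              36.00
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610202  COMPLETE              24.35
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061020231  COMPLETE              21.66
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             115.03
"""

# A data row is: one token (name), an upper-case status word, and a decimal number.
# Separator lines and the header line do not match this pattern and are skipped.
ROW = re.compile(r"^(\S+)\s+([A-Z]+)\s+(\d+\.\d+)\s*$")


def parse_summary(text):
    """Return a list of (name, status, core_hours) tuples, including the Total row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, status, hours = match.groups()
            rows.append((name, status, float(hours)))
    return rows


def check_total(rows, tolerance=0.01):
    """Compare the reported Total against the sum of the experiment rows."""
    experiments = [row for row in rows if row[0] != "Total"]
    reported = next(row[2] for row in rows if row[0] == "Total")
    computed = round(sum(hours for _, _, hours in experiments), 2)
    all_complete = all(status == "COMPLETE" for _, status, _ in experiments)
    return all_complete, computed, abs(computed - reported) <= tolerance


all_complete, computed, matches = check_total(parse_summary(SUMMARY))
print(f"all complete: {all_complete}, computed total: {computed:.2f}, matches: {matches}")
```

The tolerance allows for rounding in the reported per-experiment values; the same check should apply to the other platform tables if they are pasted in place of `SUMMARY`.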

chan-hoo commented 1 month ago

AQM test on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240611123548                   COMPLETE            5299.42
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5299.42

Approving.

MichaelLueken commented 1 month ago

@RatkoVasic-NOAA -

Given that Jet /lfs4 is still down, Derecho is down for maintenance today, and Orion is undergoing the OS migration tomorrow and Thursday, I will hold off on automated testing until spack-stack 1.6.0 is ready on Orion Rocky 9. By then, Jet and Derecho should hopefully be back.

RatkoVasic-NOAA commented 1 month ago

OK, their guess is that they will upgrade Orion by 6/12-13, which means we can start building libraries on 6/14. Since there are plenty of versions and different environments, I will start with spack-stack 1.6.0.

MichaelLueken commented 1 month ago

With the return of Jet /lfs4 yesterday afternoon, the fundamental tests were run on Jet and all of them passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               7.44
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              13.92
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              37.29
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240613144  COMPLETE              28.96
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061314444  COMPLETE              19.29
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             116.12

EdwardSnyder-NOAA commented 4 weeks ago

Successfully built and ran the fundamental test suite on AWS, Azure, and GCP using spack-stack v1.6.0. I had some issues running the tests on Azure: jobs would either sit in the queue for hours without running or fail with this MPI error message: OFI get address vector map failed. However, I believe this is an issue with the PW configuration, because after shutting down the instance and restarting it, jobs started submitting and passing without that MPI error.

GCP:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              12.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.50
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              12.90
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              32.34
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240614192  COMPLETE              19.71
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061419272  COMPLETE              19.34
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             103.72

Detailed summary written to /contrib/Edward.Snyder/ss160/expt_dirs/WE2E_summary_20240614195958.txt

AWS:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE             137.10
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              19.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              70.17
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE             283.56
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240614193  COMPLETE              80.22
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061419305  COMPLETE             152.68
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             742.95

Detailed summary written to /contrib/Edward.Snyder/ss160/expt_dirs/WE2E_summary_20240614212234.txt

Azure:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              41.10
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              10.31
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              21.96
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              82.61
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240617152  COMPLETE              38.43
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061715211  COMPLETE              42.68
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             237.09

Detailed summary written to /contrib/Edward.Snyder/ss160/expt_dirs/WE2E_summary_20240618190711.txt

RatkoVasic-NOAA commented 3 weeks ago

@MichaelLueken with the latest addition of Orion modulefiles (although they cannot be tested for some time), I think this PR is ready for final testing.

MichaelLueken commented 3 weeks ago

Thanks, @RatkoVasic-NOAA! I'll launch Jenkins tests for this PR now.

MichaelLueken commented 3 weeks ago

There are issues with Jenkins on Orion following the OS migration and software stack update. Jenkins is attempting to use /apps/git-2.28.0/bin/git to clone repositories. However, /apps/git-2.28.0/bin/git doesn't exist, leading to failure (it should be pointing to /usr/bin/git).
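
The failure mode described above (a hard-coded tool path that no longer exists after an OS migration) can be guarded against with a fallback lookup. This is only an illustrative sketch, not how Jenkins resolves its git tool: the helper name `resolve_executable` is hypothetical, and the only path taken from this thread is `/apps/git-2.28.0/bin/git`.

```python
import os
import shutil


def resolve_executable(preferred_path, name):
    """Return preferred_path if it is an executable file; otherwise search PATH for name.

    Returns None when neither the preferred path nor a PATH entry is usable.
    """
    if os.path.isfile(preferred_path) and os.access(preferred_path, os.X_OK):
        return preferred_path
    return shutil.which(name)


# On a system where the old module path is gone, this falls back to whatever
# "git" is found on PATH (for example /usr/bin/git), or None if there is none.
git = resolve_executable("/apps/git-2.28.0/bin/git", "git")
print(git if git else "git not found")
```

A check like this only detects the stale path; the underlying fix is still to update the configured tool location.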

Manual runs of the Orion coverage tests have successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240621082732                             COMPLETE             444.16
deactivate_tasks_20240621082733                                    COMPLETE               1.18
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me  COMPLETE            1984.40
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_  COMPLETE            1029.93
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240  COMPLETE             388.84
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202406210  COMPLETE              22.77
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240621082  COMPLETE             993.22
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              63.58
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2  COMPLETE             763.58
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202406  COMPLETE              65.77
2020_CAD_20240621082749                                            COMPLETE              72.16
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5829.59

and the SRW Metrics test also successfully passed:

Skill Score: 0.99807
+ [[ 0.99807 < 0.700 ]]
Congrats! You pass check!
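
For context on the trace above: inside bash `[[ ]]`, the `<` operator compares its operands as strings, not as numbers. That gives the expected answer here because both values begin with `0.`, but a numeric comparison is the safer form of the same check. A minimal sketch follows, assuming the pass condition implied by the log (the score must not be below the 0.700 threshold); the function name `passes_skill_check` is hypothetical and is not taken from the SRW metrics script.

```python
def passes_skill_check(score, threshold=0.700):
    """Return True when the score is at or above the threshold, compared as floats."""
    return float(score) >= float(threshold)


# Value taken from the log above.
print(passes_skill_check("0.99807"))

# String and numeric comparison can disagree, which is why the conversion matters:
print("10.0" < "9.0")  # compares character by character
print(10.0 < 9.0)      # compares numerically
```

With the logged score of 0.99807 the check passes under either comparison; the difference only shows up for values whose string ordering differs from their numeric ordering.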

Will now move forward with merging this PR.