ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[develop] Fixing several issues, including 966 (bash octal issue); add new winter weather verification test with staged data #997

Closed mkavulich closed 7 months ago

mkavulich commented 8 months ago

DESCRIPTION OF CHANGES:

This started as a fairly simple change (don't they always) to fix Issue #966 and add a new winter weather verification test in support of the RRFS agile framework project. It has since snowballed into solving a number of outstanding and newly discovered issues.

New test

Resolved issues

Other fixes

General improvements

Type of change

TESTS CONDUCTED:

Ran fundamental tests on several platforms. Ran some manual tests for the above-mentioned WE2E and monitor_jobs features, including concatenating several WE2E YAML files in a way that previously failed, to confirm that they now succeed.

Ran all verification tests (including new test) on Hera, Orion, and Derecho to ensure nothing changed unexpectedly. I also compared existing tests to develop on Hera, and confirmed that the additional NDAS obs are now being pulled and used for verification tasks, which resulted in expected differences in verification results.

DEPENDENCIES:

None

DOCUMENTATION:

Working on updates to documentation (mostly for the new test); will merge when ready

ISSUE:

CHECKLIST

CONTRIBUTORS:

Thanks to @EdwardSnyder-NOAA for help staging the new test data

MichaelLueken commented 7 months ago

The coverage WE2E tests were run manually on Derecho, and they all passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km_20240104115217                      COMPLETE              24.66
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              45.55
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024010411521  COMPLETE              45.90
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240104  COMPLETE              29.86
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              18.61
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024010411522  COMPLETE              41.38
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE              24.19
pregen_grid_orog_sfc_climo_20240104115227                          COMPLETE              22.15
specify_template_filenames_20240104115229                          COMPLETE              22.77
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             275.07

MichaelLueken commented 7 months ago

@mkavulich -

The MET_verification_only_vx WE2E test on Hercules failed in run_MET_PcpCombine_obs_APCP01h with the following message:

/work/noaa/epic/role-epic/jenkins/workspace/fs-srweather-app_pipeline_PR-997/hercules/scripts/exregional_run_met_pcpcombine.sh: line 140: 10#: invalid integer constant (error token is "10#")
End exregional_run_met_pcpcombine.sh at Thu Jan  4 17:28:02 UTC 2024 with error code 1 (time elapsed: 00:00:03)

mkavulich commented 7 months ago

@MichaelLueken Thanks for letting me know. Can you point me to the location of the Jenkins tests on Hercules? I'm not sure where to find that information. Is it in the Wiki somewhere?

MichaelLueken commented 7 months ago

@mkavulich - The Jenkins workspace for your test on Hercules is:

/work/noaa/epic/role-epic/jenkins/workspace/fs-srweather-app_pipeline_PR-997/hercules/expt_dirs/MET_verification_only_vx

At the bottom of the Contributor's Guide, there are paths to the Jenkins workspaces on the various machines.

mkavulich commented 7 months ago

@MichaelLueken For the record, I am getting "Permission denied" errors when I attempt to view the contents of /work/noaa/epic/role-epic/jenkins/workspace/fs-srweather-app_pipeline_PR-997/hercules/expt_dirs/MET_verification_only_vx. Can that be opened up in the future? I was able to replicate the error on my own, so it's not hugely urgent.

MichaelLueken commented 7 months ago

@mkavulich - Thanks for bringing this to my attention. I have opened an issue with the EPIC Platform team to see if they can make the Jenkins workspace on Hercules readable for developers outside of the epic account.

mkavulich commented 7 months ago

@MichaelLueken This PR is ready for a re-test on Hercules.

For anyone curious about the details: this was a very strange confluence of circumstances that revealed an old bug. Some logic in exregional_run_met_pcpcombine.sh that is only used for forecast jobs (not for observations) needed to be moved inside an if-block to avoid using undefined variables. Specifically, the variable ENSMEM_INDX doesn't make sense for observation tasks, since they are independent of the number of ensemble members. This didn't cause an error until I introduced my octal bug fix, and even then, for some reason, it only fails on Hercules; tests on Derecho and Hera still pass. So something specific about the bash version (or other environment) on Hercules is revealing more problems than on other machines.
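
For readers unfamiliar with this class of bug, here is a minimal sketch of the pattern described above. It is not the code from exregional_run_met_pcpcombine.sh: the helper `to_base10` and the variable `task_type` are hypothetical names chosen for illustration, and only ENSMEM_INDX comes from the actual script. The sketch assumes the octal fix takes the common form of a `10#` prefix inside an arithmetic expansion, which is consistent with the `10#` error token reported on Hercules.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the SRW App): convert a zero-padded string
# to a base-10 integer, refusing empty input instead of evaluating "10#".
to_base10() {
  local value="${1:-}"
  if [[ -z "${value}" ]]; then
    return 1
  fi
  printf '%d\n' "$(( 10#${value} ))"
}

# Without the 10# prefix, bash would reject "08" as an invalid octal number.
to_base10 "08"

# Hypothetical task switch; the real script's variable names may differ.
task_type="obs"
ENSMEM_INDX=""   # not meaningful for observation tasks

if [[ "${task_type}" == "fcst" ]]; then
  # Ensemble-member logic runs only for forecast tasks, so an empty
  # ENSMEM_INDX never reaches the arithmetic expansion.
  ensmem_num="$(to_base10 "${ENSMEM_INDX}")"
  echo "Ensemble member: ${ensmem_num}"
fi
```

The point of the sketch is that the `10#` prefix is only safe when the variable behind it is non-empty, so the guard (or the if-block around forecast-only logic) has to come before the arithmetic expansion.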

MichaelLueken commented 7 months ago

@mkavulich - The WE2E tests passed on Hercules:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE_202  COMPLETE               8.54
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202  COMPLETE              10.44
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              30.58
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              18.92
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011108  COMPLETE              26.41
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240111083  COMPLETE              54.92
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              14.94
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240111083520  COMPLETE              67.05
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16_202401110835  COMPLETE              28.96
MET_verification_only_vx_20240111083521                            COMPLETE               0.22
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS_20240111083523               COMPLETE               8.57
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             269.55

Merging this work now.