Closed: gsketefian closed this 8 months ago.
@gsketefian - With respect to the MET and METplus bugs that you are encountering, is the issue with the SRW App, or with MET and METplus? If the issue is with MET and METplus, would it be useful at all to attempt to merge PR #969 into a test version of your feature/vx_upgrades branch to see if using a later version of MET and METplus corrects the issue you are encountering? Thanks!
@MichaelLueken The issue is with MET/METplus, and I heard back from METplus developers as to the reason (if interested, see this discussion). I'm now working on the most appropriate fix. I will ask whether a later version of MET/METplus may solve this (but I doubt it; I would have to ask for this change in MET/METplus, and, if approved, it would have to be included in a future version).
@JeffBeck-NOAA @michelleharrold @willmayfield @mkavulich FYI that this vx PR is now open for review. If a couple of you can take a look, that would be great. Thanks!
Looks like some great simplifying and cleanup changes... love to see a reduction of almost 3000 lines!
I have a few questions, but since they aren't major and mostly aren't specifically related to these changes, I won't hold up this PR.
I didn't realize that info was available (easily?). Where can one see the change in the number of lines for a PR? There will be a much larger reduction of lines in my next PR :)
@gsketefian -
At the top of the PR, on the far right-hand side, there are green numbers with a plus sign and red numbers with a minus sign. The green plus shows the number of lines added in the PR, while the red minus shows the number of lines removed.
For this PR, I see the following in the top right side:
+1,394 −4,133
so there were 1,394 added lines and 4,133 removed lines in this PR.
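The net change follows directly from those two counters. A small sketch, assuming the counter text is copied in the form "+1,394 −4,133" (the helper net_line_change is hypothetical, not part of any GitHub API):

```python
import re


def net_line_change(counter_text: str) -> int:
    """Return added minus removed lines from a '+A −R' PR counter string."""
    # Accept either an ASCII hyphen or a Unicode minus sign before the removed count.
    match = re.fullmatch(r"\s*\+([\d,]+)\s+[-\u2212]([\d,]+)\s*", counter_text)
    if match is None:
        raise ValueError(f"unrecognized counter text: {counter_text!r}")
    # Strip thousands separators before converting to integers.
    added, removed = (int(g.replace(",", "")) for g in match.groups())
    return added - removed


print(net_line_change("+1,394 \u22124,133"))  # -2739
```

For this PR that is a net reduction of 2,739 lines.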
The WE2E coverage tests were manually run on Derecho and all successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km COMPLETE 23.77
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 38.17
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 44.85
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 29.32
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 17.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 40.76
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 24.76
pregen_grid_orog_sfc_climo COMPLETE 15.86
specify_template_filenames COMPLETE 15.10
----------------------------------------------------------------------------------------------------
Total COMPLETE 250.30
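The Total row in these WE2E summaries is the sum of the per-experiment core hours. A minimal sketch for re-checking such a summary, assuming each experiment row has exactly three whitespace-separated fields (name, status, hours) as in the pasted text above; the function name is hypothetical:

```python
def total_core_hours(report: str) -> float:
    """Sum the core-hours column of the experiment rows in a WE2E summary."""
    total = 0.0
    for line in report.splitlines():
        fields = line.split()
        # Experiment rows are "<name> <status> <hours>". Header lines have more
        # fields, separator lines have one, and the "Total" row is skipped.
        if len(fields) == 3 and fields[0] != "Total":
            total += float(fields[2])
    return round(total, 2)


REPORT = """
custom_ESGgrid_IndianOcean_6km COMPLETE 23.77
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 38.17
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 44.85
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 29.32
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 17.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 40.76
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 24.76
pregen_grid_orog_sfc_climo COMPLETE 15.86
specify_template_filenames COMPLETE 15.10
Total COMPLETE 250.30
"""

print(total_core_hours(REPORT))  # 250.3
```

The recomputed sum matches the reported Total of 250.30 core hours for the Derecho run.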
Oh right, thanks @MichaelLueken!
@JeffBeck-NOAA @RatkoVasic-NOAA @mkavulich Thanks for the reviews!
@gsketefian - All of the tests passed, with the exception of two tests on Jet:
The first failed in the make_ics and make_lbcs tasks with
terminate called after throwing an instance of 'std::bad_alloc'
error messages. I will attempt to relaunch these failed jobs manually.
The second failed in the make_lbcs task with
srun: error: s3: task 23: Bus error (core dumped)
I will attempt to relaunch this failed job manually.
The Jenkins workspace on Jet can be found at: /mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/fs-srweather-app_pipeline_PR-973/jet/expt_dirs
@MichaelLueken Thanks for the update, Mike. The PR doesn't touch the make_[ics|lbcs] tasks, so hopefully those are just one-time Jet-specific issues.
The two tests that had failed on Jet, get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h and get_from_HPSS_ics_RAP_lbcs_RAP, have completed successfully after using rocotorewind and rocotoboot:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community COMPLETE 41.46
custom_ESGgrid COMPLETE 50.50
custom_ESGgrid_Great_Lakes_snow_8km COMPLETE 36.93
custom_GFDLgrid COMPLETE 32.32
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018 COMPLETE 30.57
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h COMPLETE 50.94
get_from_HPSS_ics_RAP_lbcs_RAP COMPLETE 19.08
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 243.68
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 60.16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 20.82
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta COMPLETE 531.87
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 18.01
----------------------------------------------------------------------------------------------------
Total COMPLETE 1136.34
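The rocotorewind/rocotoboot relaunch mentioned above uses Rocoto's standard -w (workflow XML), -d (database), -c (cycle, YYYYMMDDHHMM) and -t (task) options. A dry-run sketch that only prints the commands unless told otherwise; the file names FV3LAM_wflow.xml/FV3LAM_wflow.db and the task name make_lbcs are assumptions for illustration, and real SRW task names may carry extra suffixes (e.g. an ensemble member):

```python
import subprocess


def rocoto_relaunch_cmds(workflow_xml, workflow_db, cycle, task):
    """Build rocotorewind and rocotoboot commands for one task in one cycle."""
    common = ["-w", workflow_xml, "-d", workflow_db, "-c", cycle, "-t", task]
    # Rewind first so Rocoto forgets the failed attempt, then force resubmission.
    return [["rocotorewind", *common], ["rocotoboot", *common]]


def relaunch(workflow_xml, workflow_db, cycle, task, dry_run=True):
    for cmd in rocoto_relaunch_cmds(workflow_xml, workflow_db, cycle, task):
        print(" ".join(cmd))
        if not dry_run:
            # Requires Rocoto on PATH; relative paths resolve against the
            # current working directory (normally the experiment directory).
            subprocess.run(cmd, check=True)


# Hypothetical example for the 2022060112 cycle of the 48h test above.
relaunch("FV3LAM_wflow.xml", "FV3LAM_wflow.db", "202206011200", "make_lbcs")
```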
@gsketefian - Given that @christinaholtNOAA's PR #994 was approved and tested first, I merged that PR first. It changed the ex-scripts to transition to the UW command-line interface (CLI), which caused conflicts with these scripts in your branch. Please merge the current authoritative develop into your feature/vx_upgrades branch as soon as possible and address the conflicts in the ex-scripts; I will then complete the merge of this PR. Thank you very much!
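The requested sync is an ordinary fetch-and-merge. A dry-run sketch, assuming the authoritative repository is configured as a git remote named "upstream" (a hypothetical name; forks often use "origin" or something else):

```python
import subprocess


def sync_with_develop_cmds(branch="feature/vx_upgrades", remote="upstream"):
    """Build git commands that merge the latest develop into a feature branch."""
    return [
        ["git", "fetch", remote],
        ["git", "checkout", branch],
        ["git", "merge", f"{remote}/develop"],
    ]


def run(cmds, dry_run=True):
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a command fails, e.g. when
            # git merge stops on conflicts that must be resolved by hand.
            subprocess.run(cmd, check=True)


run(sync_with_develop_cmds())
```

After resolving the ex-script conflicts, the merge is committed and pushed to the feature branch as usual.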
@gsketefian -
While attempting to run one last batch of verification tests, specifically @mkavulich's new MET_ensemble_verification_winter_wx WE2E verification test, I found that VX_FIELDS in tests/WE2E/test_configs/verification/config.MET_ensemble_verification_winter_wx.yaml needs to be updated to
VX_FIELDS: [ "APCP", "REFC", "RETOP", "ADPSFC", "ADPUPA", "ASNOW" ]
rather than
VX_FIELDS: [ "APCP", "REFC", "RETOP", "SFC", "UPA", "ASNOW" ]
Once this minor modification is made and my final tests are complete, I will move forward with merging this PR. Thanks!
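The required config update is a rename of two entries while keeping the list order. A minimal sketch; the mapping is inferred only from this one config change, and other field renames in the PR (if any) are not covered:

```python
# Old-to-new VX_FIELDS names, inferred from the winter_wx config change above.
VX_FIELD_RENAMES = {"SFC": "ADPSFC", "UPA": "ADPUPA"}


def update_vx_fields(fields):
    """Return a copy of a VX_FIELDS list with renamed entries replaced, order preserved."""
    return [VX_FIELD_RENAMES.get(name, name) for name in fields]


old = ["APCP", "REFC", "RETOP", "SFC", "UPA", "ASNOW"]
print(update_vx_fields(old))
# ['APCP', 'REFC', 'RETOP', 'ADPSFC', 'ADPUPA', 'ASNOW']
```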
@MichaelLueken I encountered those problems as well with test MET_ensemble_verification_winter_wx. Several ASNOW tasks were failing, and, besides the change to config.MET_ensemble_verification_winter_wx.yaml that you pointed out, the fix was mostly a matter of adding the accumulation period to the variable name in the ASNOW METplus conf files, e.g. changing
FCST_VAR1_NAME = {{fieldname_in_met_output}}
to
FCST_VAR1_NAME = {{fieldname_in_met_output}}_{{accum_hh}}
I made this change in GenEnsProd_ASNOW.conf, EnsembleStat_ASNOW.conf, GridStat_ensmean_ASNOW.conf, and GridStat_ensprob_ASNOW.conf.
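To see what the template change does, it helps to render both versions. This sketch uses a minimal regex substitute for the Jinja-style {{ }} placeholders rather than a full template engine, and the values ASNOW and 06 are hypothetical (a 6-hour accumulation):

```python
import re


def render(template: str, **values: str) -> str:
    """Substitute {{name}} placeholders; a minimal stand-in, not a full Jinja2 renderer."""
    def lookup(match):
        # Raises KeyError if a placeholder has no value, so typos are not silent.
        return values[match.group(1)]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)


old = "FCST_VAR1_NAME = {{fieldname_in_met_output}}"
new = "FCST_VAR1_NAME = {{fieldname_in_met_output}}_{{accum_hh}}"
# Hypothetical values for illustration only.
ctx = {"fieldname_in_met_output": "ASNOW", "accum_hh": "06"}
print(render(old, **ctx))  # FCST_VAR1_NAME = ASNOW
print(render(new, **ctx))  # FCST_VAR1_NAME = ASNOW_06
```

With the fix, the forecast variable name requested by the conf file carries the accumulation period instead of the bare field name.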
However, I also found a stealthy bug in GridStat_ensprob_ASNOW.conf that changes results (and which @willmayfield will probably be interested in). The issue was an inadvertent shift in the threshold values used in the forecast field array names with respect to the threshold values specified for the observations. For example, for VAR2, the buggy code is
FCST_VAR2_NAME = {{fieldname_in_met_output}}_{{accum_hh}}_A{{accum_no_pad}}_ENS_FREQ_gt0.0
...
OBS_VAR2_THRESH = ge0.508
What it should be is:
FCST_VAR2_NAME = {{fieldname_in_met_output}}_{{accum_hh}}_A{{accum_no_pad}}_ENS_FREQ_ge0.508
...
OBS_VAR2_THRESH = ge0.508
So I think the thresholds for the obs and forecasts were not matching, and although the run_MET_GridStat_vx_ensprob_ASNOW06h task succeeds in the develop branch, I think its results are incorrect. I think I've fixed the issue. @willmayfield if you're interested in taking a look at the results of this test (after I push my latest changes), please let me know and we can wait for you to take a look before merging.
I'm rerunning the test now to make sure it works from scratch and will then push my fixes. Thanks, Gerard
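A mismatch like the VAR2 one above can be caught mechanically by comparing the threshold suffix after _ENS_FREQ_ in each FCST_VARn_NAME with the matching OBS_VARn_THRESH. A hypothetical checker sketch; it assumes one threshold per variable and compares thresholds as plain strings (so ge0.508 and ge0.5080 would be reported as different):

```python
import re


def threshold_mismatches(conf_text: str):
    """Compare the ENS_FREQ threshold in each FCST_VARn_NAME with OBS_VARn_THRESH.

    Returns a list of (n, fcst_thresh, obs_thresh) tuples where the two differ.
    """
    fcst, obs = {}, {}
    for line in conf_text.splitlines():
        m = re.match(r"\s*FCST_VAR(\d+)_NAME\s*=\s*\S*_ENS_FREQ_(\S+)\s*$", line)
        if m:
            fcst[m.group(1)] = m.group(2)
        m = re.match(r"\s*OBS_VAR(\d+)_THRESH\s*=\s*(\S+)\s*$", line)
        if m:
            obs[m.group(1)] = m.group(2)
    return [(int(n), fcst[n], obs[n]) for n in sorted(fcst, key=int)
            if n in obs and fcst[n] != obs[n]]


buggy = """
FCST_VAR2_NAME = {{fieldname_in_met_output}}_{{accum_hh}}_A{{accum_no_pad}}_ENS_FREQ_gt0.0
OBS_VAR2_THRESH = ge0.508
"""
fixed = """
FCST_VAR2_NAME = {{fieldname_in_met_output}}_{{accum_hh}}_A{{accum_no_pad}}_ENS_FREQ_ge0.508
OBS_VAR2_THRESH = ge0.508
"""
print(threshold_mismatches(buggy))  # [(2, 'gt0.0', 'ge0.508')]
print(threshold_mismatches(fixed))  # []
```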
@MichaelLueken @willmayfield I reran the MET_ensemble_verification_winter_wx test with my newest version, and it was successful. I've also run regression tests on this test as well as on MET_ensemble_verification_only_vx and custom_ESGgrid_Great_Lakes_snow_8km. All show only expected differences in the vx output.
Please feel free to retest and merge. Thanks.
@gsketefian - Here is the current update on the retesting for this PR:
The WE2E coverage tests on Gaea have completed successfully:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240112103959 COMPLETE 23.22
custom_ESGgrid_NewZealand_3km_20240112104004 COMPLETE 64.46
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 34.92
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112104 COMPLETE 31.97
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011210 COMPLETE 33.87
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 357.80
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024011 COMPLETE 33.36
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20 COMPLETE 363.78
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202 COMPLETE 10.55
nco_ensemble_20240112104015 COMPLETE 78.47
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 351.98
----------------------------------------------------------------------------------------------------
Total COMPLETE 1384.38
The WE2E coverage tests on Gaea C5 have completed successfully:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240112104016 COMPLETE 43.13
custom_ESGgrid_NewZealand_3km_20240112104024 COMPLETE 48.67
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 27.85
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112104 COMPLETE 30.65
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011210 COMPLETE 31.93
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 313.32
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024011 COMPLETE 30.43
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20 COMPLETE 272.79
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202 COMPLETE 16.73
nco_ensemble_20240112104043 COMPLETE 96.57
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 304.58
----------------------------------------------------------------------------------------------------
Total COMPLETE 1216.65
The WE2E coverage tests on Hera GNU have completed successfully:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km_20240112155348 COMPLETE 36.65
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202401 COMPLETE 12.85
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240112155352 COMPLETE 20.08
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011215 COMPLETE 45.85
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202 COMPLETE 30.48
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240112155 COMPLETE 20.99
long_fcst_20240112155402 COMPLETE 95.20
MET_verification_only_vx_20240112155405 COMPLETE 0.25
MET_ensemble_verification_only_vx_time_lag_20240112155410 COMPLETE 8.98
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202 COMPLETE 63.53
----------------------------------------------------------------------------------------------------
Total COMPLETE 334.86
The WE2E coverage tests on Hera Intel have completed successfully:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km_20240112155349 COMPLETE 18.60
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024011 COMPLETE 6.77
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2 COMPLETE 789.24
get_from_HPSS_ics_HRRR_lbcs_RAP_20240112155354 COMPLETE 14.18
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 6.55
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 13.08
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240112155405 COMPLETE 10.46
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240 COMPLETE 7.13
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202401 COMPLETE 240.04
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240112 COMPLETE 343.84
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202401121 COMPLETE 332.25
pregen_grid_orog_sfc_climo_20240112155414 COMPLETE 8.33
----------------------------------------------------------------------------------------------------
Total COMPLETE 1790.47
The WE2E coverage tests on Hercules have completed successfully:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE_202 COMPLETE 7.23
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202 COMPLETE 10.36
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202 COMPLETE 27.77
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 16.63
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011209 COMPLETE 25.20
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112091 COMPLETE 52.97
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 13.31
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112091331 COMPLETE 68.37
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16_202401120913 COMPLETE 29.07
MET_verification_only_vx_20240112091333 COMPLETE 0.23
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS_20240112091334 COMPLETE 7.74
----------------------------------------------------------------------------------------------------
Total COMPLETE 258.88
The tests are still running on both Jet and Orion.
The WE2E coverage tests have successfully passed on Jet:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community_20240112203333 COMPLETE 19.12
custom_ESGgrid_20240112203338 COMPLETE 27.94
custom_ESGgrid_Great_Lakes_snow_8km_20240112203339 COMPLETE 18.86
custom_GFDLgrid_20240112203344 COMPLETE 19.11
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202401 COMPLETE 11.38
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20 COMPLETE 52.60
get_from_HPSS_ics_RAP_lbcs_RAP_20240112203349 COMPLETE 17.85
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240112203350 COMPLETE 247.62
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20 COMPLETE 50.02
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 COMPLETE 16.22
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024 COMPLETE 521.74
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024 COMPLETE 11.71
----------------------------------------------------------------------------------------------------
Total COMPLETE 1014.17
Still awaiting completion on Orion.
@MichaelLueken @gsketefian I tried it again and everything worked fine! I'm good with the changes.
I was worried that something was wrong with these results, but I now know that the problem was the model/physics giving unrealistic results on this test case, and not something due to this PR.
The WE2E coverage tests have successfully passed on Orion:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240113115145 COMPLETE 170.58
deactivate_tasks_20240113115150 COMPLETE 1.35
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me COMPLETE 918.85
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_ COMPLETE 262.32
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240 COMPLETE 141.35
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202401131 COMPLETE 16.29
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240113115 COMPLETE 409.75
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 30.79
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2 COMPLETE 280.11
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202401 COMPLETE 15.15
nco_20240113115203 COMPLETE 7.87
2020_CAD_20240113115205 COMPLETE 35.60
----------------------------------------------------------------------------------------------------
Total COMPLETE 2290.01
Given @willmayfield's continued approval after retesting these changes, I will now move forward with merging this PR.
@willmayfield @MichaelLueken Thanks for working on this!
DESCRIPTION OF CHANGES:
This PR cleans up and simplifies the verification tasks in the SRW App. Main changes:
GridStat_ensprob_ASNOW.conf: There is an inadvertent shift in the threshold values used in the forecast field array names with respect to the threshold values specified for the observations. Fix to make the thresholds for forecast and obs match.
Type of change
TESTS CONDUCTED:
The set of fundamental WE2E tests as well as all the verification tests were run on Hera with Intel. All completed successfully. The fundamental tests are:
The verification tests are:
Manual regression tests were also run on the following WE2E tests:
All had minor expected differences in results relative to the develop branch. There was a major difference in output (stat files) from the run_MET_GridStat_vx_ensprob_ASNOW06h task of the MET_ensemble_verification_winter_wx test, but that is due to the bug fix in GridStat_ensprob_ASNOW.conf regarding the mismatch between forecast and obs thresholds (and is thus expected).
DEPENDENCIES:
None
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@michelleharrold @JeffBeck-NOAA @willmayfield