Closed gsketefian closed 2 months ago
@gsketefian I ran tests on Hera. It failed for some tests (that passed for you before): /scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/WE2E_summary_20240509175500.txt
/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/get_obs_ndas_2021050500.log
Can you please take a look, in second log fie, it is trying to get from HPSS file that does not exist.
@RatkoVasic-NOAA -
I saw similar issues with the MET_ensemble_verification_only_vx_time_lag
and MET_verification_winter_wx
WE2E tests. I believe that these were unavailable during the HPSS library maintenance today. Once the maintenance concluded, I rocotorewind/rocotoboot the get_verif_obs
tasks and they successfully passed. Please try rerunning these tasks/tests now and see if they continue to fail.
@MichaelLueken @RatkoVasic-NOAA I just pushed a few more commits that further streamline the METplus config templates. I reran the test MET_ensemble_verification_only_vx
, and it worked. This test checks all the latest changes, so the rest of the tests should work fine. Go ahead and retest at your convenience. Thanks.
After two tries, it passed on Hera:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_20240510145323 COMPLETE 21.52
MET_ensemble_verification_only_vx_20240510145325 COMPLETE 1.04
MET_ensemble_verification_only_vx_time_lag_20240510145328 COMPLETE 3.90
MET_ensemble_verification_winter_wx_20240510145331 COMPLETE 112.10
MET_verification_20240510145332 COMPLETE 10.12
MET_verification_only_vx_20240510145334 COMPLETE 0.23
MET_verification_winter_wx_20240510145335 COMPLETE 16.54
----------------------------------------------------------------------------------------------------
Total COMPLETE 165.45
Detailed summary written to /scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/WE2E_summary_20240510152858.txt
@gsketefian , can you please check what is going on with this one:
/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_RETOP_2021050500.log
/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_RETOP_2021050500.log.0
It is failing randomly. Yesterday it failed twice in a row causing rocoto to give up and whole test to fail. Here is same and additional test that failed more than once:
Hera:/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs-old/MET_ensemble_verification_only_vx_time_lag/log>
run_MET_GenEnsProd_vx_REFC_2021050500.log
run_MET_GenEnsProd_vx_REFC_2021050500.log.0
run_MET_GenEnsProd_vx_RETOP_2021050500.log
run_MET_GenEnsProd_vx_RETOP_2021050500.log.0
I wonder if you have an idea what might be causing this random occurrence?
@RatkoVasic-NOAA I think I figured out the problem.
What's happening is that during the run_MET_GenEnsProd_vx_REFC
task (and also for RETOP
, but I'll just talk about REFC
for simplicity), in exregional_run_met_genensprod_or_ensemblestat.sh
the call to set_vx_fhr_list
is failing to find some or all of the obs files because the task get_obs_mrms
has not yet generated these files (because it hasn't launched yet, or has not yet completed). But these obs files aren't necessary for running run_MET_GenEnsProd_vx_REFC
because GenEnsProd only takes forecast files as inputs. The reason this check exists is that this ex-script is also used for the EnsembleStat tasks, which do take observation files as inputs.
There are a couple of quick-and-dirty (i.e. hard-coded) fixes for this, but I'd rather implement a more thorough fix. Since this is something that predates this PR and is not directly related, I'd like to create an issue and take a look next week. Are you ok with that? Otherwise I can do a one-liner quick fix.
@gsketefian great! I suggest you do 'dirty' quick fix now (although not related to this PR), and open an issue for nice fix for later. @MichaelLueken what do you think? I'll approve this PR either way.
@gsketefian great! I suggest you do 'dirty' quick fix now (although not related to this PR), and open an issue for nice fix for later. @MichaelLueken what do you think? I'll approve this PR either way.
@RatkoVasic-NOAA I'd recommend opening an issue to properly address this issue rather than create a temporary fix. A quick-and-dirty fix could potentially break the way the workflow runs for certain tests, leading to more issues. I'd err on the side of caution and focus on a fix that will work for all verification tests.
@gsketefian Once the parm/metplus/STATAnalysisConfig_skill_score
file has been added back and @EdwardSnyder-NOAA has approved, I will kick of the automated Jenkins tests.
@MichaelLueken @RatkoVasic-NOAA @EdwardSnyder-NOAA I added back the file STATAnalysisConfig_skill_score
and created an issue at #1084. I think this PR is ready to (re)test. Thanks.
Except expected failures which are reported in issue #1084, tests passed on Hera:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_20240510195653 COMPLETE 24.54
MET_ensemble_verification_only_vx_20240510195655 COMPLETE 1.04
MET_ensemble_verification_only_vx_time_lag_20240510195657 DEAD 3.49
MET_ensemble_verification_winter_wx_20240510195700 COMPLETE 122.83
MET_verification_20240510195702 COMPLETE 10.39
MET_verification_only_vx_20240510195703 COMPLETE 0.23
MET_verification_winter_wx_20240510195705 COMPLETE 17.69
----------------------------------------------------------------------------------------------------
Total DEAD 180.21
MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_REFC_2021050500.log.0
MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_RETOP_2021050500.log.0
Approved.
The Jenkins tests successfully passed on all platforms. Moving forward with merging this work now.
DESCRIPTION OF CHANGES:
This PR removes the tasks
parse_vx_config_[det|ens]
that read in and parse the verification (vx) yaml configuration files (one for deterministic vx and another for ensemble vx). Currently, these tasks are used to call the scriptdecouple_fcst_obs_vx_config.py
to separate the "coupled" vx config information in the yaml files into forecast and observation components. For example, a "coupled" field name of "CRAIN%%PRWE" in the deterministic configuration file corresponds to a field name of "CRAIN" for forecasts and "PRWE" for the observations (and the "%%" is the delimiter). After separating each item (field name, level, or threshold) in this way, theparse_vx_config_[det|ens]
tasks write this separated information into two new intermediate configuration files (one for deterministic and one for ensemble) as two separate dictionaries (one for forecasts and another for observations). These intermediate config files are then read in by the various vx task ex-scripts.In this PR, the
parse_vx_config_[det|ens]
tasks and thedecouple_fcst_obs_vx_config.py
script are removed (so that the intermediate configuration files are no longer created). The separation into forecast and observation values of the "coupled" information in the vx configuration files is now performed in the jinja2 templates for the METplus configuration files, hiding these details from the user.This PR was prompted by some of the questions by @mkavulich in PR #1005.
Type of change
TESTS CONDUCTED:
The following vx WE2E tests were run on Hera:
They all completed successfully.
DOCUMENTATION:
No updates necessary.
ISSUE:
None.
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@mkavulich