ufs-community / regional_workflow

THIS REPOSITORY IS NOW DEPRECATED; SEE UFS SRW APP FOR CURRENT CODE
https://github.com/ufs-community/ufs-srweather-app

Standardize static data across Tier-1 platforms; fix and improve IC and LBC data retrieval #744

Closed: mkavulich closed this 2 years ago

mkavulich commented 2 years ago

DESCRIPTION OF CHANGES:

This PR implements the new static input data directory structures as described in UFS SRW App Issue 231.

In addition, it fixes some issues with retrieving staged model input data from disk, and adds a way to specify whether to check alternate data sources (HPSS, AWS) when data is not found on disk. These flags have different defaults on different platforms based on data availability (for example, Cheyenne has no NOAA HPSS access and no internet access on compute nodes, so both alternate sources are turned off by default there). The flags have also been turned off for WE2E tests that expect data staged on disk.
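The lookup order described above (staged data on disk first, then alternate sources only if their flags are enabled) can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function name `find_input_data`, the flag names `check_hpss` and `check_aws`, and the caller-supplied fetch callables are all hypothetical.

```python
import os
from typing import Callable, Optional


def find_input_data(
    disk_path: str,
    check_hpss: bool,  # hypothetical flag: may we try HPSS?
    check_aws: bool,   # hypothetical flag: may we try AWS?
    fetch_hpss: Callable[[], Optional[str]],
    fetch_aws: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return a path to input data, trying disk first, then enabled alternates."""
    # Data staged on disk always takes priority.
    if os.path.exists(disk_path):
        return disk_path
    # Alternate sources are consulted in order, but only when enabled;
    # on a platform with no HPSS or internet access both flags would be off.
    for enabled, fetch in ((check_hpss, fetch_hpss), (check_aws, fetch_aws)):
        if enabled:
            result = fetch()
            if result is not None:
                return result
    # Nothing found: the caller decides whether this is fatal.
    return None
```

With both flags off, a missing disk path returns `None` immediately instead of attempting network or tape access, which is the behavior the PR describes for WE2E tests that expect staged data.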

Finally, some quality-of-life changes have been implemented, such as typo fixes and reducing the number of tries for retrieving data from NOMADS from 20 (!) to 2.
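The NOMADS change amounts to lowering a cap on attempts. A minimal sketch of a bounded retry loop, assuming a hypothetical helper `retry` with illustrative parameters `max_attempts` and `delay_s` (the workflow's actual retrieval script is not shown here):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fetch: Callable[[], T], max_attempts: int = 2, delay_s: float = 0.0) -> T:
    """Call fetch() up to max_attempts times, re-raising the last error."""
    # Raised as-is if max_attempts < 1, since the loop body never runs.
    last_error: Exception = RuntimeError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as err:  # broad catch for illustration only
            last_error = err
            # Wait between attempts, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_s)
    raise last_error
```

A low cap like 2 means an unavailable source fails fast rather than stalling a job through many repeated attempts.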

TESTS CONDUCTED:

All tests run to completion on Hera (aside from known failures as documented in #728, #729, and #731).

Tests on Cheyenne, Orion, and Jet are pending.

DEPENDENCIES:

There are no PR dependencies for this PR, but data will need to be staged on disk on each platform before tests will run successfully. So far, data has been staged on Hera and Cheyenne, and should be staged

I will need assistance in getting data to other platforms from others who have access to those platforms.

DOCUMENTATION:

Documentation for these changes and the new directory structure can be found in ufs-srweather-app issue #231.

I was not sure where best to put this under version control, but I wrote up a README file for the new "input_model_data" directory and attached it here: README_input_model_data.txt

ISSUE (optional):

CONTRIBUTORS (optional):

@christinaholtNOAA

chan-hoo commented 2 years ago

@mkavulich, COMINgfs (renamed to COMIN in PR #743) should not be removed from the regional workflow. NCO uses "COMIN" for defining paths to the input data.

mkavulich commented 2 years ago

@chan-hoo Thank you for your comment. Is there a reason to have a separate path to input data for NCO mode? Why can't we just use the same variable for input data regardless of run mode? Is this simply a "requirement" of the NCO standards that this variable be named in this particular way?

chan-hoo commented 2 years ago

@mkavulich, I agree with you, but NCO would not. @BenjaminBlake-NOAA has much experience with this. Ben, can you answer this? According to the NCO standards (p. 4, Table 1), the standard environment variable name for the com directory holding the current model's input data is "COMIN", typically $COMROOT/$NET/$model_ver/$RUN.$PDY.
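To make the path pattern concrete, here is a minimal sketch that assembles a COMIN value from its components, following the $COMROOT/$NET/$model_ver/$RUN.$PDY layout quoted above. The function name `nco_comin` and the sample values are hypothetical and chosen only for illustration.

```python
import posixpath


def nco_comin(comroot: str, net: str, model_ver: str, run: str, pdy: str) -> str:
    """Build COMIN following the pattern $COMROOT/$NET/$model_ver/$RUN.$PDY."""
    # posixpath keeps "/" separators, matching Linux HPC paths.
    return posixpath.join(comroot, net, model_ver, "{}.{}".format(run, pdy))


if __name__ == "__main__":
    # Hypothetical sample values, not real NCO paths.
    print(nco_comin("/com", "rrfs", "v1.0", "rrfs", "20220425"))
```

Run as a script, this prints `/com/rrfs/v1.0/rrfs.20220425`.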

BenjaminBlake-NOAA commented 2 years ago

@mkavulich You are correct, it is a requirement set by the NCO standards to define the location of input data as COMIN. I agree that EXTRN_MDL_SOURCE_BASEDIR makes more sense, but we should leave it as COMIN for the NCO mode. Thanks!

JacobCarley-NOAA commented 2 years ago

@chan-hoo and @BenjaminBlake-NOAA are correct. See Table 1 in Section III of the Implementation Standards: https://www.nco.ncep.noaa.gov/idsb/implementation_standards/ImplementationStandards.v11.0.0.pdf

mkavulich commented 2 years ago

Data has now been staged in the new, standardized directory structure on the following platforms:

I am currently running a full suite of tests on Jet, since I did not get to that last week, but barring any surprises there I believe this PR is ready for review and merge once approved.

@chan-hoo I believe I have restored all the "COMGFS" instances, let me know if anything looks amiss on that front.

venitahagerty commented 2 years ago

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/914443482/20220425232013/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back: ci-jet-intel-WE
Experiment Failed on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
2022-04-26 00:28:07 +0000 :: fe5 :: Task run_fcst, jobid=2644958, in state DEAD (FAILED), ran for 118.0 seconds, exit status=256, try=1 (of 1)
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta

chan-hoo commented 2 years ago

Regarding the Python argument 'capture_output', which @christinaholtNOAA removed in the above commit: the latest version of Python available on WCOSS Dell is 3.6.3, which does not support 'capture_output' (the argument was added in Python 3.7). Thank you @christinaholtNOAA for this quick fix!
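For reference, a sketch of the Python 3.6-compatible pattern: passing stdout=subprocess.PIPE and stderr=subprocess.PIPE captures output the same way capture_output=True does on 3.7+. This is a general illustration; the exact change in the commit may differ, and the helper name `run_and_capture` is hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, stdout, stderr) as text.

    Equivalent to subprocess.run(cmd, capture_output=True, text=True) on
    Python >= 3.7, but also works on 3.6, which lacks both keywords.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # 3.6 spelling of text=True
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    rc, out, err = run_and_capture([sys.executable, "-c", "print('hi')"])
    print(rc, out.strip())
```

Using universal_newlines=True rather than text=True matters for the same reason: text was also introduced in Python 3.7.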

mkavulich commented 2 years ago

Thanks everyone for the reviews. For the future record, on the final hash (b4a7a1), the following tests were run successfully:

Orion

The following tests failed:

Cheyenne (intel)

Successful tests:

The following tests failed:

Will add further information as more tests complete.