ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[develop] Enable UPP 2d decomposition #917

Closed MichaelLueken closed 8 months ago

MichaelLueken commented 1 year ago

DESCRIPTION OF CHANGES:

Recent changes to the UFS-WM (PR #1823) make it possible to confirm from the log files that UPP 2d decomposition is being performed for inline post. To pick up these changes, the UFS-WM hash has been updated to 020e783 (October 27, 2023), the UPP hash has been updated to fae617b (October 6, 2023), and the UFS_UTILS hash has been updated to dc0e4a6 (November 6, 2023).

With the updated UPP hash, the old postxconfig-NT-fv3lam.txt file has been removed. Wen suggested using the postxconfig-NT-fv3lam_rrfs.txt file instead. Unfortunately, due to issues with the CRTM on Derecho, we are unable to move forward with this postxconfig file (this file contains assimilated radiances). The ufs-weather-model/tests/parm directory contains a copy of the old postxconfig-NT-fv3lam.txt file, so to expedite this update, I'm currently pointing to that copy. Once the issue with the CRTM on Derecho has been corrected, the SRW App repository can transition to the postxconfig-NT-fv3lam_rrfs.txt file.

Changes to enable 2d decomposition include:

The ability to run comprehensive tests has been added back into the Jenkinsfile, as well as the ability to run the automated Jenkins tests on Derecho.

Type of change

TESTS CONDUCTED:

DEPENDENCIES:

None

DOCUMENTATION:

Documentation has been updated in the WE2E test configuration files, the ConfigWorkflow.rst chapter, and config_defaults.yaml.

ISSUE:

Resolves #480

CHECKLIST

MichaelLueken commented 11 months ago

@gspetro-NOAA - Thank you very much for reviewing these changes! I have committed the suggestions that you have made to the documentation, replaced the lowercase d with uppercase D for ITASKS in config_defaults.yaml for consistency, and added logging prints to run_WE2E_tests.py to let users know that the values are being reset for GNU compilers. I'm hoping I'll be able to satisfactorily go through this morning's code review! Please let me know if you see any other changes that should be made or if you have any questions.

gspetro-NOAA commented 11 months ago

It looks like we need to add the line:

DOMAIN_PREGEN_BASEDIR: /contrib/EPIC/UFS_SRW_data/develop/FV3LAM_pregen

to the noaacloud machine file for the fundamental tests to run on the cloud. @EdwardSnyder-NOAA said that we did this for the release in PR #937 and should do that in develop, too. Without it, experiment generation failed. After I added that line, fundamental tests generally pass.

MichaelLueken commented 11 months ago

Thank you very much for running the fundamental tests on NOAA Cloud platforms, @gspetro-NOAA! I have added the DOMAIN_PREGEN_BASEDIR entry into ush/machine/noaacloud.yaml at ec44d97.
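For anyone wanting to confirm that a machine file carries this entry before generating an experiment, a minimal Python sketch follows. It is only an illustration: the helper name find_yaml_key is hypothetical, the scan is a plain line match rather than a real YAML parse, and the enclosing `platform:` section in the sample is an assumption about the file layout.

```python
def find_yaml_key(text, key):
    """Return the value of the first `key: value` line in text, or None.

    This is a plain line scan, not a YAML parser, so it ignores nesting.
    """
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        if stripped.startswith(key + ":"):
            value = stripped[len(key) + 1:].strip()
            return value or None
    return None


# Illustrative fragment only; the `platform:` parent section is an assumption.
sample = """\
platform:
  DOMAIN_PREGEN_BASEDIR: /contrib/EPIC/UFS_SRW_data/develop/FV3LAM_pregen
"""

value = find_yaml_key(sample, "DOMAIN_PREGEN_BASEDIR")
if value is None:
    print("DOMAIN_PREGEN_BASEDIR is missing from the machine file")
else:
    print("DOMAIN_PREGEN_BASEDIR =", value)
```

To check the real file, the contents of ush/machine/noaacloud.yaml could be read and passed in place of sample.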

MichaelLueken commented 8 months ago

On Jet, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 WE2E coverage test initially failed. After using rocotorewind/rocotoboot on it, the test passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240216161033                                           COMPLETE              20.37
custom_ESGgrid_20240216161035                                      COMPLETE              17.82
custom_ESGgrid_Great_Lakes_snow_8km_20240216161036                 COMPLETE              58.08
custom_GFDLgrid_20240216161038                                     COMPLETE               9.41
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202402  COMPLETE              10.73
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              55.55
get_from_HPSS_ics_RAP_lbcs_RAP_20240216161043                      COMPLETE              17.62
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240216161044  COMPLETE             338.41
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              45.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.29
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             523.51
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024  COMPLETE              10.59
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1116.31

This also shows that the code in this branch is ready to run on Rocky8, since the xjet partition is required to run the forecast job. One caveat is that the Jenkins pipeline for Jet will need to be set to one of the login nodes that has been transitioned to Rocky8, or else the run_fcst job will never run (you need to be logged into a Rocky8 login node in order to use the xjet partition).
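For reference, the rewind-and-boot recovery used for the failed Jet test can be sketched as below. This is a hedged illustration, not the exact commands that were run: the helper name is hypothetical, and the workflow XML, database, cycle, and task values are placeholders, since the comment does not say which task or cycle was rewound. Only the -w, -d, -c, and -t options of rocotorewind and rocotoboot are used.

```python
import shlex


def rocoto_recovery_commands(workflow_xml, database, cycle, task):
    """Build the rocotorewind and rocotoboot command lines for one task.

    Returns two argument lists; nothing is executed here.
    """
    common = ["-w", workflow_xml, "-d", database, "-c", cycle, "-t", task]
    return [["rocotorewind"] + common, ["rocotoboot"] + common]


# Placeholder values (assumptions, not taken from the failed experiment).
commands = rocoto_recovery_commands(
    workflow_xml="FV3LAM_wflow.xml",
    database="FV3LAM_wflow.db",
    cycle="201907010000",
    task="some_failed_task",
)

for cmd in commands:
    # Print the commands so they can be reviewed before being run by hand.
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; on a machine with Rocoto loaded, each list could be passed to subprocess.run from the experiment directory.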

Once the nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16 WE2E test completes successfully on Derecho, this work can be merged. Currently, the get_extrn_ics and get_extrn_lbcs tasks are failing to retrieve the necessary data from AWS within 3 hours (this has only been encountered since the Presidents' Day weekend). If these tasks continue to fail, then the memory for these tasks will need to be raised from 2G to a higher value.

MichaelLueken commented 8 months ago

The Jenkins coverage WE2E tests, submitted manually on Derecho, passed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km_20240221070627                      COMPLETE              30.98
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              49.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022107063  COMPLETE              54.23
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240221  COMPLETE              36.72
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              22.19
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022107064  COMPLETE              48.32
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE              30.16
pregen_grid_orog_sfc_climo_20240221070649                          COMPLETE              23.06
specify_template_filenames_20240221070651                          COMPLETE              23.59
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             318.63
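The reported total can be cross-checked against the per-experiment core hours with a short script. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it relies on each data row ending in a status word followed by a numeric core-hour value, as in the summary above.

```python
DERECHO_SUMMARY = """\
custom_ESGgrid_IndianOcean_6km_20240221070627                      COMPLETE              30.98
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              49.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022107063  COMPLETE              54.23
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240221  COMPLETE              36.72
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              22.19
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022107064  COMPLETE              48.32
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE              30.16
pregen_grid_orog_sfc_climo_20240221070649                          COMPLETE              23.06
specify_template_filenames_20240221070651                          COMPLETE              23.59
Total                                                              COMPLETE             318.63
"""


def core_hours_total(summary):
    """Return (sum of per-experiment core hours, reported total or None)."""
    computed = 0.0
    reported = None
    for line in summary.splitlines():
        parts = line.split()
        # Data rows have exactly three fields: name, status, core hours.
        if len(parts) != 3:
            continue
        try:
            hours = float(parts[2])
        except ValueError:
            continue  # header or separator line
        if parts[0] == "Total":
            reported = hours
        else:
            computed += hours
    return computed, reported


computed, reported = core_hours_total(DERECHO_SUMMARY)
print(f"computed={computed:.2f} reported={reported:.2f}")
```

The comparison should allow a small tolerance (for example, half a hundredth) because the row values are summed as floating-point numbers.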

Moving forward with merging this work now.