ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

Updates to the authoritative develop branch since the SRW v2.2 release (10/31/2023) #981

Open MichaelLueken opened 8 months ago

MichaelLueken commented 8 months ago

PR #938 - Build conda and environments in SRW. Merged on 11/29/2023: Modifies devbuild.sh to add the option to install miniforge (a version of miniconda that manages channels more strictly) in a user's specified location and defaults to inside the user's clone. It also installs two environments needed for SRW -- srw_app, which is similar to the old workflow_tools environment, and srw_graphics, which is sufficient to support the plotting scripts in SRW. If the SRW builds the AQM, then a third environment, srw_aqm, is also installed to support AQM.

MichaelLueken commented 4 months ago

PR #924 - Add job cards for wrappers for individual machines. Merged on 12/05/2023: Added job cards for individual tasks, similar to wrapper scripts, but tailored for each job scheduler used on NOAA HPCs (PBS Pro and Slurm).

MichaelLueken commented 4 months ago

PR #977 - Fixing bug: moved placing fix_lam tests' directories from common place (ufs-srweather-app) to each tests' run directory. Merged on 12/14/2023:

MichaelLueken commented 4 months ago

PR #994 - Integrate UW CLI tool for templater and remove external dependency. Merged on 01/11/2024: The workflow-tools package was initially integrated with SRW as an external repository under ush/python_utils. Since then, we have packaged the code as a conda package and it is now installed automatically on most platforms (WCOSS excluded, but with workarounds in place).

The prior integration is removed in this update, while leaning on the UW command line tools available from the conda package. For now, this involves calling the command line tools in a subprocess from Python code. The UW team have an API under development that will replace this in the near future, so this will not likely be the final result for the Python-based scripts you see here.
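The pattern of calling a command-line tool in a subprocess from Python can be sketched roughly as follows. This is a minimal illustration rather than the SRW App's actual code: the helper name `run_cli` is hypothetical, and the demo invokes the Python interpreter itself as a stand-in for the real UW command so that the example runs anywhere.

```python
import subprocess
import sys


def run_cli(args):
    """Run a command-line tool and return its stdout, raising on failure.

    `run_cli` is a hypothetical helper used only for illustration.
    """
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in command: the interpreter itself, in place of the real UW tool.
demo_output = run_cli([sys.executable, "-c", "print('rendered')"])
```

Using `check=True` means a failing tool surfaces as a Python exception instead of being silently ignored, which is usually what a workflow script wants.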

MichaelLueken commented 4 months ago

PR #997 - Fixing several issues, including 966 (bash octal issue); add new winter weather verification test with staged data. Merged on 01/11/2024: New test

Resolved issues

Other fixes

General improvements

MichaelLueken commented 4 months ago

PR #963 - Merge relevant release documentation updates into develop. Merged on 01/12/2024: A variety of updates to the release v2.2.0 documentation are relevant to the develop branch and are being incorporated via this PR.

MichaelLueken commented 4 months ago

PR #973 - Verification upgrades and bug fixes. Merged on 01/16/2024: This update cleans up and simplifies the verification tasks in the SRW App. Main changes:

MichaelLueken commented 4 months ago

PR #1012 - Add -n 1 to allow the use of the service partition. Merged on 02/09/2024: Following the Slurm update on Hera and Jet, the service partition was no longer usable within the SRW App. To restore it, -n 1 was added to the SCHED_NATIVE_CMD_HPSS variable in the Hera and Jet machine YAML files, and the native entry was updated in the parm/wflow/verify_pre.yaml and parm/wflow/aqm_prep.yaml files.

MichaelLueken commented 4 months ago

PR #1014 - Quarterly Documentation Update (PI11). Merged on 02/15/2024: Updates include:

MichaelLueken commented 4 months ago

PR #969 - Update SRW with spack-stack version 1.5.0 (from 1.4.1). Merged on 02/15/2024:

MichaelLueken commented 4 months ago

PR #917 - Enable UPP 2d decomposition. Merged on 02/21/2024: Changes to enable 2d decomposition include:

The ufs-weather-model (020e783), UPP (fae617b), and UFS_UTILS (dc0e4a6) hashes have been updated in this work.

MichaelLueken commented 4 months ago

PR #1018 - Update doc requirements and add logo. Merged on 02/21/2024:

MichaelLueken commented 4 months ago

PR #1041 - Changes for Rocky8 on Hera. Merged on 02/26/2024: Hera is switching from CentOS to Rocky OS.

MichaelLueken commented 4 months ago

PR #1043 - Add three UFS Case Studies to WE2E testing process. Merged on 02/28/2024: Adding three additional UFS Case Studies (2019 Hurricane Barry, 2019 Halloween Storm, and 2020 July CAPE) to the workflow end-to-end testing process.

MichaelLueken commented 4 months ago

PR #1047 - Update for Gaea-c5. Merged on 02/29/2024:

To resolve a library conflict for libstdc++.so.6, a specific library is preloaded at runtime, as specified in ./modulefiles/wflow_gaea.lua and ./modulefiles/tasks/gaea/python_srw.lua:

setenv("LD_PRELOAD", "/opt/cray/pe/gcc/12.2.0/snos/lib64/libstdc++.so.6")

MichaelLueken commented 4 months ago

PR #1046 - Add Contributor's Guide to documentation. Merged on 03/01/2024: This PR adds a Contributor's Guide to the docs alongside the User's Guide.

MichaelLueken commented 4 months ago

PR #1040 - Fix sample script and WE2E test for AQM. Merged on 03/05/2024:

MichaelLueken commented 4 months ago

PR #1042 - Add integration test job. Merged on 03/08/2024: This update adds a test job to the workflow. It was originally written with pytest, but because of some file naming issues, the Python package unittest was used instead. The test checks for the existence of NetCDF files from the weather model.

The necessary scripts were added or modified to incorporate the integration job into the workflow. A wrapper script was also added.
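A file-existence check of this kind might look roughly like the sketch below, written with unittest. It is not the test added by the PR: the helper `missing_files`, the class name, and the file names `dynf000.nc` and `phyf000.nc` are assumptions chosen for illustration, and the test creates empty stand-in files in a temporary directory rather than reading real model output.

```python
import tempfile
import unittest
from pathlib import Path


def missing_files(run_dir, expected_names):
    """Return the expected file names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in expected_names if not (run_dir / name).is_file()]


class TestModelOutput(unittest.TestCase):
    # Hypothetical file names; the real test's list may differ.
    EXPECTED = ["dynf000.nc", "phyf000.nc"]

    def test_outputs_exist(self):
        with tempfile.TemporaryDirectory() as tmp:
            for name in self.EXPECTED:
                (Path(tmp) / name).touch()  # empty stand-in for model output
            self.assertEqual(missing_files(tmp, self.EXPECTED), [])


# Run the test case programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestModelOutput)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Returning the list of missing names, rather than a bare boolean, makes a failure message say which files were not produced.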

MichaelLueken commented 4 months ago

PR #1045 - Jet switch from CentOS to Rocky. Merged on 03/13/2024: Jet has migrated from CentOS to Rocky8 following the system maintenance on 03/12/2024.

This work sets the updated Rocky8 spack-stack as default in the build_jet_intel.lua modulefile and modifies the Jet machine file to use PARTITION_FCST: xjet.

MichaelLueken commented 4 months ago

PR #1048 - Expand forecast fields for metric test. Merged on 03/14/2024: This PR expands the number of forecast fields for the Skill Score metric test. The forecast length in the metric WE2E test was extended to 12 hours so that the RMSE metric can be calculated for these additional forecast fields:

Adding these forecast fields makes the skill score metric test more thorough and therefore a more inclusive baseline to compare against.
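For reference, the RMSE between a forecast field and a reference field can be computed as in the sketch below. This is a generic illustration of the metric, not the code used by the metric test; the function name `rmse` and the sample values are made up, and real fields would be gridded arrays rather than short lists.

```python
import math


def rmse(forecast, reference):
    """Root-mean-square error between two equal-length sequences of values."""
    if len(forecast) != len(reference):
        raise ValueError("forecast and reference must have the same length")
    if not forecast:
        raise ValueError("at least one value is required")
    squared_errors = [(f - r) ** 2 for f, r in zip(forecast, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sample values: only the last point differs, by 2.0.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```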

Also, a change was made to the .cicd/scripts/srw_metric_example.sh script to reflect the new conda environment.

MichaelLueken commented 4 months ago

PR #1055 - Update GFS v17 p8 suite to address cold bias. Merged on 03/15/2024: An SRW App user reported (https://github.com/ufs-community/ufs-srweather-app/issues/1004) that, with the FV3_GFS_v17_p8 physics suite, surface temperatures dropped unrealistically throughout the forecast. This PR addresses that issue by updating the FV3_GFS_v17_p8 physics suite in the parm/FV3.input.yml file.

This issue was discovered in the SRW App v2.2.0, but since the FV3_GFS_v17_p8 physics suite is not officially supported for the release, the change will only go into the develop branch.

MichaelLueken commented 4 months ago

PR #1054 - Use uwtools instead of set_namelist. Merged on 03/20/2024: Continues the integration of the uwtools package. In this PR, I've done the following:

MichaelLueken commented 4 months ago

PR #1060 - Update AQM task scripts with those of production/aqm_dev branch. Merged on 03/27/2024:

MichaelLueken commented 4 months ago

PR #1050 - Update weather model, UPP, and UFS_UTILS hashes. Merged on 03/27/2024: Updating the ufs-weather-model hash to 8518c2c (March 1), the UPP hash to 945cb2c (January 23), and the UFS_UTILS hash to 57bd832 (February 6).

This work also required several modifications to allow the updated weather model and UFS_UTILS hashes to work in the SRW:

MichaelLueken commented 4 months ago

PR #1065 - Fix failure on warm start option of SRW-AQM. Merged on 04/04/2024:

MichaelLueken commented 3 months ago

PR #1067 - Port SRW-AQM to Orion and Hercules. Merged on 04/08/2024:

MichaelLueken commented 3 months ago

PR #1058 - Feature/cicd metrics adds methods to collect resource usage data from major stages of the SRW pipeline build job. Merged on 04/15/2024:

Updates the SRW Jenkinsfile with some run-time stats collection and adds a final stage that triggers the ufs-srw-metrics stats collection job for reporting metrics.

The SRW pipeline job that uses this Jenkinsfile will now use the 'time' command when executing major stages: init, build, test. This will collect CPU, Memory, and DiskUsage measurements that can be later used in trend plots on a metrics dashboard.
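As a rough Python analogue of wrapping a stage in the 'time' command, the sketch below runs a command and records wall-clock time, CPU time, and peak memory of child processes. It is an assumption-laden illustration, not what the Jenkinsfile does: the helper name `run_with_stats` is hypothetical, the `resource` module is available on Unix-like systems only, and the demo command is a trivial Python one-liner rather than a real pipeline stage.

```python
import resource
import subprocess
import sys
import time


def run_with_stats(args):
    """Run a command and return (exit_code, stats) for that stage.

    `run_with_stats` is a hypothetical helper used only for illustration.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(args)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    stats = {
        "wall_seconds": elapsed,
        # User + system CPU time consumed by child processes during the call.
        "cpu_seconds": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
        # Peak resident set size of the largest waited-for child so far
        # (reported in kilobytes on Linux).
        "max_rss": after.ru_maxrss,
    }
    return completed.returncode, stats


# Stand-in for a pipeline stage such as init, build, or test.
exit_code, stage_stats = run_with_stats(
    [sys.executable, "-c", "sum(range(100000))"]
)
```

Disk usage is not covered here; it would need a separate measurement of the workspace directory before and after the stage.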

Additionally, it adds options to the pipeline job that allow the operator to select just a single test, or no test suite (default is still 'coverage' suite), and allows an option to select the depth of wrapper script tasks to execute during functional testing (default is still all 9 scripts).

MichaelLueken commented 3 months ago

PR #1068 - Update weather model hash and correct behavior in Functional WorkflowTaskTests Jenkins stage. Merged on 04/15/2024:

MichaelLueken commented 3 months ago

PR #1077 - Update nco version. Merged on 04/23/2024:

Hera with the Intel compiler was using the system-installed nco library (version 4.9.3). This went unnoticed until the system administrators removed read permissions for version 4.9.3 and installed a new version (5.1.6).

Hera with Intel now uses the spack-stack-installed nco (version 5.0.6), like all other machines and compilers.

MichaelLueken commented 3 months ago

PR #1079 - Feature cicd scorecard metrics. Merged on 04/25/2024:

MichaelLueken commented 3 months ago

PR #1078 - Replace existing UW CLI with UW API calls to template. Merged on 04/26/2024:

This work continues the integration of the uwtools package by replacing current use of the UW CLI with UW API calls in Python scripts. These changes are limited to the UW template tool.

MichaelLueken commented 3 months ago

PR #1074 - Update weather model hash and remove "_vrfy" from bash commands. Merged on 04/30/2024:

The weather model hash has been updated to 4f32a4b (April 15).

Additionally, _vrfy has been removed from the cd, cp, ln, mkdir, mv, and rm bash commands in jobs, scripts, ush, and ush/bash_utils. The modified commands don't function as intended (issue https://github.com/ufs-community/ufs-srweather-app/issues/861) and aren't accepted by NCO (issue https://github.com/ufs-community/ufs-srweather-app/issues/1021).

MichaelLueken commented 3 months ago

PR #1005 - Streamline SRW App's interface to MET/METplus. Merged on 05/01/2024:

This PR streamlines the SRW App's interface to the MET/METplus verification tool and implements some bug fixes. Details:

MichaelLueken commented 2 months ago

PR #1082 - Simplify the way the configuration of the vx is handled. Merged on 05/13/2024:

The parse_vxconfig[det|ens] tasks and the decouple_fcst_obs_vx_config.py script are removed (so that the intermediate configuration files are no longer created). The separation into forecast and observation values of the "coupled" information in the vx configuration files is now performed in the jinja2 templates for the METplus configuration files, hiding these details from the user.

MichaelLueken commented 2 months ago

PR #1081 - Add the remaining UFS Case Studies. Merged on 05/15/2024:

Add the remaining UFS Case Studies to the SRW App as WE2E tests. These new tests were added to the comprehensive and coverage files as well.

MichaelLueken commented 2 months ago

PR #1083 - Update WM and UPP hashes. Merged on 05/15/2024:

MichaelLueken commented 2 months ago

PR #1087 - Fix CI scripts to save logfile names that Jenkinsfile needs for pwcloud platform builds. Merged on 05/31/2024:

Makes sure the log file names match what the Jenkinsfile needs, specifically for the PW cloud platforms: Azure, AWS, and GCP.

MichaelLueken commented 2 months ago

PR #1086 - Update UFS-WM and UPP hashes. Merged on 06/05/2024:

MichaelLueken commented 1 month ago

PR #1090 - Port SRW-AQM to Derecho. Merged on 06/07/2024:

MichaelLueken commented 1 month ago

PR #1093 - Upgrade SRW to spack-stack 1.6.0 from 1.5.1. Merged on 06/21/2024:

Since the ufs-weather-model was upgraded to spack-stack 1.6.0, the SRW App has been upgraded as well.

MichaelLueken commented 1 month ago

PR #1095 - Updated ConfigWorkflow.rst to reflect changes to config_defaults.yaml (PI12). Merged on 06/21/2024:

Updated ConfigWorkflow.rst to reflect recent changes to config_defaults.yaml in order to keep documentation up to date.

MichaelLueken commented 3 weeks ago

PR #1102 - Bug fix to support the %H format in METplus via printf. Merged on 07/12/2024:

This bug was encountered when verifying forecast output that has a 2-digit forecast hour in its name. It turns out specifying the METplus format %H to obtain a 2-digit forecast hour in the workflow/verification configuration variable FCST_FN_TEMPLATE (and others) causes an error in the shell script eval_METplus_timestr_tmpl.sh because bash's printf utility does not support the %H format. This fixes that error using a similar approach to the %HHH format for obtaining 3-digit hours.
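The placeholder handling described above can be illustrated with a short Python sketch. The actual fix lives in the bash script eval_METplus_timestr_tmpl.sh and is not reproduced here; the function name `format_forecast_hour` and the file name patterns in the usage lines are hypothetical, and only the %H (2-digit) and %HHH (3-digit) placeholders are handled.

```python
import re


def format_forecast_hour(template, hour):
    """Replace %HHH and %H placeholders with zero-padded forecast hours.

    `format_forecast_hour` is a hypothetical helper used only for illustration.
    """

    def _substitute(match):
        width = 3 if match.group(0) == "%HHH" else 2
        return f"{hour:0{width}d}"

    # List the longer placeholder first so %HHH is not read as %H plus "HH".
    return re.sub(r"%HHH|%H", _substitute, template)


# Hypothetical file name patterns.
two_digit = format_forecast_hour("fcst_f%H.grib2", 6)
three_digit = format_forecast_hour("fcst_f%HHH.grib2", 6)
```

The key point is that the placeholder is translated into an explicit zero-padded width before any formatting happens, since printf-style formatting has no %H conversion of its own.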

MichaelLueken commented 2 weeks ago

PR #1103 - Update requests and certifi in requirements.txt. Merged on 07/15/2024:

MichaelLueken commented 1 week ago

PR #1098 - Transition the var_defns bash file to YAML. Merged on 07/26/2024:

Use YAML for the configuration language at run time.

MichaelLueken commented 5 days ago

PR #1091 - Fixes for PW Jenkins Nightly Builds. Merged on 07/30/2024:

MichaelLueken commented 3 days ago

PR #1104 - S3 doc updates. Merged on 08/01/2024:

As part of the data governance initiative, all S3 buckets will need some form of version control. To meet this need, the AWS S3 bucket was reorganized: the develop data is now stored under a 'develop-date' folder, and the verification sample case and the document case (current_release_data) were moved under a new folder called 'experiment-user-cases'.