ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[aqm_dev] Launch Workflow Fails Since Slurm Updates on Hera #1010

Closed drnimbusrain closed 6 months ago

drnimbusrain commented 6 months ago

Expected behavior

Launching the ufs-srw-app [aqm_dev] workflow on Hera should succeed, as it did before the system updates to Slurm and/or the Rocoto module.

Current behavior

Since the updates, launching the workflow fails with a Python import error (jinja2 cannot be found), which appears to stem from the modulefiles.

./launch_FV3LAM_wflow.sh

Modules based on Lua: Version 8.5.2 2021-05-12 12:44 -05:00 by Robert McLay mclay@tacc.utexas.edu

Currently Loaded Modules: 1) rocoto/1.3.6 2) miniconda3/4.12.0 3) wflow_hera

Traceback (most recent call last):
  File "/scratch2/NAGAPE/arl/Patrick.C.Campbell/run_ufs-srw-app_online-cmaq/run_na_test_ufs-srw-app-ufs-comm-aqm_dev_20240205/ush/get_crontab_contents.py", line 10, in <module>
    from python_utils import (
  File "/scratch2/NAGAPE/arl/Patrick.C.Campbell/run_ufs-srw-app_online-cmaq/run_na_test_ufs-srw-app-ufs-comm-aqm_dev_20240205/ush/python_utils/__init__.py", line 31, in <module>
    from .config_parser import (
  File "/scratch2/NAGAPE/arl/Patrick.C.Campbell/run_ufs-srw-app_online-cmaq/run_na_test_ufs-srw-app-ufs-comm-aqm_dev_20240205/ush/python_utils/config_parser.py", line 39, in <module>
    import jinja2
ModuleNotFoundError: No module named 'jinja2'

The end-to-end run of the workflow for the forecast experiment specified by expt_name has completed with the following workflow status (wflow_status):
  expt_name = "aqm_retro_aqmna13km"
  wflow_status = "FAILURE"
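To narrow down the traceback above, it may help to check which Python interpreter is actually running and whether it can see jinja2. The following is a minimal diagnostic sketch, not part of the SRW App; the helper name `missing_modules` is hypothetical, and the list of required modules is an assumption based only on the error shown here.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names in `names` that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: jinja2 is the only module in question; extend the list as needed.
    required = ["jinja2"]
    # Shows which Python is in use (system, miniconda3 base, or a conda environment).
    print("interpreter:", sys.executable)
    missing = missing_modules(required)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all required modules found")
```

Running this once in the interactive shell and once from the same environment the launch script uses would show whether the two resolve to different interpreters.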

Machines affected

Hera

Steps To Reproduce

git clone -b aqm_dev https://github.com/ufs-community/ufs-srweather-app.git
cd ufs-srweather-app/sorc

./app_build.sh -p=hera -a=ATMAQ --extrn

cd ../ush
cp config.aqm.retro.hera.yaml config.yaml

module purge
module use ../modulefiles
module load wflow_hera
conda activate regional_workflow

./generate_FV3LAM_wflow.py

cd ../../expt_dirs/aqm_retro_aqmna13km
./launch_FV3LAM_wflow.sh
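One thing worth ruling out in the steps above is whether the regional_workflow conda environment is still active at the point where the workflow is launched. Conda sets the `CONDA_DEFAULT_ENV` environment variable on activation, so a small check like the one below could be run just before the launch step. This is only a sketch: the function names are hypothetical, and it is an assumption (not confirmed in this issue) that a missing environment is what causes the jinja2 error.

```python
import os


def active_conda_env(environ=None):
    """Return the value of CONDA_DEFAULT_ENV, or None if no environment is active."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def check_env(expected, environ=None):
    """Report whether the expected conda environment is the active one."""
    active = active_conda_env(environ)
    if active == expected:
        return f"ok: conda environment '{expected}' is active"
    return f"warning: expected conda environment '{expected}', found {active!r}"


if __name__ == "__main__":
    # The environment name is taken from the reproduction steps above.
    print(check_env("regional_workflow"))
```

Note that `CONDA_DEFAULT_ENV` can hold a full path rather than a name when an environment is activated by prefix, so a mismatch here is a hint rather than proof.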

Detailed Description of Fix (optional)

The modulefiles for the ufs-srw-app [aqm_dev] branch (and possibly other branches) need to be updated so that a workflow can be launched successfully on Hera. Who is the point of contact (POC) for getting this fixed on [aqm_dev], and potentially on other branches as well?
