ufs-community / ufs-srweather-app

UFS Short-Range Weather Application

[develop] The service partition no longer works on Hera and Jet following the Slurm updates #1011

Closed MichaelLueken closed 6 months ago

MichaelLueken commented 6 months ago

Expected behavior

The get_extrn_ics and get_extrn_lbcs tasks, as well as the tasks that retrieve verification observations, should complete successfully on all platforms.

Current behavior

These tasks fail to launch on Hera and Jet following the Slurm updates on both platforms.

Machines affected

Currently, only Hera and Jet

Steps To Reproduce

  1. Clone and build the SRW app on either machine.
  2. Run any test that utilizes the service partition.
  3. The job will fail to launch the get_* tasks.

Detailed Description of Fix (optional)

To correct this behavior, the SCHED_NATIVE_CMD_HPSS entry in the workflow needs to be updated to include either -n 1 or --ntasks 1 (the short and long forms of the same Slurm option). This corrects the issue for get_extrn_ics and get_extrn_lbcs. For the verification observation tasks, the native entries in parm/wflow/verify_pre.yaml need the same update for task_get_obs_ccpa, task_get_obs_nohrsc, task_get_obs_mrms, and task_get_obs_ndas.
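
The fix described above amounts to making sure each native scheduler string carries an explicit task count. A minimal sketch of that check follows, assuming the native entry is available as a plain string. The helper name `ensure_ntasks` and the example value `--export=NONE` are hypothetical and are not taken from the SRW App; the real contents of SCHED_NATIVE_CMD_HPSS on Hera and Jet may differ.

```python
import shlex


def ensure_ntasks(native_cmd: str, ntasks: int = 1) -> str:
    """Return native_cmd with '-n <ntasks>' appended unless a task count is already set.

    Hypothetical helper for illustration only; it is not part of the SRW App.
    """
    for tok in shlex.split(native_cmd):
        # Recognize the Slurm task-count option in its common spellings:
        # '-n 1', '-n1', '--ntasks 1', and '--ntasks=1'.
        if (
            tok in ("-n", "--ntasks")
            or tok.startswith("--ntasks=")
            or (tok.startswith("-n") and tok[2:].isdigit())
        ):
            return native_cmd  # already has a task count, leave untouched
    # No task count found, so append one.
    return f"{native_cmd} -n {ntasks}".strip()


if __name__ == "__main__":
    # '--export=NONE' is a placeholder value, not the actual Hera/Jet setting.
    print(ensure_ntasks("--export=NONE"))
    print(ensure_ntasks("--export=NONE --ntasks 1"))
```

Options such as --ntasks-per-node or -N (node count) are deliberately not treated as a task count, so a string that only sets those still gets -n 1 appended.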