Expected Behaviour
The default makeClusterFunctionsSlurm function should map every job state code returned by squeue to a reasonable general-purpose default.
Problem
Unmapped Slurm job state codes in makeClusterFunctionsSlurm caused getStatusTable to return an NA status, which triggered downstream errors or left running jobs orphaned by batchtools.
Mapping Strategy
Queued
Job is awaiting resources, the infrastructure is being configured/booted, or the job has been requeued.
PD,CF,RF,RH,RQ,SE
Running
Job is running, suspended, completing or otherwise retaining CPU resources, including resizing, being signalled, staging outfiles or in the 'stopped' state.
R,S,CG,RS,SI,SO,ST
Expired
Job is not anticipated to require resources in the future, including failure of infrastructure, exit code, cancellation, completion, out of memory, preemption, and timeout.
BF,CA,CD,DL,F,NF,OOM,PR,RV,TO,RD
RD (RESV_DEL_HOLD) was initially mapped to queued, but querying squeue with status=RD throws an error on Slurm v20.11.4, so it is left unhandled and results in an expired status.
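The three groups above can be summarised as a simple lookup. The following is a minimal, language-neutral sketch in Python, not the R code in batchtools; the names QUEUED, RUNNING and classify are hypothetical and exist only for illustration. It assumes that any code outside the queued and running groups, including the unhandled RD, falls through to expired.

```python
# Illustrative sketch only: the actual mapping is implemented in batchtools' R code.
# QUEUED, RUNNING and classify are hypothetical names, not part of any library.

QUEUED = {"PD", "CF", "RF", "RH", "RQ", "SE"}
RUNNING = {"R", "S", "CG", "RS", "SI", "SO", "ST"}


def classify(state_code: str) -> str:
    """Map a compact squeue job state code to a coarse status."""
    code = state_code.strip().upper()
    if code in QUEUED:
        return "queued"
    if code in RUNNING:
        return "running"
    # Assumption: anything not queued or running (BF, CA, CD, ..., and RD)
    # is reported as expired.
    return "expired"


if __name__ == "__main__":
    for code in ("PD", "CG", "OOM", "RD"):
        print(code, classify(code))
```

Under these assumptions no code ever yields a missing status: a code is queued, running, or expired by default.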
Custom Mapping
This commit will solve the majority of errors caused by running squeue at the wrong moment, when an unmapped job state code for a running job would trigger batchtools to report an incorrect expired status.
This commit will not solve all infrastructure-specific issues, for instance where Slurm requeues jobs after preemption (workaround). Users who need finer control over the mapping could try the default makeClusterFunctions.
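To make the idea of a site-specific mapping concrete, here is a small hypothetical sketch in Python (again, not batchtools code). It layers overrides on top of default groups; DEFAULT_STATUS, build_mapping and classify are invented names, and treating PR as queued is only an assumed example for a cluster that requeues preempted jobs. Whether that choice is appropriate depends on the cluster's configuration.

```python
# Hypothetical sketch of layering site-specific overrides on a default mapping.
# None of these names exist in batchtools; they only illustrate the idea.

DEFAULT_STATUS = {
    **{code: "queued" for code in ("PD", "CF", "RF", "RH", "RQ", "SE")},
    **{code: "running" for code in ("R", "S", "CG", "RS", "SI", "SO", "ST")},
}


def build_mapping(overrides=None):
    """Return a copy of the defaults with any site overrides applied."""
    mapping = dict(DEFAULT_STATUS)
    mapping.update(overrides or {})
    return mapping


def classify(state_code, mapping):
    """Look up a state code; codes absent from the mapping count as expired."""
    return mapping.get(state_code, "expired")


# Assumed example: a site where preempted jobs are requeued rather than lost.
site_mapping = build_mapping({"PR": "queued"})

if __name__ == "__main__":
    print(classify("PR", build_mapping()))  # default behaviour
    print(classify("PR", site_mapping))     # with the site override
```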