mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Old version of Slurm, "invalid job state specified: SI" #300

Open jacob-long opened 8 months ago

jacob-long commented 8 months ago

I wanted to report an issue that I have "solved." I was using batchtools via future.batchtools and got this cryptic (to me) error after I changed employers and moved from a PBS system to a Slurm system:

Error: Failed to submit BatchtoolsSlurmFuture (<none>). The reason was: Listing of jobs failed (exit code 1);
cmd: 'squeue --user=$USER --states=R,S,CG,RS,SI,SO,ST --noheader --format=%i -r'
output:
squeue: error: Invalid job state specified: SI
squeue: error: Valid job states include: PENDING,RUNNING,SUSPENDED,COMPLETED,CANCELLED,FAILED,TIMEOUT,NODE_FAIL,PREEMPTED,BOOT_FAIL,DEADLINE,COMPLETING,CONFIGURING,RESIZING,SPECIAL_EXIT

I initially thought the error message was spurious, since I can see online that SI is a valid job state. After a lot of poking and prodding, it occurred to me that different versions of Slurm might accept different job state arguments. This was indeed the issue. My employer is running Slurm 16.0.5, which is nearly 7 years old, and the SI state was added at some point after that release.
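For anyone who wants to check which of the requested states their own Slurm accepts, here is a minimal Python sketch. It is not part of batchtools. It reads the "Valid job states include:" line from the squeue error output and drops any requested short code whose long name is not listed. The function names `parse_valid_states` and `filter_states` are hypothetical, and the code-to-name table is an assumption based on the job state codes documented in the squeue man page.

```python
import re

# Short squeue state codes mapped to long names.
# Assumption: taken from the job state codes in the squeue man page.
STATE_NAMES = {
    "PD": "PENDING",
    "R": "RUNNING",
    "S": "SUSPENDED",
    "CG": "COMPLETING",
    "CF": "CONFIGURING",
    "RS": "RESIZING",
    "SI": "SIGNALING",
    "SO": "STAGE_OUT",
    "ST": "STOPPED",
}


def parse_valid_states(squeue_stderr):
    """Return the set of long state names listed in squeue's error output."""
    match = re.search(r"Valid job states include:\s*(\S+)", squeue_stderr)
    if match is None:
        return set()
    return set(match.group(1).split(","))


def filter_states(requested, valid_states):
    """Keep only the requested short codes whose long name is in valid_states."""
    codes = requested.split(",")
    return ",".join(c for c in codes if STATE_NAMES.get(c) in valid_states)


# Error text copied from the report above.
stderr_text = (
    "squeue: error: Invalid job state specified: SI\n"
    "squeue: error: Valid job states include: PENDING,RUNNING,SUSPENDED,"
    "COMPLETED,CANCELLED,FAILED,TIMEOUT,NODE_FAIL,PREEMPTED,BOOT_FAIL,"
    "DEADLINE,COMPLETING,CONFIGURING,RESIZING,SPECIAL_EXIT\n"
)

valid = parse_valid_states(stderr_text)
print(filter_states("R,S,CG,RS,SI,SO,ST", valid))
```

Under the assumed mapping, the list of valid states in the error above contains neither SIGNALING nor STAGE_OUT nor STOPPED. So although squeue names only SI, the SO and ST codes would presumably be rejected by this Slurm version as well.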

Maybe I don't understand batchtools well enough, but it seemed that I couldn't override the squeue --user=$USER --states=R,S,CG,RS,SI,SO,ST --noheader --format=%i -r command issued by makeClusterFunctionsSlurm() (https://github.com/mllg/batchtools/blob/1196047ed5115d54bde2923848c1f3ec11fda6d2/R/clusterFunctionsSlurm.R#L106).

My solution was to install the most recent version of batchtools that did not use "SI" in that argument, along with an older version of future.batchtools that didn't force an update to an incompatible version of batchtools.

I suspect there's a pretty good chance that the end user has a better way to deal with this than what I came up with, but I thought the package dev(s) should be aware that this is a possible issue. Worst case, the next person searching the internet for help with this problem might find this :)

plaffon1 commented 3 months ago

Same issue here. Thanks!