mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0
169 stars 51 forks source link

LSF support broken due to inappropriate shQuote #268

Closed DarwinAwardWinner closed 3 years ago

DarwinAwardWinner commented 3 years ago

In the LSF submitJob function, shQuote is called on the file name passed as stdin to runOSCommand, which passes this argument to system2, which calls shQuote on it again. This means that if the file to be read is /tmp/Rtmp75UeJV/registry333bb5537ef2b/jobs/job2cf9f8b849db39df8ac03cc74a175e95.job, it ends up running something like sh <"'/tmp/Rtmp75UeJV/registry333bb5537ef2b/jobs/job2cf9f8b849db39df8ac03cc74a175e95.job'" which results in an error like:

Error: Fatal error occurred: 101. Command 'bsub' produced exit code 1. Output: 'sh: '/tmp/Rtmp75UeJV/registry333bb5537ef2b/jobs/job2cf9f8b849db39df8ac03cc74a175e95.job': No such file or directory'

The fix is to simply delete the shQuote call from this line:

https://github.com/mllg/batchtools/blob/9620f67173e401256e07789d796ca2127b4934a4/R/clusterFunctionsLSF.R#L41

DarwinAwardWinner commented 3 years ago

For anyone who just needs this to work now without waiting for a package update, you can add the following code to your batchtools config to fix it in situ:

# This should just be whatever arguments you usually use
cluster.functions  = makeClusterFunctionsLSF(
    template = "/hpc/users/thompr21/.batchtools.lsf.tmpl"
)

# This is the part you need to add
local({
    fix_cf <- function(cf) {
        orig_submitJob <- cf$submitJob
        new_submitJob <- function (reg, jc)
        {
            assertRegistry(reg, writeable = TRUE)
            assertClass(jc, "JobCollection")
            outfile = cfBrewTemplate(reg, template, jc)
            ## No shQuote here
            res = runOSCommand("bsub", stdin = outfile)
            if (res$exit.code > 0L) {
                cfHandleUnknownSubmitError("bsub", res$exit.code, res$output)
            }
            else {
                batch.id = stri_extract_first_regex(stri_flatten(res$output,
                                                                 " "), "\\d+")
                makeSubmitJobResult(status = 0L, batch.id = batch.id)
            }
        }
        environment(new_submitJob) <- environment(orig_submitJob)
        cf$submitJob <- new_submitJob
        cf
    }
    cluster.functions <<- fix_cf(cluster.functions)
})
mllg commented 3 years ago

Thanks for the detailed report. Getting the quoting right is difficult, both in general and in particular because R-4.0.0 changed a lot internally regarding shQuote.

For the reported issue, I'll have to trust you here, I don't have access to a LSF system anymore.

mllg commented 3 years ago

Fixed now in master, let me know if you encounter more issues.