mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

All jobs expired with SGE cluster, job script permission denied #282

Closed ImNotaGit closed 2 years ago

ImNotaGit commented 2 years ago

I am trying to set up batchtools with SGE and tested it using the "approximation of pi" example from the tutorial. I followed the code in that tutorial section exactly, except for a customized submitJobs() call: submitJobs(resources=list(project="short", memory=1)). When I checked the job status with getStatus() immediately after submission, all jobs had expired. The log file in the temp dir (e.g. <tmp_dir>/registry298c1c9cc70b/logs/job2099fc469453267ec8b7e988acc17a58.log) contains this error message:

/bin/bash: <tmp_dir>/registry298c1c9cc70b/jobs/job2099fc469453267ec8b7e988acc17a58.job: Permission denied

The corresponding .job script exists and looks fine; its file permissions are -rw-rw-rw-.

My ~/.batchtools.conf.R looks like this:

cluster.functions = makeClusterFunctionsSGE(template = "~/.batchtools.sge.tmpl")

My ~/.batchtools.sge.tmpl looks like this:

#!/bin/bash

#$ -cwd -V -b y

## Job name
#$ -N <%= if (exists("job.name", mode = "character")) job.name else job.hash %>

## Combining output/error messages into one file
#$ -j y
## Log file
#$ -o <%= log.file %>

## Project (short, normal, long, etc.)
#$ -P <%= resources$project %>

## Memory
#$ -l mem_reserve=<%= paste0(resources$memory, "G") %>
#$ -l mem_free=<%= paste0(resources$memory, "G") %>
#$ -l h_vmem=<%= paste0(resources$memory, "G") %>
#$ -l virtual_free=<%= paste0(resources$memory, "G") %>

## Launch R and evaluate the batchtools R job
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0

Any idea how to solve this issue? Thanks in advance.

sessionInfo():

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: <something>/centos76_x86_64/lib/libopenblas_nehalemp-r0.3.9.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] batchtools_0.9.16

loaded via a namespace (and not attached):
 [1] fansi_0.5.0         prettyunits_1.1.1   utf8_1.2.2
 [4] withr_2.4.2         digest_0.6.28       crayon_1.4.1
 [7] rappdirs_0.3.3      R6_2.5.1            lifecycle_1.0.0
[10] backports_1.2.1     magrittr_2.0.1      pillar_1.6.2
[13] debugme_1.1.0       rlang_0.4.11        progress_1.2.2
[16] stringi_1.7.4       fs_1.5.0            data.table_1.14.0
[19] vctrs_0.3.8         brew_1.0-6          checkmate_2.0.0
[22] ellipsis_0.3.2      tools_4.0.2         hms_1.1.0
[25] compiler_4.0.2      pkgconfig_2.0.3     BiocManager_1.30.16
[28] base64url_1.4       tibble_3.1.4

ImNotaGit commented 2 years ago

OK, so it seems that this is literally a file permission issue: the job script does not have execute permission, so it fails to run on my system. This apparently does not happen for other people, and I don't know why. To solve my issue, I dug into the source code a bit and fixed it by adding a Sys.chmod line to the cfBrewTemplate function in clusterFunctions.R:

cfBrewTemplate = function(reg, text, jc) {
  assertString(text)
  outfile = fs::path(dir(reg, "jobs"), sprintf("%s.job", jc$job.hash))

  parent.env(jc) = asNamespace("batchtools")
  on.exit(parent.env(jc) <- emptyenv())
  "!DEBUG [cfBrewTemplate]: Brewing template to file '`outfile`'"

  z = try(brew(text = text, output = outfile, envir = jc), silent = TRUE)
  if (is.error(z))
    stopf("Error brewing template: %s", as.character(z))
  waitForFile(outfile, reg$cluster.functions$fs.latency)
  Sys.chmod(outfile, "774") ### This is the added line.
  return(outfile)
}
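
The diagnosis above can be reproduced outside batchtools and SGE. The following is a minimal sketch, assuming a scratch directory from mktemp that allows execution; the file example.job is a hypothetical stand-in for the generated job script, not a real batchtools path. It shows that running a script directly requires the execute bit, while passing it to an interpreter only requires read permission:

```shell
#!/bin/bash
# Hypothetical stand-in for the generated .job file (not a real batchtools path).
workdir=$(mktemp -d)
job="$workdir/example.job"
printf '#!/bin/bash\necho "job ran"\n' > "$job"
chmod 666 "$job"   # same mode as reported in the thread: -rw-rw-rw-

# Direct execution needs the execute bit, so this attempt fails.
if "$job" >/dev/null 2>&1; then
  direct_status=0
else
  direct_status=$?
fi

# Handing the file to an interpreter only needs read permission.
interp_output=$(bash "$job")

# Once the owner has the execute bit, direct execution works as well.
chmod u+x "$job"
exec_output=$("$job")

echo "direct run without execute bit: exit status $direct_status"
echo "run via 'bash file':            $interp_output"
echo "direct run with execute bit:    $exec_output"

rm -r "$workdir"
```

Whether a given SGE setup executes the job file directly or hands it to a shell depends on how the job is submitted, which this thread does not establish.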

mllg commented 2 years ago

This is probably not something this package should be handling; it opens a can of worms. Please see man umask.
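
For readers unfamiliar with umask, here is a small sketch of what it controls. The umask removes permission bits from the mode a program requests when it creates a file. The helper mode_under_umask and the scratch files are hypothetical names used only for illustration. Note that touch requests read and write bits only, so no umask value yields an execute bit here; whether the same applies to the job file written by batchtools is an assumption consistent with the -rw-rw-rw- mode reported above.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Create an empty file under the given umask and print its permission string.
mode_under_umask() {
  (
    umask "$1"                              # only affects this subshell
    touch "$workdir/file_$1"
    ls -l "$workdir/file_$1" | cut -c1-10   # e.g. -rw-r--r--
  )
}

mode_022=$(mode_under_umask 022)
mode_000=$(mode_under_umask 000)

echo "umask 022: $mode_022"   # -rw-r--r--
echo "umask 000: $mode_000"   # -rw-rw-rw-

rm -r "$workdir"
```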