mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

SGE submission fails due to invalid parallel environment in default template #288

Closed: nickholway closed this issue 2 years ago

nickholway commented 2 years ago

I'm trying to submit a trivial job on an Altair Grid Engine (formerly UGE and SGE) cluster. It fails because the default template requests a parallel environment incorrectly:

library(clustermq)
fx <- function(x) x * 2
Q(fx, x=1:6, n_jobs = 3)
Submitting 3 worker jobs (ID: cmq9197) ...
Unable to read script file because of error: ERROR! -pe option must have range as 2nd argument
[1] "#$ -N cmq9197\n#$ -j y\n#$ -o /dev/null\n#$ -cwd\n#$ -V\n#$ -t 1-3\n#$ -pe 1\n\nulimit -v $(( 1024 * 4096 ))\nCMQ_AUTH=dauia R --no-save --no-restore -e 'clustermq:::worker(\"tcp://REDACTED:9197\")'\n"
Error in (function (n_jobs, ..., log_worker = FALSE, verbose = TRUE)  : 
  Job submission failed with error code FALSE
In addition: Warning message:
In system2("qsub", input = filled, stdout = TRUE) :
  running command ''qsub' < '/tmp/Rtmp5qbIBg/file467e46f4be53'' had status 2

{U,S}GE parallel environments (PEs) are requested with the option -pe <parallel environment name> <number of cores>, so it would be good to be able to pass a PE name to Q.

For single-core jobs you don't need to specify a parallel environment at all, so it might be better if the default template didn't request one.
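Rendering the -pe line by hand shows why qsub rejects it. The sketch below uses a hypothetical base-R helper, `fill_line()`, that mimics the `{{ key | default }}` placeholder syntax used in clustermq's templates. It is an illustration and not clustermq's own filling code. The default line `#$ -pe {{ cores | 1 }}` is an assumption consistent with the rendered `#$ -pe 1` above. The `{{ pe | smp }}` key and the PE name "mpi" are hypothetical.

```r
# Hypothetical helper: fills {{ key | default }} placeholders in one template
# line. It mimics the placeholder syntax seen in clustermq templates but is
# NOT clustermq's implementation.
fill_line <- function(line, values = list()) {
  # Find every {{ ... }} token in the line
  tokens <- regmatches(line, gregexpr("\\{\\{[^}]*\\}\\}", line))[[1]]
  for (tok in tokens) {
    inner <- gsub("^\\{\\{|\\}\\}$", "", tok)                 # drop the braces
    parts <- trimws(strsplit(inner, "|", fixed = TRUE)[[1]])  # key, default
    key <- parts[1]
    if (!is.null(values[[key]])) {
      value <- values[[key]]      # user-supplied value wins
    } else if (length(parts) > 1) {
      value <- parts[2]           # fall back to the template default
    } else {
      stop("no value supplied for template key: ", key)
    }
    line <- sub(tok, as.character(value), line, fixed = TRUE)
  }
  line
}

# Assumed default line: renders with no PE name
fill_line("#$ -pe {{ cores | 1 }}")
#> [1] "#$ -pe 1"

# Hypothetical named-PE key with "smp" as its default
fill_line("#$ -pe {{ pe | smp }} {{ cores | 1 }}")
#> [1] "#$ -pe smp 1"

# Overriding both keys ("mpi" is a made-up PE name for illustration)
fill_line("#$ -pe {{ pe | smp }} {{ cores | 1 }}", list(pe = "mpi", cores = 4))
#> [1] "#$ -pe mpi 4"
```

With a single token after -pe, qsub has no slot range to read, which is consistent with the "-pe option must have range as 2nd argument" error in the report.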

Session info:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/prog/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] clustermq_0.8.95.3

loaded via a namespace (and not attached):
 [1] compiler_4.2.0   fastmap_1.1.0    R6_2.5.1         cli_3.3.0        htmltools_0.5.2  tools_4.2.0     
 [7] yaml_2.3.5       Rcpp_1.0.8.3     rmarkdown_2.14   codetools_0.2-18 knitr_1.39       xfun_0.31       
[13] digest_0.6.29    rlang_1.0.2      evaluate_0.15   

mschubert commented 2 years ago

You can already specify your own keys in the template, analogous to environments.

I will likely provide a default key in a future release.
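A minimal sketch of that workaround is to copy the SGE template, add a key for the PE name, and point clustermq at the copy. The `clustermq.scheduler` and `clustermq.template` options and the `template` argument of `Q()` are part of clustermq. The key name `pe` and its `smp` default are assumptions, and the remaining lines are reconstructed from the rendered script in the report, so compare them against the template shipped with your clustermq version. Your site's PE names can be listed with `qconf -spl`.

```r
# Sketch of a custom SGE template with a user-settable PE name.
# Assumptions: the `pe` key and its "smp" default are hypothetical; the other
# lines are reconstructed from the rendered script in the report.
tmpl <- c(
  "#$ -N {{ job_name }}",
  "#$ -j y",
  "#$ -o {{ log_file | /dev/null }}",
  "#$ -cwd",
  "#$ -V",
  "#$ -t 1-{{ n_jobs }}",
  "#$ -pe {{ pe | smp }} {{ cores | 1 }}",
  "",
  "ulimit -v $(( 1024 * {{ memory | 4096 }} ))",
  "CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker(\"{{ master }}\")'"
)
tmpl_file <- file.path(tempdir(), "sge_named_pe.tmpl")
writeLines(tmpl, tmpl_file)

# Use SGE with the custom template
options(clustermq.scheduler = "sge", clustermq.template = tmpl_file)

# Submit only where clustermq is installed and qsub is on the PATH
if (requireNamespace("clustermq", quietly = TRUE) && nzchar(Sys.which("qsub"))) {
  fx <- function(x) x * 2
  # Replace "smp" with your site's PE name if it differs
  res <- clustermq::Q(fx, x = 1:6, n_jobs = 3, template = list(pe = "smp"))
  print(unlist(res))
}
```

Because the sketch gives `pe` a default of `smp`, passing `template = list(pe = ...)` is only needed on sites whose PE has a different name.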

wlandau commented 2 years ago

I'm running into this too, and I think we just need "smp" after "-pe" like in this line.

wlandau commented 2 years ago

Opened a quick PR: https://github.com/mschubert/clustermq/pull/289. Works on my company's SGE cluster.

mschubert commented 2 years ago

Is smp the default name of the parallel environment, or is this how it was called in your company?

I'm happy to merge if this is a general solution. For instance, does it solve the issue for you, @nickholway?

wlandau commented 2 years ago

That's a great point; I haven't used SGE anywhere else, so I don't know.

nickholway commented 2 years ago

It's smp at my employer too, so this should work for us.

I cannot for the life of me remember if "smp" is the default parallel environment for a single node in {S,U,A}GE though.

mschubert commented 2 years ago

I'll merge; please reopen if this causes any issues down the line.