Open michaelmayer2 opened 9 years ago
One minor update - if I replace the plain submitJobs(reg) with a chunked submission like

```r
chunked = chunk(getJobIds(reg), n.chunks = 10, shuffle = TRUE)
submitJobs(reg, chunked, chunks.as.arrayjobs = TRUE)
```

then each chunk is run as an array job.
Question now - would it be possible to enable array jobs for any submitJobs call straight away, as pointed out above, say by adding tasks.as.arrayjobs=TRUE as an optional argument to submitJobs?
> then each chunk is run as an array job.
Glad to hear. This one was a bit tricky to implement.
> Question now - would it be possible to enable array jobs for any submitJobs call straight away, as pointed out above, say by adding tasks.as.arrayjobs=TRUE as an optional argument to submitJobs?
Just to be perfectly clear: you want an additional argument to replace

```r
submitJobs(reg, list(ids), chunks.as.arrayjobs = TRUE)
```

with the shorter

```r
submitJobs(reg, tasks.as.arrayjobs = TRUE)
```

?
Sorry, it took a while to respond ...
The idea is to not only use chunks.as.arrayjobs to run chunks as array jobs, but to also have a tasks.as.arrayjobs argument that runs the jobs launched by submitJobs as array jobs even when there is no chunking.
Obviously chunks.as.arrayjobs and tasks.as.arrayjobs then become mutually exclusive.
The technical reason is that schedulers apparently check every job in each scheduling cycle (e.g. SLURM - the latest cluster I am working with uses SLURM). If there is an array job in the queue that currently runs, say, 2000 tasks, that array job is checked only once rather than all 2000 tasks individually. This increases the efficiency of the scheduler, especially when a number of users are each hammering the cluster with their BatchJobs-style jobs.
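To make the request concrete, here is a minimal sketch; the tasks.as.arrayjobs argument is of course hypothetical at this point, and the single-chunk call below it is, as far as I understand the current chunking mechanism, the closest existing equivalent:

```r
library(BatchJobs)

## Proposed (hypothetical) interface: submit every job of the registry
## as one task of a single array job, without chunking first.
# submitJobs(reg, tasks.as.arrayjobs = TRUE)

## Closest workaround with the existing interface: put all jobs into a
## single chunk, so chunks.as.arrayjobs = TRUE yields one array job whose
## array indices run over all jobs of the registry.
one.chunk <- chunk(getJobIds(reg), n.chunks = 1)
submitJobs(reg, one.chunk, chunks.as.arrayjobs = TRUE)
```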
Hello,
we are heavily using BatchJobs for our day-to-day work and it works really nicely - great work.
Some of our users, however, see the lack of full array job support as a problem. Full support would give them the ability to more easily kill jobs during the testing phase of code development.
Using the latest BatchJobs available from CRAN (1.5) and the sge.tmpl and BatchJobs_global_config.R specified below, I see the following behaviour: regardless of how many tasks I specify in my submitJobs command, all tasks are submitted as independent jobs. Each independent job is an array task with an index running from 1 to 1.
qstat reports, for example:

```
...
1487367 0.50005 minimal-42 mayermid r 02/24/2015 17:22:14 all.q@node20 1 1
1487368 0.50005 minimal-43 mayermid r 02/24/2015 17:22:14 all.q@node18 1 1
1487369 0.50005 minimal-44 mayermid r 02/24/2015 17:22:14 all.q@node26 1 1
1487370 0.50005 minimal-45 mayermid r 02/24/2015 17:22:14 all.q@node30 1 1
1487371 0.50004 minimal-46 mayermid r 02/24/2015 17:22:14 all.q@node28 1 1
1487372 0.50004 minimal-47 mayermid r 02/24/2015 17:22:14 all.q@node24 1 1
1487373 0.50004 minimal-48 mayermid r 02/24/2015 17:22:14 all.q@node31 1 1
1487374 0.50004 minimal-49 mayermid r 02/24/2015 17:22:14 all.q@node25 1 1
1487375 0.50004 minimal-50 mayermid r 02/24/2015 17:22:14 all.q@node21 1 1
```
I have heard rumours that SGE testing is limited due to the lack of availability of a test machine. I would be happy to support testing if necessary to make this work.
My ultimate goal would be that only one array job is submitted, with a single job ID and x tasks (e.g. 50 in this case).
Would that be feasible, or am I doing something completely wrong here?
Many thanks,
Michael.
PS: On a related note, #<%= resources$queue %> is not resolved properly either - I had to comment it out and manually specify the queue name.
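For reference, my understanding is that the <%= resources$queue %> placeholder is only filled in when a matching entry is passed through the resources argument of submitJobs, roughly like this (the queue name here is just an example):

```r
## The template line "#$ -q <%= resources$queue %>" can only be resolved
## if a "queue" entry is supplied via the resources list of submitJobs().
submitJobs(reg, resources = list(queue = "all.q"))
```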
sge.tmpl
```bash
#!/bin/bash

## The name of the job, can be anything, simply used when displaying the list of running jobs
#$ -N <%= job.name %>

## Combining output/error messages into one file
#$ -j y

## Giving the name of the output log file
#$ -o <%= log.file %>

## One needs to tell the queue system to use the current directory as the working directory
## Or else the script may fail as it will execute in your top level home directory /home/username
#$ -cwd

## use environment variables
#$ -V

## use correct queue
#$ -q all.q #<%= resources$queue %>

## use job arrays
#$ -t 1-<%= arrayjobs %>

## we merge R output with stdout from SGE, which gets then logged via -o option
R-3.0.2 CMD BATCH --no-save --no-restore "<%= rscript %>" /dev/stdout
exit 0
```
BatchGlobalConfig.R
```r
cluster.functions = makeClusterFunctionsSGE(template.file = "/home/mayermid/R/x86_64-unknown-linux-gnu-library/3.0/BatchJobs/etc/sge.tmpl")
mail.start = "none"
mail.done = "none"
mail.error = "none"
db.driver = "SQLite"
db.options = list()
debug = FALSE
```
minimal.R - a simple (and admittedly silly) example; no need to discuss the cleverness of this code, it is just used to put some compute load on the cluster.
library("BatchJobs") library("foreach") f <- function(data) { library(foreach)
for (i in 1:50 ) {
x <- seq(-20, 20, by=0.01)
v <- foreach(y=x) %do% { r <- sqrt(x^2 + y^2) + .Machine$double.eps sin(r) / r } } -data }
Create a simple registry, map the function over 50 inputs and submit:

```r
reg <- makeRegistry(id = "minimal", file.dir = "minimal")
batchMap(reg, f, 1:50)
submitJobs(reg)
```