tudo-r / BatchJobs

BatchJobs: Batch computing with R
Other
85 stars 20 forks source link

batchMapQuick chunks not working (on LSF) #10

Closed mschubert closed 10 years ago

mschubert commented 10 years ago

Consider the function below. Each time I have got 4 jobs that should be put in one chunk, but LSF always submits 4 individual jobs disregarding the chunks I specify.

Note: this works fine when specifying the chunks manually (last example).

square = function(x) { x*x }
library(BatchJobs)

reg = batchMapQuick(square, c(1:4), chunk.size=10)
#Saving conf: /some/dir/bmq_341e69aa7610/conf.RData
#Submitting 4 chunks / 4 jobs.
#Cluster functions: LSF.

reg = batchMapQuick(square, c(1:4), n.chunks=1)
#Saving conf: /some/dir/bmq_3c8c7b37b8cb/conf.RData
#Submitting 4 chunks / 4 jobs.
#Cluster functions: LSF.

reg = makeRegistry(id="BatchJobsExample", file.dir=tempfile(), seed=123)
batchMap(reg, square, c(1:4))
chunked = chunk(getJobIds(reg), n.chunks=1, shuffle=TRUE)
submitJobs(reg, chunked)
#Saving conf: /tmp/RtmpsWXKsa/file6bec917d96c/conf.RData
#Submitting 1 chunks / 4 jobs.
#Cluster functions: LSF.

The problem seems to be lines 55/56 in R/batchMapQuick.R:

if (!missing(chunk.size) && !missing(n.chunks))
    ids = chunk(ids, chunk.size=chunk.size, n.chunks=n.chunks, shuffle=TRUE)

Here, the conditions are only true if both chunk.size and n.chunks are given. It should rather be something like:

if (!missing(chunk.size) && !missing(n.chunks))
    stop("Providing both chunk.size and n.chunks makes no sense")
if (!missing(chunk.size))
    ids = chunk(ids, chunk.size=chunk.size, shuffle=TRUE)
if (!missing(n.chunks))
    ids = chunk(ids, n.chunks=n.chunks, shuffle=TRUE)
berndbischl commented 10 years ago

Micheal,

thank you for the detailed report. Looking into it right now.

mschubert commented 10 years ago

I'm trying to make your life easier because I will submit at least one more ;-)

Looking at this again, it might be sufficient to change line 55 to:

if (!missing(chunk.size) || !missing(n.chunks))

The chunk error handling seems to already be provided by the chunk() function.

berndbischl commented 10 years ago

Yup, I already did that.

Trying to figure out how to add a unit test for that, but will push soon.

And keep the feedback coming, it's appreciated.

berndbischl commented 10 years ago

It is unit-tested and pushed. You can close if it works for you.

mschubert commented 10 years ago

Yep, that seems to work. Cool, thanks for fixing it so quickly! :-)