mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0
171 stars 51 forks source link

getJobPars fails when prob.designs column contains objects of varying length #123

Closed bart1 closed 7 years ago

bart1 commented 7 years ago

Sometimes it is useful to pass more complex objects to a problem function. for example vectors of varying length, for example a selection of species or individuals. The package seems to deal well with this until the parameters are retrieved using getJobPars, see the example below.

require(batchtools)
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)

# add first problem
fun = function(job, data, n, mean, sd, ...) rnorm(sum(n), mean = mean, sd = sd)
addProblem("rnorm", fun = fun, reg = tmp)

fun = function(instance, ...) sd(instance)
addAlgorithm("deviation", fun = fun, reg = tmp)

# define problem and algorithm designs
prob.designs = algo.designs = list()
prob.designs$rnorm = data.table(expand.grid(n = list(100, 1:4), mean = -1:1, sd = 1:5))
algo.designs$deviation = data.table()
addExperiments(prob.designs, algo.designs, reg = tmp)
submitJobs(reg = tmp)

# collect results and join them # with problem and algorithm paramters
ijoin(
  getJobPars(reg = tmp),
  reduceResultsDataTable(reg = tmp, fun = function(x) list(res = x))
)

this fails with:

Error in rbindlist(.mapply(function(job.id, pars, ...) c(list(job.id = job.id),  : 
  Column 2 of item 2 is length 4, inconsistent with first column of that item which is length 1. 
rbind/rbindlist doesn't recycle as it already expects each item to be a uniform list, data.frame or data.table

Some more info:

> traceback()
5: rbindlist(.mapply(function(job.id, pars, ...) c(list(job.id = job.id), 
       unlist(pars, recursive = FALSE)), tab, list()), fill = TRUE)
4: getJobPars.ExperimentRegistry(reg = tmp)
3: getJobPars(reg = tmp)
2: as.data.table(x)
1: ijoin(getJobPars(reg = tmp), reduceResultsDataTable(reg = tmp, 
       fun = function(x) list(res = x)))
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.04

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] batchtools_0.9.3  data.table_1.10.4

loaded via a namespace (and not attached):
 [1] compiler_3.4.1    backports_1.1.0   magrittr_1.5      assertthat_0.2.0  R6_2.2.2         
 [6] base64url_1.2     prettyunits_1.0.2 tools_3.4.1       rappdirs_0.3.1    stringi_1.1.5    
[11] progress_1.1.2    checkmate_1.8.3   digest_0.6.12     brew_1.0-6       
mllg commented 7 years ago

The heuristic which detects if the parameters can be represented as data.table columns is broken.

  1. What output would you expect for your example?
  2. As a temporary workaround, you can pass flatten = FALSE to getJobPars().
bart1 commented 7 years ago

I would probably expect the similar output as input for prob.designs so the following:

cbind(job.id=1:30,problem=factor("rnorm"), algorithm=factor("deviation"),data.table(expand.grid(n = list(100, 1:4), mean = -1:1, sd = 1:5)))

Thanks for the work around!

mllg commented 7 years ago

Heuristic is fixed. I'm thinking about dropping the heuristic and wrapping everything with length > 1 in a list... Need to sleep on it first though.

mllg commented 7 years ago

In the future I will just return lists. Your example nicely demonstrates that the parameters can virtually be anything, and it is impossible to get this right programmatically. I will offer a unwrap()/unnest() (similar to tidy::unnest()) function to ease the transition and provide an easy way to get "flat" data tables for most use cases. I'm waiting for the next data.table release though, there is a bug in rbindlist which blocks this (see https://github.com/Rdatatable/data.table/pull/2077).

mllg commented 7 years ago

Fixed in 0.9.4.

bart1 commented 7 years ago

Thank you very much for the fix, looking forward to the new release