mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

conceptual problems: designs where algo configs change over problems #23

Closed berndbischl closed 7 years ago

berndbischl commented 8 years ago

hi,

this is a problem that came up recently in a project, and I couldn't find a way to express it properly with batchtools.

I have a couple of problem instances and some algorithms. To simplify, imagine the problems don't have any params, so I just have p_1, ..., p_k. For the algorithms I would pre-create, as a data.frame, the different config settings I want to compute and study. But: the algo configs should not be the same for every p_i.

Reason: instead of "variance reduction" (= try out the same setting for each p_i), I want more "exploration", to learn better how the params affect algorithm performance.

Problem: batchtools does not allow this, as I have to specify "algo.design", which is then used for every p_i. This is conceptually problematic, as what I just outlined is an extremely common (at least potential) approach in experimental design.

Solution (?): instead of always forcing the user to enter prob.design and algo.design and then internally computing the cross product, let the user pass the combined design directly as a single data.frame / data.table. Then they have complete control.

So in my case I would pass something like

prob.id, algo.id, algo.par.1, algo.par.2, ..., algo.par.m
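A hypothetical sketch of what such a combined design could look like as a single table (column names are illustrative only, not an actual batchtools API):

```r
# Hypothetical sketch of the proposed combined design: one data.frame where
# each row fully specifies a (problem, algorithm, parameters) experiment.
# Column names are made up for illustration; this is not batchtools code.
design <- data.frame(
  prob.id = c("p1", "p1", "p2"),
  algo.id = c("a",  "a",  "a"),
  par1    = c(1, 2, 3),
  par2    = c(10, 20, 30)
)
# Different rows can use different parameter settings per problem,
# so no cross product over all problems is forced.
print(design)
```

Note that rows for p1 and p2 use different parameter values, which is exactly what a forced crossproduct of prob.design and algo.design cannot express.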

berndbischl commented 8 years ago

prob.id, algo.id, algo.par.1, algo.par.2, ..., algo.par.m

(And of course you could also imagine prob.pars to be present in that structure as well.)

mllg commented 8 years ago

You can just repeatedly call addExperiments with designs specifying single experiments. d85c267d6f30b3b0f759bc6e9e799c9e5c5f461d should simplify it by allowing some vectorization.

berndbischl commented 8 years ago

thx.

Reopening this to clear up some things.

1) For reference, this is the use case I was talking about:

library(batchtools)
library(data.table)

reg = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
prob = addProblem(reg = reg, "p1", data = 1)
prob = addProblem(reg = reg, "p2", data = 2)
algo = addAlgorithm(reg = reg, "a", fun = function(...) list(...))

combine = "crossprod"
# one addExperiments call per problem, each with its own algo config
algo.designs = list(a = data.table(par1 = 1, par2 = 2))
addExperiments(reg = reg, prob.designs = list(p1 = data.table()), algo.designs = algo.designs, combine = combine)
algo.designs = list(a = data.table(par1 = 2, par2 = 1))
addExperiments(reg = reg, prob.designs = list(p2 = data.table()), algo.designs = algo.designs, combine = combine)
pars = getJobPars(reg = reg)
print(pars)

So: multiple problems, one algorithm, different algo configs for different problems.

2) Apparently running this was already possible before your new commit? And the new commit does not simplify this case either? (I'm just asking to clear this up, not saying I don't see the usefulness of the new option.)

3) The docs are now incorrect:

' If multiple problem designs or algorithm designs are provided, they are combined via the Cartesian product.

' Each row of a single problem design is combined with each row of a single algorithm design in either a

' \code{\link[base]{expand.grid}} or \code{\link[base]{cbind}} fashion (depending on parameter \code{combine}).

Well, the second sentence kind of contradicts the first one: obviously it is not always a Cartesian product.
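To make the distinction concrete, here is a small base-R sketch (not batchtools code) of the row semantics the two combine modes imply, i.e. expand.grid-style crossing versus cbind-style binding:

```r
# Base-R illustration of the two combine modes described in the docs
# ("crossprod" vs a cbind fashion); this only demonstrates the row
# semantics, it does not call batchtools itself.
prob.design <- data.frame(n = c(100, 200))
algo.design <- data.frame(k = c(1, 2))

# "crossprod": every problem row is paired with every algorithm row,
# yielding 2 * 2 = 4 rows (a Cartesian product).
crossed <- merge(prob.design, algo.design)  # no common columns -> cross join
stopifnot(nrow(crossed) == 4)

# "bind": rows are matched positionally, like cbind, yielding 2 rows.
bound <- cbind(prob.design, algo.design)
stopifnot(nrow(bound) == 2)
```

Only the first mode is a Cartesian product; the second keeps the designs aligned row by row.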

4) Your unit test does not really test my use case (it does not even have multiple problems), which I should have posted as code directly in the OP. I would suggest adding this and testing (also) directly against what was discussed in this thread.

5) I find this complicated enough for users to understand (and the flexibility very useful!) that we should add an example to addExperiments.

6) In test_addExperiments there seems to be timing/tryout example code inside an "if (FALSE)" block. I guess this should be cleaned up and removed.

mllg commented 8 years ago
  2. Yes, it was possible. No, not this case.
  3. Docs were not incorrect, but easy to misread. I've clarified it a bit.
  4. PR welcome.
  5. PR welcome.
  6. Don't know where I should put my benchmarks. This is not the worst place.
mllg commented 7 years ago

I've added unit tests and the docs are quite clear now.