mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Creating simple experiment from function without "data" argument #132

Open · klican opened this issue 7 years ago

klican commented 7 years ago

I am struggling to design an experiment with my own function. I will illustrate the problem with the function piApprox() from Example 1 in "Get Started":

piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

I now want to create and run 50 jobs with piApprox() that satisfy three conditions:

  1. I want to run piApprox() with 5 different values of parameter n: n1=1000, n2=2000, n3=3000, n4=4000, n5=5000, with 10 jobs created for each value of n

  2. Each of these 5 parametrizations should start from seed = 1, so that the 5 sets of results stay stochastically comparable

  3. For convenience, I want to keep all the code that defines and configures this example in a single R script, ideally creating only a single registry
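To pin the three conditions down, here is a minimal plain-R sketch of the intended design, run sequentially and without batchtools. The names `design`, `estimate` and `res` are illustrative only; the point is that replication `i` calls `set.seed(i)` regardless of `n`.

```r
# Plain-R reference for the intended design (sequential, no batchtools):
# 5 values of n x 10 replications, and replication i always uses seed i.
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

# expand.grid varies its first argument fastest: 50 rows of (i, n)
design = expand.grid(i = 1:10, n = 1:5 * 1000)
design$estimate = mapply(function(n, i) {
  set.seed(i)  # condition 2: the seed depends only on the replication i
  piApprox(n)
}, design$n, design$i)

# 5 x 10 matrix: one row per n, one column per replication i
res = tapply(design$estimate, list(n = design$n, i = design$i), identity)
res
```

Any batchtools setup that meets condition 2 should reproduce this 5 × 10 table on the same R version and RNG settings.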

My first solution is to run all 5*10 jobs together like this:

reg = makeRegistry(file.dir = NA, seed = 1)
batchMap(fun = piApprox, n = rep(1:5, each=10)*1000)

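The solution above satisfies conditions 1 and 3, but not the important condition 2 about seeding: each job is seeded with the registry seed plus its job id, so the 10 jobs for n = 2000 (job ids 11 to 20) use different seeds than the 10 jobs for n = 1000 (job ids 1 to 10).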
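A short base-R sketch of why condition 2 fails. It assumes the rule stated in the makeRegistry() documentation, that each job is seeded with the registry seed plus its job id, and it only reproduces that arithmetic; the names `jobs` and `seeds.by.n` are illustrative.

```r
# Assumption (makeRegistry() docs): job j is seeded with `seed + j`.
reg.seed = 1
# Job table mirroring batchMap(fun = piApprox, n = rep(1:5, each = 10) * 1000)
jobs = data.frame(job.id = 1:50, n = rep(1:5, each = 10) * 1000)
jobs$seed = reg.seed + jobs$job.id

# Seeds used by the 10 jobs of each n
seeds.by.n = split(jobs$seed, jobs$n)
seeds.by.n[["1000"]]  # 2 ... 11
seeds.by.n[["2000"]]  # 12 ... 21, no overlap with n = 1000
```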

A second solution I can think of is to create five separate scripts/registries, each with a different value of n. This satisfies conditions 1 and 2, but is tedious to set up and to collect results from.

I believe it is possible to define this task with addProblem(), addAlgorithm() and addExperiments() in a way that satisfies all three conditions. But as a beginner, I am struggling to come up with such a solution (e.g., what should I pass as the data argument to addProblem() when piApprox() generates its own data?).

I think an exercise like the one I am describing could serve as a helpful bridging example between the simple Example 1 and the more advanced Example 2.

mllg commented 7 years ago

This should do it:

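library(batchtools)

piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

# Seed each job by its replication index i, so that replication i
# uses the same seed for every value of n
wrapper = function(n, i) {
  set.seed(i)
  piApprox(n)
}

reg = makeRegistry(file.dir = NA)
# One job per combination of n (5 values) and i (10 replications): 50 jobs
batchMap(fun = wrapper, args = data.table::CJ(n = 1:5 * 1000, i = 1:10), reg = reg)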

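And you're right: you can also do this with an ExperimentRegistry. You do not have to specify data, and the problem seed increments with each replication (replication r uses the problem seed + r - 1). The problem function is called with the arguments data and job in addition to the design parameters, so it needs "..." to absorb them. The default algorithm function (used here for the algorithm "dummy", which is added without a function) just returns the problem instance.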

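library(batchtools)

# The problem function also receives `data` and `job`; "..." absorbs them
piApprox = function(n, ...) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

reg = makeExperimentRegistry(file.dir = NA)
# Problem seed 1: replication r is instantiated with seed 1 + r - 1 = r
addProblem(name = "piApprox", fun = piApprox, seed = 1L, reg = reg)
# No fun given: the default algorithm returns the problem instance unchanged
addAlgorithm(name = "dummy", reg = reg)
# 5 values of n x 10 replications = 50 experiments
addExperiments(prob.designs = list(piApprox = data.frame(n = 1:5 * 1000)), repls = 10, reg = reg)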
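The problem-seed rule can be checked the same way. This base-R sketch assumes the behavior described in the addProblem() documentation, where replication r of a problem with seed s is instantiated with seed s + r - 1; it only reproduces that arithmetic, and the names `exps` and `seeds.by.n` are illustrative. Every value of n then sees seeds 1, ..., 10, the same seeds the set.seed(i) wrapper from the first answer uses.

```r
# Assumption (addProblem() docs): with a problem seed s, replication r
# instantiates the problem with seed s + r - 1.
prob.seed = 1L
exps = expand.grid(repl = 1:10, n = 1:5 * 1000)
exps$seed = prob.seed + exps$repl - 1L

# Seeds used for each value of n
seeds.by.n = split(exps$seed, exps$n)
seeds.by.n[["1000"]]  # 1 ... 10
seeds.by.n[["5000"]]  # 1 ... 10 as well
```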

I'll try to ease the transition between the examples as soon as I find some spare time.