mb706 opened 8 years ago
There is a lot about the optPath that could be improved. Many things are high on my wish list but, because of a lot of obstacles, low on my want-to-do list. This is definitely an aspect that has to be kept in mind. What we did, in quick-and-dirty fashion, for another project was to save the initial design and the resulting opt.path in a file. Then you change addOptPathEl so that it just stores what you want to add in a separate file. When you want to read the whole optPath, you just iterate over all the files written by addOptPathEl and add them to the initial optPath (which can also be empty if you don't have an initial design).
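A minimal base-R sketch of that file-per-element idea, for illustration only. The helpers initFileOptPath, addFileOptPathEl and readFileOptPath are hypothetical (not mlr or ParamHelpers API), and a plain data.frame stands in for a real OptPath. Each call writes its own .rds file, so parallel workers never append to the same object; the reader rebuilds the path and restores the insertion order via a dob column.

```r
# Hypothetical helpers (not part of mlr/ParamHelpers) sketching the
# "one file per added element" workaround.

initFileOptPath <- function(op.dir, init.design = NULL) {
  dir.create(op.dir, showWarnings = FALSE, recursive = TRUE)
  # The initial design may be NULL; if given, it must have the same
  # columns as the elements added later (parameters, y, dob).
  saveRDS(init.design, file.path(op.dir, "init.rds"))
  invisible(op.dir)
}

addFileOptPathEl <- function(op.dir, x, y, dob) {
  # x is a named list of parameter values. Every call writes its own file,
  # so no two writers ever modify the same object.
  el <- data.frame(x, y = y, dob = dob)
  saveRDS(el, tempfile("el_", tmpdir = op.dir, fileext = ".rds"))
  invisible(NULL)
}

readFileOptPath <- function(op.dir) {
  init <- readRDS(file.path(op.dir, "init.rds"))
  files <- list.files(op.dir, pattern = "^el_.*\\.rds$", full.names = TRUE)
  parts <- Filter(Negate(is.null), c(list(init), lapply(files, readRDS)))
  if (length(parts) == 0L) return(NULL)
  df <- do.call(rbind, parts)
  # File names are random, so restore the insertion order via dob.
  df <- df[order(df$dob), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Usage: no initial design, two evaluated points.
op.dir <- tempfile("optpath_")
initFileOptPath(op.dir)
addFileOptPathEl(op.dir, list(I = TRUE, K = 5L), y = 0.31, dob = 1L)
addFileOptPathEl(op.dir, list(I = FALSE, K = 8L), y = 0.27, dob = 2L)
op <- readFileOptPath(op.dir)
print(op)
```

tempfile() only checks that a name is unused at the moment it is generated, so a production version would want a more robust naming scheme when many processes write at once.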
mlr tuneIrace fails when given the 'parallel' option, apparently
Has that already been reported? I can see https://github.com/mlr-org/mlr/issues/472, but that seems to indicate that parallel tuning is not supported for irace in GENERAL at the moment? I could easily fix that, as I already have code for it.
Some more general comments here:
It really depends on what you want to do in which algorithms. In irace we do not control the tuner itself. So basically the only thing we can do is let irace create its candidates and have them evaluated in parallel. After that "batch eval" we write to the optpath. This is already possible in mlr; we already do this for some other methods.
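Roughly, that batch pattern looks like the sketch below. It uses a PSOCK cluster from the parallel package (irace's own parallel mechanism may differ), a hypothetical evalCandidate function in place of a real resampling run, and a data.frame in place of the OptPath. The point is that workers only compute and return values; all writes happen in the main process.

```r
library(parallel)

# Hypothetical stand-in for evaluating one candidate (e.g. a resampling run).
evalCandidate <- function(cand) {
  (cand$K - 4)^2 + as.numeric(cand$I)
}

# A batch of candidates, as a tuner such as irace might propose them.
candidates <- list(
  list(I = TRUE, K = 5L),
  list(I = FALSE, K = 8L),
  list(I = FALSE, K = 4L)
)

# Workers only evaluate and return results; they never touch the opt path.
cl <- makeCluster(2L)
ys <- parLapply(cl, candidates, evalCandidate)
stopCluster(cl)

# Back in the main process, record the whole batch in one go.
opt.path <- do.call(rbind, Map(function(cand, y) data.frame(cand, y = y), candidates, ys))
print(opt.path)
```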
And for mbo we are already working on something else for asynchronous parallel execution.
So I am not sure how useful this DB backend is (right now), as it also seems like a lot of work?
But we can also talk about this in person.
Re parallelization in irace:
The irace package supports parallelization, which takes effect when a parallel option is given to the irace call. This can be used by calling makeTuneControlIrace with this option, but the run then fails, precisely because the OptPath object is modified in a worker process rather than in the main R process.
To reproduce (runs for about two minutes):
> library(mlr)
> ctrl = makeTuneControlIrace(maxExperiments = 200L, show.irace.output=TRUE, parallel=2)
> lrn = makeLearner("classif.IBk")
> ps = makeParamSet(makeLogicalParam("I"), makeIntegerParam("K", lower=1, upper=10))
> rdesc = makeResampleDesc("Holdout")
> ctrl = makeTuneControlIrace(maxExperiments = 200L, parallel=2) # !! Parallelize with 2 worker processes
> options(error=dump.frames)
> tuneParams(lrn, pid.task, rdesc, par.set=ps, control=ctrl, show.info=FALSE)
Error in as.data.frame.OptPathDF(opt.path) :
No elements where selected (via 'dob' and 'eol')!
> debugger()
Message: Error in as.data.frame.OptPathDF(opt.path) :
No elements where selected (via 'dob' and 'eol')!
Available environments had calls:
1: tuneParams(lrn, pid.task, rdesc, par.set = ps, control = ctrl, show.info =
2: sel.func(learner, task, resampling, measures, par.set, control, opt.path, s
3: convertDfCols(as.data.frame(opt.path), logicals.as.factor = TRUE)
4: assertDataFrame(df)
5: checkDataFrame(x, types, any.missing, all.missing, min.rows, min.cols, nrow
6: .Call(c_check_dataframe, x, any.missing, all.missing, min.rows, min.cols, n
7: as.data.frame(opt.path)
8: as.data.frame.OptPathDF(opt.path)
9: stopf("No elements where selected (via 'dob' and 'eol')!")
10: stop(obj)
Enter an environment number, or 0 to exit Selection: 2
Browsing in the environment with call:
sel.func(learner, task, resampling, measures, par.set, control, opt.path, s
Called from: debugger.look(ind)
Browse[1]> opt.path
Optimization path
Dimensions: x = 2/2, y = 1
Length: 0
Add x values transformed: FALSE
Error messages: TRUE. Errors: 0 / 0.
Exec times: TRUE. Range: 0 - 0. 0 NAs.
Browse[1]> or
.ID. I K .PARENT. .ALIVE. .RANK. .WEIGHT.
19 19 FALSE 5 13 TRUE 29 0.5000000
4 4 TRUE 8 NA TRUE 33 0.3333333
20 20 TRUE 8 4 TRUE 33 0.1666667
As can be seen, the result of the irace run (the variable or) does contain candidates, but the opt.path is empty, which leads to the reported error. If the OptPath were shared across R processes, this would not be a problem, though another option, admittedly, is to skip generating an OptPath when parallel is given.
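The underlying issue can be reproduced without mlr or irace. The sketch below assumes that an in-memory opt path behaves like an R environment updated in place (a plain environment, path, stands in for it here, and record is a hypothetical helper): when such an object is handed to parallel workers, each worker receives a serialized copy, so its updates never reach the main process.

```r
library(parallel)

# A plain environment with in-place update semantics, standing in for an
# in-memory opt path.
path <- new.env()
path$ys <- numeric(0)

# Hypothetical helper run in a worker: p is a deserialized *copy* of 'path'.
record <- function(y, p) {
  p$ys <- c(p$ys, y)
  length(p$ys)
}

cl <- makeCluster(2L)
worker.lengths <- parLapply(cl, c(0.3, 0.2, 0.1), record, p = path)
stopCluster(cl)

# The workers saw non-empty copies, but the main process is unchanged.
print(unlist(worker.lengths))
print(length(path$ys))
```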
Most optimization processes that run in parallel need some form of synchronization; the OptPath would be a natural candidate for this. An OptPath that is synchronized among multiple R processes would, for example, be useful in the following cases:
1) mlr's tuneIrace fails when given the 'parallel' option, apparently because the OptPath is modified inside processes that are not the main R process.
2) The OptPath would be a good means of synchronization in mlrMBO: one could then parallelize at the point of the repeat statement in mboTemplate.OptState.
For this, one would need to add an option to makeOptPathDF to create a database-backed OptPath, and change the setOptPathXX and getOptPathXX methods to S3 methods. Additionally, some database and synchronization operators might be useful (commit, lock, unlock, unlock-and-wait-for-change). Since performance is not an issue (the OptPath is not written or read very often), one could use the RSQLite package and synchronize with the synchronicity or flock package.
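A rough sketch of what turning the accessors into S3 methods could enable, using hypothetical generics addPathEl and getPathY and hypothetical constructors makeMemPath and makeFilePath (none of these are the actual setOptPathXX/getOptPathXX functions). The in-memory backend updates an environment in place; the file backend, a stand-in for an RSQLite table, reads and rewrites a single .rds file. The calling code is identical for both.

```r
# Hypothetical generics; names are illustrative, not ParamHelpers API.
addPathEl <- function(op, x, y) UseMethod("addPathEl")
getPathY  <- function(op) UseMethod("getPathY")

# In-memory backend: an environment updated in place.
makeMemPath <- function() {
  op <- new.env()
  op$df <- NULL
  class(op) <- c("MemPath", "Path")
  op
}
addPathEl.MemPath <- function(op, x, y) {
  op$df <- rbind(op$df, data.frame(x, y = y))
  invisible(op)
}
getPathY.MemPath <- function(op) op$df$y

# Persistent backend: a single .rds file (stand-in for a database table).
makeFilePath <- function(file) {
  saveRDS(NULL, file)
  structure(list(file = file), class = c("FilePath", "Path"))
}
addPathEl.FilePath <- function(op, x, y) {
  # Read-modify-write without any locking: NOT safe for concurrent writers.
  df <- rbind(readRDS(op$file), data.frame(x, y = y))
  saveRDS(df, op$file)
  invisible(op)
}
getPathY.FilePath <- function(op) readRDS(op$file)$y

# The same calling code works with either backend.
backends <- list(makeMemPath(), makeFilePath(tempfile(fileext = ".rds")))
ys <- lapply(backends, function(op) {
  addPathEl(op, list(K = 5L), 0.31)
  addPathEl(op, list(K = 8L), 0.27)
  getPathY(op)
})
print(ys)
```

Because the file backend does no locking, concurrent writers would still need the lock/unlock operators mentioned above.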