mlr-org / mlrMBO

Toolbox for Bayesian Optimization and Model-Based Optimization in R

Exception handling in mlrMBO #391

Open smilesun opened 7 years ago

smilesun commented 7 years ago

When running hyper-parameter optimization with mlrMBO on a cluster, the scheduling system may have to kill the process because of resource limits (a memory or time limit, for example). In that case, is there an easy way to write exception-handling code that still returns the current best result?

jakob-r commented 7 years ago

An easy solution would be to save intermediate results following this example:

save.file = "~/mboState_run01.RData" # a file that can be accessed from all nodes on the cluster
# save the optimization state to disk after every iteration (0 to 50)
ctrl = makeMBOControl(save.on.disk.at = 0L:50L, save.file.path = save.file)
ctrl = setMBOControlTermination(ctrl, iters = 50L)
or = mbo(f, control = ctrl)
# after this timed out, resume from the saved state
or = mboContinue(save.file)
# or, if you don't want to continue the optimization with the remaining budget, just call
or = mboFinalize(save.file)

Side question: Do you use batchtools or BatchJobs?

smilesun commented 7 years ago

Thanks, I did not know that one could provide a file in "makeMBOControl". Yes, I am using batchtools for tuning ML hyperparameters. Now I need to find a way to organize maybe 100 files and continue them afterwards in another call.
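One possible way to handle many saved states is to loop over the files and finalize each one. The sketch below is only an illustration: the helper name `collect_mbo_results`, the directory layout, and the file name pattern are assumptions that follow the `mboState_run01.RData` naming from the example above. The `finalize` argument defaults to `mlrMBO::mboFinalize`; passing `mlrMBO::mboContinue` instead would resume each run with its remaining budget.

```r
# Hypothetical helper: collect results from many saved mlrMBO state files.
# `finalize` is an argument so that mboFinalize can be swapped for mboContinue.
collect_mbo_results = function(state.dir,
                               pattern = "^mboState_.*\\.RData$",
                               finalize = mlrMBO::mboFinalize) {
  files = list.files(state.dir, pattern = pattern, full.names = TRUE)
  results = lapply(files, function(f) {
    # A corrupt or half-written state file should not abort the whole loop.
    tryCatch(finalize(f), error = function(e) {
      warning(sprintf("Could not finalize %s: %s", f, conditionMessage(e)))
      NULL
    })
  })
  names(results) = basename(files)
  # Drop the runs that could not be finalized.
  Filter(Negate(is.null), results)
}

# Usage (assumes the state files live in one shared directory):
# res = collect_mbo_results("~/mbo_states")
# best.y = vapply(res, function(r) r$y, numeric(1))
```

Keeping one state file per run, with the run id in the file name, makes it straightforward to map results back to the batchtools jobs that produced them.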

danielhorn commented 7 years ago

You should use the imputation mechanism of mbo. I can send you some example code tomorrow.
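Until that example is posted, here is a minimal sketch of what the imputation mechanism could look like. It uses the `impute.y.fun` argument of `makeMBOControl`, which is called when an objective evaluation throws an error or returns NA, NaN, or Inf. The imputation rule (worst value seen so far plus a penalty, assuming minimization) and the names `worst_plus_penalty` and `impute_worst` are assumptions for illustration, not the code referred to above. Note that this only covers failures inside a single evaluation; it does not help if the scheduler kills the whole R process.

```r
# Hypothetical rule: worst finite y observed so far plus a penalty
# (assumes a single-objective minimization problem).
worst_plus_penalty = function(ys, penalty = 1, fallback = 1e6) {
  ys = ys[is.finite(ys)]
  if (length(ys) == 0L) return(fallback) # nothing evaluated successfully yet
  max(ys) + penalty
}

# Signature expected by mlrMBO: function(x, y, opt.path, ...).
impute_worst = function(x, y, opt.path, ...) {
  worst_plus_penalty(ParamHelpers::getOptPathY(opt.path))
}

if (requireNamespace("mlrMBO", quietly = TRUE)) {
  ctrl = mlrMBO::makeMBOControl(impute.y.fun = impute_worst)
  ctrl = mlrMBO::setMBOControlTermination(ctrl, iters = 50L)
}
```

Imputing a value slightly worse than anything observed steers the surrogate model away from the failing region without distorting it as much as a huge constant would.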

berndbischl commented 7 years ago

the problem is deeper.

a) the solution from @jakob-r doesn't work that well; i want to run my point eval in a separate process. the continuation procedure is more of a last resort, i want something robust.

b) there is the runexec tool that we really should support very soon. this solves our problem on all systems once and for all.

c) @smilesun can you use batchtools to generate your points? on a cluster this would ensure that you run in a separate process. please post a MINIMAL example so we can look at that.
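To illustrate point a), one way to run a single point evaluation in a separate process with a time limit is the callr package. This is only a sketch under assumptions: it is neither the runexec nor the batchtools approach mentioned above, and the wrapper name `eval_isolated` is made up. `callr::r` starts a fresh R session, so the evaluated function must be self-contained (load its packages inside, take all data through `args`). On a timeout or crash the wrapper returns `NA_real_`, which mlrMBO treats as a failed evaluation and can pass to `impute.y.fun`.

```r
# Hypothetical wrapper: evaluate `fun` in a separate R process with a timeout.
# `runner` defaults to callr::r and is an argument so it can be replaced.
eval_isolated = function(fun, args = list(), timeout = 60, runner = callr::r) {
  tryCatch(
    runner(fun, args = args, timeout = timeout),
    # Timeout, crash, or error in the child process: report a missing value.
    error = function(e) NA_real_
  )
}

# Usage inside an objective function (requires the callr package):
# y = eval_isolated(function(x) sum(x^2), args = list(x = c(1, 2)), timeout = 10)
```

Because the child process is killed on timeout, a hanging or memory-hungry evaluation does not take the main optimization loop down with it, as long as the limit is tighter than the one enforced by the scheduler.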

smilesun commented 7 years ago

Notes for discussion:
Solution 1: set the time limit directory
Solution 2: parallel xgboost (allocate multicore for one job)
Solution 3: multiple point (allocate multicore for one job)