mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

Avoid duplicated results in tuning instance? #203

Closed: giuseppec closed this issue 2 months ago

giuseppec commented 5 years ago

I'm not sure whether the code below should produce at least a warning that the learner has already been evaluated with the same parameters:

library(mlr3)
library(mlr3learners)
library(mlr3tuning)
task = tsk("sonar")
learner = lrn("classif.kknn", predict_type = "prob")
learner$param_set
tune_ps = ParamSet$new(list(
  ParamInt$new("k", lower = 1, upper = 2)
))

instance = TuningInstance$new(
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.auc"),
  param_set = tune_ps,
  terminator = term("none")
)

set.seed(1)
tuner_grid = tnr("grid_search", resolution = 2)
tuner_grid$tune(instance)
tuner_grid$tune(instance) # causes duplicated results if the user runs this line multiple times "accidentally"

perfdata = instance$archive("params")
perfdata[, c("nr", "k", "classif.auc")]
   nr k classif.auc
1:  1 1   0.7978992
2:  2 2   0.8735294
3:  3 1   0.7978992
4:  4 2   0.8735294

If the learner is stochastic such as ranger, something like this could happen:

   nr mtry classif.ce
1:  1    1  0.1884058
2:  2    2  0.1594203
3:  3    1  0.1739130
4:  4    2  0.1884058

Should storing results of hyperparameter combinations that were already evaluated maybe be avoided?
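
For reference, the duplicates can be spotted directly in the archive. A minimal sketch, assuming perfdata is a data.table (as the printed output above suggests):

library(data.table)

# count evaluations and the mean AUC per value of k; with a deterministic
# learner the duplicated rows carry identical scores
perfdata[, .(n_evals = .N, mean_auc = mean(classif.auc)), by = k]
#    k n_evals  mean_auc
# 1: 1       2 0.7978992
# 2: 2       2 0.8735294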

berndbischl commented 5 years ago

that's nearly the same issue as #127

berndbischl commented 5 years ago

currently we see it this way: if that happens, that's an aspect of your tuner / the size of your search space. i don't think it's very easy to handle this transparently. what should we do?

i can warn about this, but i somewhat dislike warnings in general. here, it might be fine.

@mllg ?

berndbischl commented 5 years ago

@giuseppec also, your use case seems very weird? you ran the tuner twice on the same instance?

giuseppec commented 5 years ago

Yeah, it's not a real "use case". I just stumbled over this "issue" because I accidentally ran the line tuner_grid$tune(instance) twice, one after the other, and then wondered about the duplicated results. Maybe this is just my "own stupidity" when it happens, but I wanted to mention it here because I wasn't sure whether it has any other implications.

berndbischl commented 5 years ago

Yeah, it's not a real "use case". I just stumbled over this "issue" because I accidentally ran the line tuner_grid$tune(instance) twice, one after the other, and then wondered about the duplicated results.

that's absolutely fine, and good to report such stuff. i also opened #204 for this

jakob-r commented 5 years ago

I think it is perfectly fine that the code runs like that. A tuner is told to run on an instance, and we specifically allow the instance(?) to not be empty. If the tuner does not incorporate the archive (e.g. random search), this is fine. If the tuner is stupid and always does the same thing (e.g. grid search), it's the user's problem. Also, the user could change e.g. inst$learner$param_set$values$distance (in your example) and run the tuner again, as in the sketch after this comment. This would also be totally valid.

Consequently, a warning should only be issued if it is 100% necessary.
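
A minimal sketch of the scenario @jakob-r describes, reusing the (old-API) objects from the original example, where the tuning instance is named instance rather than inst; distance is kknn's Minkowski distance parameter:

# change a fixed learner parameter on the instance, then run the same tuner again;
# the second run evaluates genuinely new configurations even though k takes the
# same values as before
instance$learner$param_set$values$distance = 1   # Minkowski distance of kknn
tuner_grid$tune(instance)

instance$learner$param_set$values$distance = 2
tuner_grid$tune(instance)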

berndbischl commented 5 years ago

agreeing with @jakob-r, although i guess i COULD see the case for a warning. OTOH they tend to get annoying. can we get a quick vote on whether you want to see a warning in the general case that a configuration is evaluated multiple times?

@jakob-r @mllg @mb706 @giuseppec @larskotthoff @pfistfl

(Edit by @jakob-r: Voting please with :+1: and :-1: )

mb706 commented 4 years ago

Do tuners aggregate the configurations that were evaluated multiple times, or do they just report the best one? Both behaviours would have their problems. The eval_batch() call could also refuse to evaluate the configuration a second time and just return the previous result, so that "dumb" algos like grid search and random search (with discrete search space) don't trip over this. Explicitly multi-fidelity algos like hyperband and MBO would then have to set an extra flag in the eval_batch() call to get around this.

berndbischl commented 4 years ago

currently mlr3tuning does not talk about / worry about "repeated" evals. every evaluation is treated as "different", although it might not be.

if we want to handle this, a proposal must be written down carefully first

mb706 commented 4 years ago

Proposal:

TuningInstance$eval_batch() gets an argument reevaluate, defaulting to FALSE. If reevaluate is FALSE, then configurations that are already in self$bmr are not evaluated again; instead, their performance from previous runs is used and returned as perf. If reevaluate is TRUE, then the behaviour is as it is right now (and without a warning message). Random search and grid search use reevaluate = FALSE; some other algorithms (e.g. irace) may use reevaluate = TRUE. Algorithms with reevaluate = TRUE need to take special care about which performance they report in assign_result().
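
A hedged sketch of the proposed semantics, written as a standalone function rather than the real TuningInstance$eval_batch() method; eval_batch_sketch, archive, xdt, evaluate_fun, and the performance column are stand-ins for the instance's archive, the batch of configurations, and the actual resampling step:

library(data.table)

eval_batch_sketch = function(xdt, archive, evaluate_fun, reevaluate = FALSE) {
  param_cols = setdiff(names(archive), "performance")
  if (!reevaluate && nrow(archive) > 0L) {
    # reuse stored performances for configurations seen before ...
    seen = merge(xdt, archive, by = param_cols, sort = FALSE)
    # ... and only evaluate the genuinely new ones (anti-join)
    new = xdt[!archive, on = param_cols]
  } else {
    seen = archive[0L]
    new = copy(xdt)
  }
  if (nrow(new) > 0L) {
    new[, performance := evaluate_fun(new)]
  }
  rbindlist(list(seen, new), use.names = TRUE, fill = TRUE)
}

# k = 2 reuses the stored score from the archive, only k = 3 is evaluated
archive = data.table(k = 1:2, performance = c(0.7978992, 0.8735294))
eval_batch_sketch(data.table(k = 2:3), archive, function(x) runif(nrow(x)))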

jakob-r commented 4 years ago

Would it be better if reevaluate were a property of the TuningInstance instead of an argument that has to be passed all the time? It could be an active binding that is set to FALSE if the learner is deterministic and to TRUE if the learner is stochastic. The user should probably be able to override this behaviour. The tuner could then adapt its behaviour accordingly.
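
A hedged R6 sketch of that idea; the class name, the "stochastic" learner property, and the override mechanism are assumptions for illustration, not mlr3tuning code:

library(R6)

InstanceSketch = R6Class("InstanceSketch",
  public = list(
    learner = NULL,
    initialize = function(learner) {
      self$learner = learner
    }
  ),
  private = list(
    .reevaluate = NULL  # explicit user override; NULL means "derive from the learner"
  ),
  active = list(
    reevaluate = function(value) {
      if (!missing(value)) {
        private$.reevaluate = value
      } else if (!is.null(private$.reevaluate)) {
        private$.reevaluate
      } else {
        # default: re-evaluate only if the learner is flagged as stochastic
        "stochastic" %in% self$learner$properties
      }
    }
  )
)

A tuner could then simply read inst$reevaluate, while inst$reevaluate = FALSE would override the learner-derived default.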

mb706 commented 4 years ago

This is something that the tuning algorithm should decide, not the user (although the user may possibly set an argument in the tuning algorithm that changes this). Having information about whether the learning algorithm is stochastic would still be a good idea.

MLopez-Ibanez commented 2 years ago

This is something that the tuning algorithm should decide, not the user (although the user may possibly set an argument in the tuning algorithm that changes this). Having information about whether the learning algorithm is stochastic would still be a good idea.

irace has an option deterministic to specify whether running the same configuration on the same instance will produce the same value. If deterministic=true, then irace never evaluates the same configuration on the same instance more than once. If deterministic=false, then irace makes sure to vary the seed passed when re-evaluating, so that the same configuration is never evaluated on the same instance-seed pair.

I would consider deterministic a setting of the scenario, not of the tuner (the tuner may use it or ignore it).
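
For reference, a minimal sketch of how this looks on the irace side; the target runner path and instance names are placeholders, and the final irace() call is commented out because parameters would need to be defined separately:

library(irace)

scenario = list(
  targetRunner   = "./target-runner",            # placeholder evaluation script
  instances      = c("instance-a", "instance-b"),
  deterministic  = 1,                             # same configuration + instance => same value
  maxExperiments = 200
)
# with deterministic = 1, irace never re-evaluates a configuration on an instance;
# with deterministic = 0, it varies the seed so instance-seed pairs are never repeated
# irace(scenario = scenario, parameters = parameters)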

be-marc commented 2 months ago

We decided that the tuner should handle this.