mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

Is custom resampling with AutoTuner supported? #207

Closed sguidi11 closed 4 years ago

sguidi11 commented 5 years ago

I am using a custom resampling within the AutoTuner function and am getting back an error which I unfortunately can't understand. Any ideas?

task = tsk("pima")
set.seed(123)

resampling = rsmp("custom")
train_sets = list(1:300 , 332:632, 633:738)
test_sets = list(301:331, 633:663, 739:768)
resampling$instantiate(task, train_sets, test_sets) 

learner = lrn("classif.rpart")
#resampling = rsmp("holdout")
measures = msr("classif.ce")
tune_ps = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
terminator = term("evals", n_evals = 10)
tuner = tnr("random_search")

at = AutoTuner$new(
  learner = learner,
  resampling = resampling,
  measures = measures,
  tune_ps = tune_ps,
  terminator = terminator,
  tuner = tuner
)
at
at$train(task)
Error in self$resampling$instantiate(self$task) :
  Assertion on 'train_sets' failed: Must be of type 'list', not 'NULL'.
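
(For context, a minimal sketch of what the traceback suggests, not a fix: internally the AutoTuner re-instantiates the resampling via self$resampling$instantiate(self$task), i.e. without train/test sets, and for a custom resampling that call alone trips the same assertion.)

library(mlr3)
# hypothetical minimal reproduction of the internal call shown in the traceback
rsmp("custom")$instantiate(tsk("pima"))
# Assertion on 'train_sets' failed: Must be of type 'list', not 'NULL'.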
berndbischl commented 5 years ago

thx.

1) first of all, the error seems to appear already for simple tuning (without AutoTuner).

task = tsk("pima")
set.seed(123)

resampling = rsmp("custom")
train_sets = list(1:300 , 332:632, 633:738)
test_sets = list(301:331, 633:663, 739:768)
resampling$instantiate(task, train_sets, test_sets)

learner = lrn("classif.rpart")
#resampling = rsmp("holdout")
measures = msr("classif.ce")
tune_ps = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
terminator = term("evals", n_evals = 10)
tuner = tnr("random_search")

inst = TuningInstance$new(task, learner, resampling, measures,
  tune_ps, terminator)
tuner$tune(inst)

2) whether we can support fixed splits in the AT is a different matter. we have discussed this multiple times; currently the answer was that we should not. but nevertheless, 1) should work and 2) would need a better error message

berndbischl commented 5 years ago

@sguidi11 also, you posted the issue in the wrong tracker; you should post this here:

https://github.com/mlr-org/mlr3tuning

i will now move the issue

berndbischl commented 5 years ago

and this is nearly the same issue as https://github.com/mlr-org/mlr3tuning/issues/197

sguidi11 commented 5 years ago

Thank you for the answer. Now it is clear. Sorry for posting in the wrong place.

berndbischl commented 5 years ago

why are you closing? you certainly reported a valid bug, at least for the call to Tuner$tune?

sguidi11 commented 5 years ago

Oops, pressed the wrong button. I am seriously trying to understand the code better.

berndbischl commented 5 years ago

> I am seriously trying to understand the code better.

sure, just ask if you have questions

cjm715 commented 4 years ago

> thx.
>
> 1. first of all, the error seems to appear already for simple tuning (without AutoTuner). [...]
> 2. whether we can support fixed splits in the AT is a different matter. [...]

A new error appears when running the example script above (the simple tuning case with TuningInstance):

Error in assert_list(train_sets, types = "atomicvector", any.missing = FALSE): argument "train_sets" is missing, with no default

I suspect that it is due to a recent change in the mlr3 repo: https://github.com/mlr-org/mlr3/commit/762e08e5383f3cdb42a899aa1bd94ec27a3d8654

Looks like the NULL defaults for train_sets and test_sets were removed.
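
A rough sketch (not the actual mlr3 source) of what the two error messages suggest happened to the instantiate() signature around that commit:

library(checkmate)

# before: NULL defaults reach the list assertion
# -> "Assertion on 'train_sets' failed: Must be of type 'list', not 'NULL'"
instantiate_before = function(task, train_sets = NULL, test_sets = NULL) {
  assert_list(train_sets, types = "atomicvector", any.missing = FALSE)
  assert_list(test_sets, types = "atomicvector", any.missing = FALSE)
  # ... store the sets on the resampling object ...
}

# after: no defaults, so calling instantiate(task) alone now fails earlier with
# -> "argument 'train_sets' is missing, with no default"
instantiate_after = function(task, train_sets, test_sets) {
  assert_list(train_sets, types = "atomicvector", any.missing = FALSE)
  assert_list(test_sets, types = "atomicvector", any.missing = FALSE)
  # ... store the sets on the resampling object ...
}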

mllg commented 4 years ago

This should be fixed now. AFAICT, custom resamplings did not work before the change in mlr3 either; only the error message would have been less informative.

Thanks for reporting.

berndbischl commented 4 years ago

@mllg: thx for your fix and also for adding a test case for the 2nd code chunk above: the tuning case with custom resampling. all good.

but: i also checked his first example, with the AT, which uses a fixed custom resampling on the inside. and that runs now too..... which it maybe should NOT? because we said here we don't support it? i am not even sure what exactly happens in that case?

we should either properly assert it or test and document what happens

mllg commented 4 years ago

What I did now:

  1. Allow providing an instantiated resampling (custom or whatever) for TuningInstance.
  2. Disallow providing an instantiated resampling (custom or whatever) for AutoTuner. If you pass an uninstantiated custom resampling, you will get an informative error message.

Did I miss something?
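
A quick sketch of how the two points above should play out with the reproduction code from the original post (reusing the same task, learner, tune_ps, terminator and tuner objects; exact error wording aside):

resampling = rsmp("custom")
resampling$instantiate(task, train_sets, test_sets)

# 1) TuningInstance accepts the instantiated custom resampling and tunes on the fixed splits
inst = TuningInstance$new(task, learner, resampling, measures, tune_ps, terminator)
tuner$tune(inst)

# 2) AutoTuner is expected to reject the instantiated resampling with an informative
#    error (whether at construction or at $train(), per point 2 above)
at = AutoTuner$new(learner = learner, resampling = resampling, measures = measures,
  tune_ps = tune_ps, terminator = terminator, tuner = tuner)
at$train(task)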

be-marc commented 4 years ago

We decided not to allow custom resampling in AutoTuner to prevent unavailable splits in the inner loop of nested resampling. However, custom resampling is allowed in the TuningInstance and therefore in the outer resampling loop (https://github.com/mlr-org/mlr3tuning/commit/1f531853e3dba58417bf2b7e4ea21d694bbf2c6a). We could discuss implementing a cv seed in the future (#197).
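
A sketch of the allowed setup, reusing the objects from the posts above: the fixed splits go into the outer resampling loop, while the AutoTuner tunes with a standard (uninstantiated) inner resampling such as holdout:

# outer loop: fixed custom splits are fine here
outer = rsmp("custom")
outer$instantiate(task, train_sets, test_sets)

# inner loop: AutoTuner with an ordinary inner resampling
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("holdout"),
  measures = measures,
  tune_ps = tune_ps,
  terminator = terminator,
  tuner = tuner
)

# nested resampling: tuning happens on a holdout split inside each fixed outer training set
rr = resample(task, at, outer)
rr$aggregate(msr("classif.ce"))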