mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

Predict phase parameter optimization #212

Open mb706 opened 4 years ago

mb706 commented 4 years ago

It should be possible to perform efficient optimization of predict-phase parameters, possibly even simultaneously with (ordinary) train-time parameters, so that predict-phase parameters are optimized in an inner loop separate from the train-time parameters.

This was our resolution for #50 (see https://github.com/mlr-org/mlr3tuning/issues/50#issuecomment-510668473), but apparently we don't have an issue mentioning this specifically?

berndbischl commented 4 years ago

Can we add a simple example / task here so that we can work against that, please?

mb706 commented 4 years ago

We want to tune

library("mlr3")         # provides lrn() and tasks
library("mlr3learners") # provides the glmnet learner
ll = lrn("classif.glmnet")

on one of the following param sets:

library("paradox")      # provides ParamSet, ParamFct, ParamDbl

# "s" is a predict-time parameter: it only affects prediction
ps1 = ParamSet$new(list(
  ParamFct$new("s", levels = c("lambda.1se", "lambda.min"))
))

# "alpha" is a train-time parameter: changing it requires refitting the model
ps2 = ParamSet$new(list(
  ParamFct$new("s", levels = c("lambda.1se", "lambda.min")),
  ParamDbl$new("alpha", lower = 0, upper = 1)
))
  1. Currently, when we tune ps1, we perform both training and prediction for every candidate value. This may be desirable when the Learner or the resampling is stochastic in some way.
  2. The tuning machinery knows that "s" is a tags = "predict" parameter, so repeated model fits should not be necessary when tuning over ps1. There should be a way to prevent repeated train() calls and to just reuse the same model for different values of "s" (a minimal sketch of this reuse follows this list).
  3. We may or may not want to support tuning ps2 with all the predict-time parameters tuned separately (in an inner loop) from the train-time parameters. Alternatively, we could forbid this and insist that predict-time tuning is only possible when the whole search space lives at predict time. The user would then have to set up an AutoTuner for the "s" parameter and tune that AutoTuner with another tuner that tunes over "alpha" (a sketch of this nesting follows the note below). This would probably be the simplest option to implement, but then it would be nice to have some convenience functions.
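
For illustration, here is a minimal sketch of the reuse described in point 2, continuing from the snippets above and fitting on the "sonar" task (an arbitrary choice). It assumes the learner accepts the string levels of "s" from ps1, as parameterized above; since "s" only enters at predict time, a single fitted model serves both candidate values.

task = tsk("sonar")
ll$train(task)                        # fit the model once

ll$param_set$values$s = "lambda.1se"  # predict-time change only
p1 = ll$predict(task)

ll$param_set$values$s = "lambda.min"  # reuse the same model, no retraining
p2 = ll$predict(task)

A predict-tag-aware tuner could perform exactly this reuse internally for every train-time configuration it evaluates.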
mb706 commented 4 years ago

Note to self: nesting AutoTuners behaves differently from running two different optimization methods (one for the train-time, one for the predict-time parameters) simultaneously, because the former performs nested resampling while the latter has only a single resampling level.
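
For illustration, a hedged sketch of the nesting from point 3 above, written against the current mlr3tuning sugar functions auto_tuner() and tune() (which postdate this issue; argument names have shifted across versions) and continuing from ll, ps1, and task defined above. That the AutoTuner exposes the wrapped learner's "alpha" unchanged in its own param set is an assumption here; ps_alpha is a helper set defined just for the outer loop.

library("mlr3tuning")

# inner loop: tune only the predict-time parameter "s"
at = auto_tuner(
  tuner        = tnr("grid_search"),
  learner      = ll,
  resampling   = rsmp("holdout"),
  measure      = msr("classif.ce"),
  terminator   = trm("none"),   # grid search exhausts both values of "s"
  search_space = ps1
)

# outer loop: tune the train-time parameter "alpha" over the AutoTuner
ps_alpha = ParamSet$new(list(
  ParamDbl$new("alpha", lower = 0, upper = 1)
))

instance = tune(
  tuner        = tnr("random_search"),
  task         = task,
  learner      = at,
  resampling   = rsmp("cv", folds = 3),
  measures     = msr("classif.ce"),
  term_evals   = 10,
  search_space = ps_alpha
)

The inner holdout inside the outer 3-fold CV is exactly the nested resampling mentioned in the note, whereas tuning ps2 jointly with a single tuner would evaluate every (alpha, s) candidate on one resampling level.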