mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

mlr3measures::mse() unexpected result #328

Closed: MarcelMiche closed this issue 2 years ago

MarcelMiche commented 2 years ago

mse() returns an unexpected result when applied to one part of the inner resampling (extended_archive$...prediction("test")), whereas it returns the expected result when applied to a more detailed part of the same inner resampling (...predictions("test")[[1]]).

Date: 2022-02-09. R Version: 4.0.3 (2020-10-10). Platform: x86_64-apple-darwin17.0 (64-bit)

Setup: nested resampling (inner: rsmp("cv", folds = 4), outer: rsmp("repeated_cv", repeats = 2, folds = 3)) with regr.glmnet as the only learner, auto-tuning s (random search, terminator: n_evals = 7), predict_sets = c("train", "test"), and regr.mse as performance measure. Then execute: rrMap <- mlr3misc::map(as.data.table(rr)$learner, "model")

Reproducible example with dummy data:

set.seed(123)
for(i in 1:9) {
    assign(paste0("x", i), rnorm(n=100, mean = sample(50:100,1), sd = sample(5:60,1)))
} # Make predictors
err <- rnorm(n=100, 1000, 100) # error term
y <- 42 - (2*x1) + 3*x2 - .66*x3 + .03*x4 + 1.7*x5 + .085*x6 + .1*x7 - .008*x8 + x9 + err
dat <- data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9)

library(mlr3verse)
tskReg <- TaskRegr$new(id="Check", backend=dat, target="y")
LASSO <- lrn("regr.glmnet", alpha = 1, predict_sets=c("train", "test"))
inner_rsmp = rsmp("cv", folds = 4)
measure = msr("regr.mse")
search_space = ps(s = p_dbl(lower = 0.001, upper = 0.8))
terminator = trm("evals", n_evals = 7)
tuner = tnr("random_search")
at = AutoTuner$new(LASSO, inner_rsmp, measure, terminator, tuner, search_space, store_models = TRUE) # at = auto tuning
outer_rsmp <- rsmp("repeated_cv", repeats = 2, folds = 3)
rr = resample(tskReg, at, outer_rsmp, store_models = TRUE)

rrMap <- mlr3misc::map(as.data.table(rr)$learner, "model")

rrMap[[1]]$tuning_instance$archive$extended_archive # Overview
rrMap[[1]]$tuning_instance$archive$best() # Best tuning result

rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$prediction("test") # Best regr.mse = 13083.91 -> resample_result[[3]] -> prediction result

truthVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$prediction("test")$truth
respVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$prediction("test")$response # Extract truth and response values

### Unexpected result - why?
mlr3measures::mse(truth = truthVals, response = respVals) # Expected result = 13083.91, actual result = 13009.43

rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$score()[1,] # The discrepancy does not occur when scoring a single inner fold instead of all four folds per tuned value of s.

truthVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$predictions("test")[[1]]$truth
respVals <- rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]$predictions("test")[[1]]$response

mlr3measures::mse(truth = truthVals, response = respVals) # Expected and actual result of 15541.3 agree.

Thank you very much in advance, not just for answering, but also for all the effort you have put into mlr3.

be-marc commented 2 years ago
rr = rrMap[[1]]$tuning_instance$archive$extended_archive$resample_result[[3]]

This is a ResampleResult object with 4 iterations and therefore 4 Prediction objects. You can score them individually.

rr$score()

#>             task task_id                 learner  learner_id         resampling resampling_id iteration           prediction  regr.mse
#>1: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         1 <PredictionRegr[19]> 15541.300
#>2: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         2 <PredictionRegr[19]>  5710.667
#>3: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         3 <PredictionRegr[19]> 15986.089
#>4: <TaskRegr[46]>   Check <LearnerRegrGlmnet[36]> regr.glmnet <ResamplingCV[19]>            cv         4 <PredictionRegr[19]> 15097.602

Internally, we call rr$aggregate(), which calls mlr3measures::mse() on each Prediction object and then calculates the mean of the four regr.mse scores. The result, 13083.91, is logged to the archive.
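
As a sketch (reusing the rr object defined above, so the exact numbers depend on the run), the archived value corresponds to the mean of the per-fold scores:

mean(rr$score(msr("regr.mse"))$regr.mse) # average of the four per-fold MSE values
rr$aggregate(msr("regr.mse")) # same value, 13083.91 in the run shown above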

You used rr$prediction("test"), which is the combined Prediction object of the four resampling iterations, and then called mlr3measures::mse() on that combined prediction. Computing the measure once over the pooled observations generally differs from averaging the four per-fold scores (for example when the folds are not exactly equal in size), which is why the two numbers disagree.
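
Scoring that combined object directly reproduces the other number (again only a sketch, assuming the same rr object as above):

rr$prediction("test")$score(msr("regr.mse")) # MSE over all pooled test rows at once, 13009.43 in the run shown above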

mllg commented 2 years ago

https://github.com/mlr-org/mlr3/commit/5cfdd9f333ddba32c918540e5d9795d9fc287be7 tries to clarify the difference.