mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

ResampleResult and BenchmarkResult's `$score()` behave surprisingly when passing a `predict_set` #1006

Closed by sebffischer 1 month ago

sebffischer commented 7 months ago

An example for BenchmarkResult is given below. In both cases, the `predict_sets` argument is simply not taken into account when scoring the measures. The problem is these lines:

Also, the `$aggregate()` method of both classes is missing the `predict_sets` argument.

library(mlr3)

# learner that predicts on both the test and holdout sets
learner = lrn("regr.debug")
learner$predict_sets = c("test", "holdout")

# add an extreme outlier row and assign it the "holdout" row role
task = tsk("mtcars")
row = task$data(1)
row$..row_id = 1000
row$mpg = 10000000
task$rbind(row)
task$set_row_roles(1000, "holdout")

bmr = benchmark(benchmark_grid(task, learner, rsmp("holdout")))
#> INFO  [11:11:10.706] [mlr3] Running benchmark with 1 resampling iterations
#> INFO  [11:11:10.740] [mlr3] Applying learner 'regr.debug' on task 'mtcars' (iter 1/1)
#> INFO  [11:11:10.753] [mlr3] Finished benchmark

# scoring on the holdout set should yield a huge MSE because of the
# outlier, but the reported score is the test-set MSE instead
score = bmr$score(msr("regr.mse"), predict_sets = "holdout")
(score$prediction[[1]]$truth - score$prediction[[1]]$response)^2
#> [1] 9.999962e+13
score$regr.mse
#> [1] 53.05924

Created on 2024-02-16 with reprex v2.0.2
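Note that the `prediction` column returned by `$score()` does respect `predict_sets` (the manual squared-error computation above uses the holdout rows), so until this is fixed, a possible workaround is to score that prediction object directly via `Prediction$score()` instead of relying on the measure column. A minimal sketch, assuming the reprex above has already been run:

```r
# workaround sketch: bypass the buggy measure column and score the
# prediction object (already filtered to the holdout set) directly
pred = score$prediction[[1]]
pred$score(msr("regr.mse"))  # MSE computed on the holdout rows
```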

sebffischer commented 1 month ago

Other people are also confused by this: https://github.com/mlr-org/mlr3/issues/951