mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

expect_resample_result() too strict? #893

Open be-marc opened 1 year ago

be-marc commented 1 year ago

Description

expect_resample_result() throws an error when a feature subset is selected with task$select() after the resampling has been instantiated, because the task hash no longer matches the hash stored in the resampling. Maybe this shouldn't be an error, since the selected features have no influence on the resampling splits.

Reproducible example

library(mlr3)
lapply(list.files(system.file("testthat", package = "mlr3"), pattern = "^helper.*\\.[rR]", full.names = TRUE), source)
library(testthat)

learner <- lrn("classif.rpart")
task <- tsk("pima")
resampling <- rsmp("holdout")

resampling$instantiate(task)
task$select(c("pregnant", "glucose"))
rr <- resample(task, learner, resampling)

expect_resample_result(rr)
#> Error: task$hash not equal to r$task_hash.
#> 1/1 mismatches
#> x[1]: "cdf5cca219fad06b"
#> y[1]: "c49b01e5056b388c"

Created on 2023-01-26 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2023-01-26
#>  pandoc   2.9.2.1 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version date (UTC) lib source
#>  backports        1.4.1   2021-12-13 [1] CRAN (R 4.2.2)
#>  brio             1.1.3   2021-11-30 [1] CRAN (R 4.2.2)
#>  checkmate        2.1.0   2022-04-21 [1] CRAN (R 4.2.2)
#>  cli              3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  codetools        0.2-18  2020-11-04 [4] CRAN (R 4.2.0)
#>  crayon           1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
#>  data.table       1.14.6  2022-11-16 [1] CRAN (R 4.2.2)
#>  desc             1.4.2   2022-09-08 [1] CRAN (R 4.2.2)
#>  digest           0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  evaluate         0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi            1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap          1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  fs               1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
#>  future           1.30.0  2022-12-16 [1] CRAN (R 4.2.2)
#>  future.apply     1.10.0  2022-11-05 [1] CRAN (R 4.2.2)
#>  globals          0.16.2  2022-11-21 [1] CRAN (R 4.2.2)
#>  glue             1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  highr            0.9     2021-04-16 [1] CRAN (R 4.2.2)
#>  htmltools        0.5.3   2022-07-18 [1] CRAN (R 4.2.2)
#>  knitr            1.41    2022-11-18 [1] CRAN (R 4.2.2)
#>  lgr              0.4.4   2022-09-05 [1] CRAN (R 4.2.2)
#>  lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  listenv          0.9.0   2022-12-16 [1] CRAN (R 4.2.2)
#>  magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  mlr3           * 0.14.1  2023-01-26 [1] Github (mlr-org/mlr3@849dd87)
#>  mlr3misc         0.11.0  2022-09-22 [1] CRAN (R 4.2.2)
#>  palmerpenguins   0.1.1   2022-08-15 [1] CRAN (R 4.2.2)
#>  paradox          0.11.0  2022-11-21 [1] CRAN (R 4.2.2)
#>  parallelly       1.34.0  2023-01-13 [1] CRAN (R 4.2.2)
#>  pillar           1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgload          1.3.2   2022-11-16 [1] CRAN (R 4.2.2)
#>  purrr            0.3.5   2022-10-06 [1] CRAN (R 4.2.2)
#>  R.cache          0.16.0  2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3      1.8.2   2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo             1.25.0  2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils          2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6               2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex           2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang            1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown        2.18    2022-11-09 [1] CRAN (R 4.2.2)
#>  rpart            4.1.19  2022-10-21 [4] CRAN (R 4.2.1)
#>  rprojroot        2.0.3   2022-04-02 [1] CRAN (R 4.2.2)
#>  sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  stringi          1.7.8   2022-07-11 [1] CRAN (R 4.2.2)
#>  stringr          1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
#>  styler           1.8.1   2022-11-07 [1] CRAN (R 4.2.2)
#>  testthat       * 3.1.5   2022-10-08 [1] CRAN (R 4.2.2)
#>  utf8             1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
#>  uuid             1.1-0   2022-04-19 [1] CRAN (R 4.2.2)
#>  vctrs            0.5.2   2023-01-23 [1] CRAN (R 4.2.2)
#>  waldo            0.4.0   2022-03-16 [1] CRAN (R 4.2.2)
#>  withr            2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun             0.35    2022-11-16 [1] CRAN (R 4.2.2)
#>  yaml             2.3.6   2022-10-18 [1] CRAN (R 4.2.2)
#>
#>  [1] /home/marc/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

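
To make the mismatch in the reprex concrete, here is a minimal sketch of what changes and what does not when features are dropped after instantiation. It assumes the fields `task$hash`, `task$row_ids` and `resampling$task_hash` behave as the error message above suggests; the variable names are only for illustration.

``` r
library(mlr3)

task <- tsk("pima")
resampling <- rsmp("holdout")
resampling$instantiate(task)

# Remember the state of the task at instantiation time.
hash_before <- task$hash
rows_before <- task$row_ids

# Dropping features changes the task hash ...
task$select(c("pregnant", "glucose"))
hash_changed <- !identical(task$hash, hash_before)

# ... but not the set of rows the task exposes.
rows_unchanged <- identical(task$row_ids, rows_before)

# The stored train/test ids still refer to rows that exist in the task.
split_ids <- c(resampling$train_set(1), resampling$test_set(1))
splits_valid <- all(split_ids %in% task$row_ids)

# The resampling still carries the hash recorded before the select().
stored_is_old <- identical(resampling$task_hash, hash_before)

print(c(
  hash_changed = hash_changed,
  rows_unchanged = rows_unchanged,
  splits_valid = splits_valid,
  stored_is_old = stored_is_old
))
```

If all four flags come out `TRUE`, the hash comparison fails even though the splits themselves are still usable on the reduced task.
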
berndbischl commented 1 week ago

This check seems way too strict. If the "contract" is that the hash of the task is stored in the resampling (instance), and the task then cannot be modified anymore, this check has way more downsides than benefits.

Remove the hash if its only use is consistency checking.
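
If some consistency check is still wanted, one possible direction is to compare row ids instead of the full task hash. The sketch below is only an illustration of that idea: `splits_match_task()` is a hypothetical helper, not part of mlr3, and it is not a proposal for how `expect_resample_result()` is or should be implemented.

``` r
library(mlr3)

# Hypothetical helper (not part of mlr3): TRUE if every train/test row id
# stored in an instantiated resampling still exists in the task.
splits_match_task <- function(task, resampling) {
  stopifnot(resampling$is_instantiated)
  ids <- unlist(lapply(seq_len(resampling$iters), function(i) {
    c(resampling$train_set(i), resampling$test_set(i))
  }))
  all(ids %in% task$row_ids)
}

task <- tsk("pima")
resampling <- rsmp("cv", folds = 3)
resampling$instantiate(task)

# Selecting a feature subset leaves the rows untouched.
task$select(c("pregnant", "glucose"))
ok_after_select <- splits_match_task(task, resampling)

# Removing rows makes the stored splits stale.
task$filter(1:100)
ok_after_filter <- splits_match_task(task, resampling)

print(c(ok_after_select = ok_after_select, ok_after_filter = ok_after_filter))
```

A row-based check like this would accept the `task$select()` case from the reprex while still flagging a task whose rows were filtered after instantiation.
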