mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0
945 stars 85 forks source link

PipeOp "classweights" should throw a warning if it is attached to a learner which does not have the "weights" property #931

Open FlorianPargent opened 1 year ago

FlorianPargent commented 1 year ago

I reran code from a now accepted paper of mine on cost-sensitive ML for psychologists with mlr3 in which we used classweights in combination with "classif.log_reg". Recently, mlr3learners removed the "weights" property from "classif.log_reg" (which I think was a very bad decision, see my related issue in the mlr3learners github repository: mlr-org/mlr3learners#265). This made me aware of the following issue in mlr3pipelines:

If I build a pipeline combining a "classweights" pipeop with "classif.log_reg", this will silently ignore my weights (because classif.log_reg does not have the weights property anymore). This behavior is extremely dangerous, and I only noticed it because in my paper I have computed both conditions (with and without weighting) and both gave exactly the same results!

Would it be possible to include a warning when using classweights with learners without this property?

library("mlr3verse")
task = tsk("german_credit")

costs = matrix(c(0, 1, 5, 0), nrow = 2)
dimnames(costs) = list(response = c("good", "bad"), truth = c("good", "bad"))
print(costs)

logreg = lrn("classif.log_reg", predict_type = "prob")
tw = po("classweights", minor_weight = 5)
tw_logreg = as_learner(tw %>>% logreg)

# we get exactly the same benchmark result for both conditions (normal logreg and logreg with weighting)
# the fact that classif.log_reg does not have a weights property (since mlr3learners version 0.5.4) is silently ignored!
set.seed(1)
bm = benchmark(benchmark_grid(list(task), list(logreg, tw_logreg),
  rsmp("cv")), store_models = FALSE)
bm$aggregate(msr("classif.costs", costs = costs))

#     nr      resample_result       task_id                   learner_id resampling_id iters classif.costs
# 1:  1 <ResampleResult[21]> german_credit              classif.log_reg            cv    10         0.851
# 2:  2 <ResampleResult[21]> german_credit classweights.classif.log_reg            cv    10         0.851
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] mlr3verse_0.2.7 mlr3_0.14.1    

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0       xfun_0.37              clusterCrit_1.2.8      listenv_0.9.0          mlr3cluster_0.1.6     
 [6] generics_0.1.3         colorspace_2.1-0       vctrs_0.5.2            htmltools_0.5.4        bbotk_0.7.2           
[11] yaml_2.3.7             paradox_0.11.0         utf8_1.2.3             rlang_1.0.6            pillar_1.8.1          
[16] glue_1.6.2             withr_2.5.0            palmerpenguins_0.1.1   uuid_1.1-0             mlr3fselect_0.9.1     
[21] lifecycle_1.0.3        mlr3learners_0.5.6     munsell_0.5.0          gtable_0.3.1           future_1.31.0         
[26] codetools_0.2-18       evaluate_0.20          mlr3data_0.6.1         knitr_1.42             fastmap_1.1.0         
[31] parallel_4.2.2         fansi_1.0.4            mlr3tuningspaces_0.3.3 renv_0.16.0            backports_1.4.1       
[36] scales_1.2.1           checkmate_2.1.0        mlr3filters_0.7.1      mlr3viz_0.6.1          mlr3tuning_0.17.2     
[41] parallelly_1.34.0      ggplot2_3.4.1          digest_0.6.31          dplyr_1.1.0            grid_4.2.2            
[46] clue_0.3-64            cli_3.6.0              tools_4.2.2            magrittr_2.0.3         tibble_3.1.8          
[51] cluster_2.1.4          mlr3misc_0.11.0        future.apply_1.10.0    pkgconfig_2.0.3        data.table_1.14.8     
[56] mlr3pipelines_0.4.2    rmarkdown_2.20         rstudioapi_0.14        lgr_0.4.4              R6_2.5.1              
[61] globals_0.16.2         compiler_4.2.2
mb706 commented 1 year ago

An mlr3pipelines PipeOp does not know about what other operations happen before / after it. In this case, it may be worth considering whether an mlr3 Learner should throw an error when it is given a task with weights but without the weights property.

berndbischl commented 2 months ago

add a unit test to pipelines before this is closed