Closed: vera-karlbauer closed this issue 2 months ago
This should be solved by https://github.com/mlr-org/mlr3pipelines/issues/603; waiting for the PR to be merged.
@mllg This sounds somewhat important, but I haven't found any linked PR, only the issue. Was there ever a PR, or did you mean to write "issue"?
This appears to be fixed, in the sense that the code no longer errors:
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
# define task
task = tsk("sonar")
## Pipeline 1: featureless
# define learner
learner_featureless = po("learner", lrn("classif.featureless"))
# define graph
graph_featureless = learner_featureless
# convert graph to GraphLearner
learner_graph_featureless = as_learner(graph_featureless)
## Pipeline 2: LASSO
# define learner
learner_lasso = po("learner", lrn("classif.cv_glmnet", alpha = 1, id = "lasso"))
# define graph
graph_lasso = learner_lasso
# convert graph to GraphLearner
learner_graph_lasso = as_learner(graph_lasso)
## Pipeline 3: Ridge
# define learner
learner_ridge = po("learner", lrn("classif.cv_glmnet", alpha = 0, id = "ridge"))
# define graph
graph_ridge = learner_ridge
# convert graph to GraphLearner
learner_graph_ridge = as_learner(graph_ridge)
# define resampling
resampling = rsmp("cv", folds = 3)
# define benchmark object
design = benchmark_grid(
  tasks = task,
  learners = c(learner_graph_featureless, learner_graph_lasso, learner_graph_ridge),
  resamplings = resampling
)
# run the benchmark
bmr = benchmark(design, store_models = TRUE)
#> INFO [12:00:51.352] [mlr3] Running benchmark with 9 resampling iterations
#> INFO [12:00:51.398] [mlr3] Applying learner 'classif.featureless' on task 'sonar' (iter 1/3)
#> INFO [12:00:51.427] [mlr3] Applying learner 'classif.featureless' on task 'sonar' (iter 2/3)
#> INFO [12:00:51.448] [mlr3] Applying learner 'classif.featureless' on task 'sonar' (iter 3/3)
#> INFO [12:00:51.469] [mlr3] Applying learner 'lasso' on task 'sonar' (iter 1/3)
#> INFO [12:00:52.269] [mlr3] Applying learner 'lasso' on task 'sonar' (iter 2/3)
#> INFO [12:00:52.672] [mlr3] Applying learner 'lasso' on task 'sonar' (iter 3/3)
#> INFO [12:00:52.940] [mlr3] Applying learner 'ridge' on task 'sonar' (iter 1/3)
#> INFO [12:00:53.052] [mlr3] Applying learner 'ridge' on task 'sonar' (iter 2/3)
#> INFO [12:00:53.158] [mlr3] Applying learner 'ridge' on task 'sonar' (iter 3/3)
#> INFO [12:00:53.272] [mlr3] Finished benchmark
print(bmr)
#> <BenchmarkResult> of 9 rows with 3 resampling runs
#>  nr task_id          learner_id resampling_id iters warnings errors
#>   1   sonar classif.featureless            cv     3        0      0
#>   2   sonar               lasso            cv     3        0      0
#>   3   sonar               ridge            cv     3        0      0
# get classification error
bmr$aggregate(measures = msr("classif.ce"))
#>       nr task_id          learner_id resampling_id iters classif.ce
#>    <int>  <char>              <char>        <char> <int>      <num>
#> 1:     1   sonar classif.featureless            cv     3  0.4663906
#> 2:     2   sonar               lasso            cv     3  0.2692892
#> 3:     3   sonar               ridge            cv     3  0.2354727
#> Hidden columns: resample_result
# get aic
bmr$aggregate(measures = msr("aic"))
#>       nr task_id          learner_id resampling_id iters   aic
#>    <int>  <char>              <char>        <char> <int> <num>
#> 1:     1   sonar classif.featureless            cv     3    NA
#> 2:     2   sonar               lasso            cv     3    NA
#> 3:     3   sonar               ridge            cv     3    NA
#> Hidden columns: resample_result
Created on 2024-08-16 with reprex v2.1.1
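The original report also asks about BIC; it behaves identically on this bmr object, for the reason explained below (a sketch against the benchmark above, not re-run; to my understanding msr("bic") performs the analogous log-likelihood check):
# BIC is likewise reported as NA for all three learners (sketch)
bmr$aggregate(measures = msr("bic"))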
AIC cannot be calculated in this context because the AIC measure checks whether the learner provides a log-likelihood:
if ("loglik" %nin% learner$properties) {
  return(NA_real_)
}
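This condition is met by all three learners in the benchmark above. A minimal way to verify it, assuming mlr3 and mlr3learners are attached (the expected results are inferred from the NA and non-NA AIC values in this thread):
library(mlr3)
library(mlr3learners)
# cv_glmnet does not advertise a log-likelihood, so AIC/BIC fall back to NA
"loglik" %in% lrn("classif.cv_glmnet")$properties  # expected: FALSE
# log_reg wraps stats::glm(), which does provide one
"loglik" %in% lrn("classif.log_reg")$properties    # expected: TRUE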
Since none of the learners in the first example provide a log-likelihood, the measure returns NA for all of them. Using plain logistic regression instead, we at least get an AIC value:
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
# define task
task = tsk("sonar")
# define learner
learner_logreg = po("learner", lrn("classif.log_reg", id = "logreg"))
# define graph
graph_logreg = learner_logreg
# convert graph to GraphLearner
graph_logreg = as_learner(graph_logreg)
# define resampling
resampling = rsmp("cv", folds = 3)
rr = resample(
  task = task,
  learner = graph_logreg,
  resampling = resampling,
  store_models = TRUE
)
#> INFO [12:12:05.195] [mlr3] Applying learner 'logreg' on task 'sonar' (iter 1/3)
#> INFO [12:12:05.331] [mlr3] Applying learner 'logreg' on task 'sonar' (iter 2/3)
#> INFO [12:12:05.386] [mlr3] Applying learner 'logreg' on task 'sonar' (iter 3/3)
#> Warning: glm.fit: algorithm did not converge
#> This happened PipeOp logreg's $train()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> This happened PipeOp logreg's $train()
#> Warning: glm.fit: algorithm did not converge
#> This happened PipeOp logreg's $train()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> This happened PipeOp logreg's $train()
#> Warning: glm.fit: algorithm did not converge
#> This happened PipeOp logreg's $train()
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> This happened PipeOp logreg's $train()
print(rr)
#> <ResampleResult> with 3 resampling iterations
#>  task_id learner_id resampling_id iteration warnings errors
#>    sonar     logreg            cv         1        0      0
#>    sonar     logreg            cv         2        0      0
#>    sonar     logreg            cv         3        0      0
# get classification error
rr$aggregate(measures = msr("classif.ce"))
#> classif.ce
#> 0.279089
# get aic
rr$aggregate(measures = msr("aic"))
#> aic
#> 122
Created on 2024-08-16 with reprex v2.1.1
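If per-fold values are of interest, $score() reports the AIC of each resampling iteration instead of the aggregate, and with store_models = TRUE the underlying glm fit can be pulled out of the trained GraphLearner and passed to stats::AIC() directly. A sketch building on the rr object above (the $model$logreg$model access path assumes the PipeOp id "logreg" from this reprex):
# AIC per resampling iteration rather than aggregated over folds
rr$score(measures = msr("aic"))
# extract the wrapped glm fit from the first fold and compute AIC manually
fit = rr$learners[[1]]$model$logreg$model
stats::AIC(fit)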
I get an error message when trying to extract the AIC from benchmark results for a classification task. The same issue occurs when trying to get the BIC; other measures like accuracy or classification error work fine. Find my reprex below. (Note about the code: my learners are wrapped as GraphLearners because I am using pipelines with several preprocessing steps in my actual work.)
Created on 2022-04-10 by the reprex package (v2.0.1)
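For readers unfamiliar with the GraphLearner setup mentioned in the note: the pattern presumably looks something like the sketch below, with one or more preprocessing PipeOps chained in front of the learner (illustrative only; po("scale") stands in for the actual preprocessing steps):
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
# chain a preprocessing step in front of the learner, then wrap the
# graph as a GraphLearner so it can be resampled/benchmarked directly
graph = po("scale") %>>% lrn("classif.log_reg")
learner_graph = as_learner(graph)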