mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

AutoTuner object training fails when using classif.svm and the selected kernel doesn't include certain hyperparameters #442

Closed ZekeMarshall closed 2 weeks ago

ZekeMarshall commented 3 weeks ago

Hi,

Thank you very much for creating such an incredible ecosystem of packages!

I have encountered an issue when training a mlr3tuning::AutoTuner object which contains a classif.svm learner, either individually or in an ensemble. Specifically, training the AutoTuner object fails to save the tuned model hyperparameters (and therefore the model) when the selected kernel does not include certain hyperparameters. I have used the default search space for svm, beginning on line 142 in the mlr3tuningspaces package. See a minimal reproducible example below:

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

# Establish task
task = as_task_classif(tsk("pima")$data(), 
                       target = "diabetes", 
                       id = "pima", 
                       positive = "pos")

# Create a list of learners
learners_l = list(
  svm = lrn("classif.svm", id = "svm"),
  gam = lrn("classif.gam", id = "gam")
)

# Create graph
graph = po("imputemean") %>>%
  gunion(learners_l) %>>%
  po("classifavg", innum = length(learners_l))

graph_learner = as_learner(graph)

# check the names of the hyperparameters in the ensemble
graph_learner$param_set

# search space
search_space <- ps(
  svm.svm.cost = p_dbl(1e-4, 1e4, logscale = TRUE),
  svm.svm.kernel = p_fct(levels = c("polynomial", "radial", "sigmoid", "linear")),
  svm.svm.degree = p_int(lower = 2, upper = 5),
  svm.svm.gamma = p_dbl(1e-4, 1e4, logscale = TRUE)
)

# tune - SUCCEEDS
tune(tuner = tnr("grid_search"),
     task = task,
     learner = graph_learner,
     resampling = rsmp("holdout"),
     measure = msr("classif.ce"),
     search_space = search_space,
     term_evals = 2,
     store_models = TRUE)

# auto tuner - FAILS
at = auto_tuner(tuner = tnr("grid_search"),
                learner = graph_learner,
                resampling = rsmp("holdout"),
                measure = msr("classif.ce"),
                search_space = search_space,
                term_evals = 2,
                store_models = TRUE)

# train
at$train(task)

First, when just using the tune function to obtain the tuned hyperparameters the search succeeds, producing the following object:

<TuningInstanceBatchSingleCrit>
* State:  Optimized
* Objective: <ObjectiveTuningBatch:imputemean.svm.svm.gam.gam.classifavg_on_pima>
* Search Space:
               id    class    lower   upper nlevels
           <char>   <char>    <num>   <num>   <num>
1:   svm.svm.cost ParamDbl -9.21034 9.21034     Inf
2: svm.svm.kernel ParamFct       NA      NA       4
3: svm.svm.degree ParamInt  2.00000 5.00000       4
4:  svm.svm.gamma ParamDbl -9.21034 9.21034     Inf
* Terminator: <TerminatorEvals>
* Result:
   svm.svm.cost svm.svm.kernel svm.svm.degree svm.svm.gamma classif.ce
          <num>         <char>          <int>         <num>      <num>
1:    -1.023371         linear              5      3.070113       0.25
* Archive:
   svm.svm.cost svm.svm.kernel svm.svm.degree svm.svm.gamma classif.ce
          <num>         <char>          <int>         <num>      <num>
1:    -1.023371         linear              5      3.070113  0.2500000
2:     3.070113         radial              5     -7.163598  0.2890625

Second, when training the at object for the first time I receive an error message similar to this issue: https://github.com/mlr-org/mlr3learners/issues/208, i.e.

Error in self$assert(xs, sanitize = TRUE) : 
  Assertion on 'xs' failed: svm.svm.cost: can only be set if the following condition is met 'svm.svm.type == C-classification'. Instead the parameter value for 'svm.svm.type' is not set at all. Try setting 'svm.svm.type' to a value that satisfies the condition.

Lastly, when training the at object a second time I receive the following message

Error in self$assert(xs, sanitize = TRUE) : 
  Assertion on 'xs' failed: svm.svm.cost: can only be set if the following condition is met 'svm.svm.type == C-classification'. Instead the parameter value for 'svm.svm.type' is not set at all. Try setting 'svm.svm.type' to a value that satisfies the condition
svm.svm.degree: can only be set if the following condition is met 'svm.svm.kernel == polynomial'. Instead the current parameter value is: svm.svm.kernel == linear
svm.svm.gamma: can only be set if the following condition is met 'svm.svm.kernel %in% {polynomial, radial, sigmoid}'. Instead the current parameter value is: svm.svm.kernel == linear.

In this case the selected kernel is "linear", which does not require the gamma and degree hyperparameters.

Any help would be greatly appreciated!

Best regards,

Zeke

Session Information

R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] mlr3mbo_0.2.4       mlr3tuning_1.0.0    paradox_1.0.1       mlr3fselect_1.0.0   mlr3filters_0.8.0   mlr3pipelines_0.6.0 mlr3learners_0.7.0  mlr3_0.20.2        

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1             dplyr_1.1.4                  farver_2.1.2                 mlr3extralearners_0.8.0-9000 fastmap_1.2.0                digest_0.6.36               
 [7] lifecycle_1.0.4              magrittr_2.0.3               compiler_4.4.0               rlang_1.1.4                  tools_4.4.0                  igraph_2.0.3                
[13] plotrix_3.8-4                utf8_1.2.4                   yaml_2.3.10                  data.table_1.15.4            knitr_1.48                   spacefillr_0.3.3            
[19] labeling_0.4.3               xgboost_1.7.8.1              pkgload_1.4.0                earth_5.3.3                  withr_3.0.1                  purrr_1.0.2                 
[25] mlr3misc_0.15.1              nnet_7.3-19                  grid_4.4.0                   fansi_1.0.6                  mlr3measures_0.6.0           e1071_1.7-14                
[31] colorspace_2.1-1             future_1.34.0                ggplot2_3.5.1                globals_0.16.3               scales_1.3.0                 iterators_1.0.14            
[37] cli_3.6.3                    rmarkdown_2.27               crayon_1.5.3                 bbotk_1.0.1                  generics_0.1.3               rstudioapi_0.16.0           
[43] future.apply_1.11.2          proxy_0.4-27                 splines_4.4.0                parallel_4.4.0               vctrs_0.6.5                  Matrix_1.7-0                
[49] jsonlite_1.8.8               Formula_1.2-5                listenv_0.9.1                foreach_1.5.2                lgr_0.4.4                    tidyr_1.3.1                 
[55] glue_1.7.0                   parallelly_1.38.0            nloptr_2.1.1                 plotmo_3.6.3                 codetools_0.2-20             gtable_0.3.5                
[61] palmerpenguins_0.1.1         munsell_0.5.1                tibble_3.2.1                 pillar_1.9.0                 htmltools_0.5.8.1            randomForest_4.7-1.1        
[67] R6_2.5.1                     evaluate_0.24.0              lattice_0.22-6               backports_1.5.0              renv_1.0.7                   class_7.3-22                
[73] Rcpp_1.0.13                  uuid_1.2-1                   FSelectorRcpp_0.3.11         nlme_3.1-164                 checkmate_2.3.2              mgcv_1.9-1                  
[79] ranger_0.16.0                xfun_0.46                    pkgconfig_2.0.3        
berndbischl commented 3 weeks ago

Hi @ZekeMarshall, happy to help you out. What you posted is basically a "usage error", but some speed optimizations in the recent mlr3 have made these errors a bit more likely, so we might go back to a more robust setting after this issue.

berndbischl commented 3 weeks ago

1) You didn't really ask about it, but still: Your params / ids look a bit "ugly" because of the "svm.svm" duplication.

That happens because you use a named list here.

learners_l = list(
  svm = lrn("classif.svm", id = "svm"),
  gam = lrn("classif.gam", id = "gam")
)

This list is passed to "gunion" to join the graphs (here: just 2 pipeops). If you name the list, the names are "prefixed" to the nodes of the graphs you join -- to avoid name clashes. But.... Your names are unique already. Use an unnamed list for simpler / nicer IDs.
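For illustration, a minimal sketch of the unnamed-list variant with the same learners as above; the learner IDs are then prefixed only once:

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# Unnamed list: gunion adds no extra prefix, so hyperparameter IDs
# come out as "svm.cost" rather than "svm.svm.cost"
learners_l = list(
  lrn("classif.svm", id = "svm"),
  lrn("classif.gam", id = "gam")
)

graph = po("imputemean") %>>%
  gunion(learners_l) %>>%
  po("classifavg", innum = length(learners_l))

graph_learner = as_learner(graph)
graph_learner$param_set  # inspect the now-simpler hyperparameter IDs
```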

berndbischl commented 3 weeks ago
2) Your issue itself is a combination of two simple things.

a) The e1071 svm is slightly annoying: it has multiple "classification" modes. You need to set type = "C-classification". For various reasons, this is not a default in mlr3; if you don't set it, the learner errors with an informative message telling you to set it --- so far, so simple. So the error you get at the end of the autotuner is actually "good", and easily fixable.

(For completeness' sake: you can also set "nu-classification"; that's actually why the "type" param exists in e1071::svm.)

be-marc commented 3 weeks ago
library(mlr3learners)

task = tsk("sonar")
learner = lrn("classif.svm", type = "C-classification", id = "svm")

# with search space
search_space = ps(
  cost = p_dbl(1e-4, 1e4, logscale = TRUE),
  kernel = p_fct(levels = c("polynomial", "radial", "sigmoid", "linear")),
  degree = p_int(lower = 2, upper = 5, depends = kernel == "polynomial"),
  gamma = p_dbl(1e-4, 1e4, logscale = TRUE)
)

tune(
  tuner = tnr("grid_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = search_space,
  term_evals = 2,
  store_models = TRUE,
  check_values = TRUE)

# with tune token
learner = lrn("classif.svm", type = "C-classification", 
  cost = to_tune(1e-4, 1e4, logscale = TRUE),
  kernel = to_tune(levels = c("polynomial", "radial", "sigmoid", "linear")),
  degree = to_tune(lower = 2, upper = 5),
  gamma = to_tune(1e-4, 1e4, logscale = TRUE)
)

tune(
  tuner = tnr("grid_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 2,
  store_models = TRUE,
  check_values = TRUE)
berndbischl commented 3 weeks ago

b) In a certain sense, the real issue is that your tuning code DOES NOT error. This happens because we removed a "safety check" here for speed reasons. I guess we will reactivate it now.

berndbischl commented 3 weeks ago

Have a look at the code that @be-marc posted. It contains multiple hints.

First of all, "check_values = TRUE" activates the safety check. You want that, at least for a "first try" run (and apparently our new defaults partially "suck" here).

Second, Marc shows you how to set all required dependencies in the search space. If (!) you construct the search space yourself, you are responsible for "marking" subordinate params like this. Constructing search spaces yourself, fully, is the more flexible way. You can do this, I often do this myself, but there is also a "lazier" way.

Third: you can use "tune tokens" in the learner to implicitly create the search space. If (!) you do THAT, you can leave out information that is already present in the "param_set" of the learner, including dependencies.

Does this help?
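Putting the pieces together for the original graph learner, a hedged sketch (assuming an unnamed gunion list, so IDs carry a single "svm." prefix): set type = "C-classification", declare the kernel dependencies in the search space, and enable check_values:

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

task = tsk("pima")

# Unnamed list plus type = "C-classification" on the svm learner
graph_learner = as_learner(
  po("imputemean") %>>%
  gunion(list(
    lrn("classif.svm", id = "svm", type = "C-classification"),
    lrn("classif.gam", id = "gam")
  )) %>>%
  po("classifavg", innum = 2)
)

# Dependencies mark degree and gamma as subordinate to the kernel choice
search_space = ps(
  svm.cost = p_dbl(1e-4, 1e4, logscale = TRUE),
  svm.kernel = p_fct(levels = c("polynomial", "radial", "sigmoid", "linear")),
  svm.degree = p_int(2, 5, depends = svm.kernel == "polynomial"),
  svm.gamma = p_dbl(1e-4, 1e4, logscale = TRUE,
    depends = svm.kernel %in% c("polynomial", "radial", "sigmoid"))
)

at = auto_tuner(
  tuner = tnr("grid_search"),
  learner = graph_learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  search_space = search_space,
  term_evals = 2,
  store_models = TRUE,
  check_values = TRUE  # re-enable the safety check during tuning
)

at$train(task)
```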

ZekeMarshall commented 2 weeks ago

Hi @berndbischl and @be-marc , thank you very much for your explanations and advice! Yes that has answered my query exactly, I've adapted @be-marc's examples and implemented them in my workflow and everything is working perfectly. Thanks again!