mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

Seems default hyperparameter values are not taken into account when tuning some learner hyperparameters #731

Closed: cmaudoux closed this issue 2 years ago

cmaudoux commented 2 years ago

Hi,

I am new to R and mlr3. I tried to use the J48 algorithm with mlr3, and I run into an issue when I set some hyperparameters.

My R session is below:

> install.packages("devtools", dependencies=TRUE)
> library(devtools)
> library("mlr3verse")
> packageVersion("mlr3")
[1] ‘0.12.0’
> packageVersion("mlr3verse")
[1] ‘0.2.2’
> learner = lrn("classif.J48", C=0.35)
Error in self$assert(xs) : 
  Assertion on 'xs' failed: The parameter 'C' can only be set if the following condition is met 'U = FALSE'. Instead the parameter value for 'U' is not set at all. Try setting 'U' to a value that satisfies the condition.
> learner = lrn("classif.J48", U=FALSE, R=FALSE, C=0.35, output_debug_info=TRUE)
> learner$param_set
<ParamSet>
                                id    class        lower upper nlevels        default parents value
 1:                              A ParamLgl           NA    NA       2          FALSE              
 2:                              B ParamLgl           NA    NA       2          FALSE              
 3:                              C ParamDbl 2.220446e-16     1     Inf           0.25     U,R  0.35
 4:                              J ParamLgl           NA    NA       2          FALSE              
 5:                              L ParamLgl           NA    NA       2          FALSE              
 6:                              M ParamInt 1.000000e+00   Inf     Inf              2              
 7:                              N ParamInt 2.000000e+00   Inf     Inf              3     U,R      
 8:                              O ParamLgl           NA    NA       2          FALSE              
 9:                              Q ParamInt 1.000000e+00   Inf     Inf              1              
10:                              R ParamLgl           NA    NA       2          FALSE       U FALSE
11:                              S ParamLgl           NA    NA       2          FALSE              
12:                              U ParamLgl           NA    NA       2          FALSE         FALSE
13:                     batch_size ParamInt 1.000000e+00   Inf     Inf            100              
14: doNotMakeSplitPointActualValue ParamLgl           NA    NA       2          FALSE              
15:      do_not_check_capabilities ParamLgl           NA    NA       2          FALSE              
16:                      na.action ParamUty           NA    NA     Inf <NoDefault[3]>              
17:             num_decimal_places ParamInt 1.000000e+00   Inf     Inf              2              
18:                        options ParamUty           NA    NA     Inf                             
19:              output_debug_info ParamLgl           NA    NA       2          FALSE          TRUE
20:                         subset ParamUty           NA    NA     Inf <NoDefault[3]>              
> resampling = rsmp("repeated_cv")
> S42=read.table("/home/maudoux/Documents/CNAM/These/2Year/CTU13/S42.csv", header = TRUE, sep = ",")
> S42=subset(S42, select = -c(Id))
> S42
   Proto PSH ACK RST SYN FIN Dport TotReq       TotDur TotPkts TotBytes TotSrcBytes   Bot
1    udp   0   0   0   0   0 13363      2     0.002116       4     1175         155 false
2    udp   0   0   0   0   0 13363      3  5013.771057      20     5227        4198 false
3    tcp   2   2   0   2   2   443      2     3.870743      34     2725        1378 false
4    udp   0   0   0   0   0 13363      4 11920.672850      36     3658        1360 false
5    udp   0   0   0   0   0 13363      1     0.003899       2      525         465 false
6    udp   0   0   0   0   0  5789      1     0.026085       2      134          74 false
7    udp   0   0   0   0   0 13363      3     0.002938       6     1680         228 false
8    udp   0   0   0   0   0 13363      1     0.000986       2      138          78 false
9    tcp   0   0   0   1   0 51447      1     0.604177       4      252         132 false
.......
 [ reached 'max' / getOption("max.print") -- omitted 695462 rows ]
> fields = c('ACK','Dport','FIN','PSH','RST','SYN','TotPkts','TotReq')
> S42[fields] <- lapply(S42[fields], as.numeric)
> task_S42 = as_task_classif(S42, target = "Bot", positive = "true")
> print(task_S42)
<TaskClassif:S42> (695538 x 13)
* Target: Bot
* Properties: twoclass
* Features (12):
  - dbl (11): ACK, Dport, FIN, PSH, RST, SYN, TotBytes, TotDur, TotPkts, TotReq, TotSrcBytes
  - fct (1): Proto
> task_S42$feature_names
 [1] "ACK"         "Dport"       "FIN"         "PSH"         "Proto"       "RST"         "SYN"         "TotBytes"    "TotDur"     
[10] "TotPkts"     "TotReq"      "TotSrcBytes"
> rr42=resample(task_S42, learner, resampling, store_models = TRUE)
INFO  [21:01:53.136] [mlr3]  Applying learner 'classif.J48' on task 'S42' (iter 42/100) 
Error in RWeka::Weka_control(ctrl) : All arguments must be named.

Without setting any hyperparameters on the learner, everything works:

> learner = lrn("classif.J48")
> learner$param_set
<ParamSet>
                                id    class        lower upper nlevels        default parents value
 1:                              A ParamLgl           NA    NA       2          FALSE              
 2:                              B ParamLgl           NA    NA       2          FALSE              
 3:                              C ParamDbl 2.220446e-16     1     Inf           0.25     U,R      
 4:                              J ParamLgl           NA    NA       2          FALSE              
 5:                              L ParamLgl           NA    NA       2          FALSE              
 6:                              M ParamInt 1.000000e+00   Inf     Inf              2              
 7:                              N ParamInt 2.000000e+00   Inf     Inf              3     U,R      
 8:                              O ParamLgl           NA    NA       2          FALSE              
 9:                              Q ParamInt 1.000000e+00   Inf     Inf              1              
10:                              R ParamLgl           NA    NA       2          FALSE       U      
11:                              S ParamLgl           NA    NA       2          FALSE              
12:                              U ParamLgl           NA    NA       2          FALSE              
13:                     batch_size ParamInt 1.000000e+00   Inf     Inf            100              
14: doNotMakeSplitPointActualValue ParamLgl           NA    NA       2          FALSE              
15:      do_not_check_capabilities ParamLgl           NA    NA       2          FALSE              
16:                      na.action ParamUty           NA    NA     Inf <NoDefault[3]>              
17:             num_decimal_places ParamInt 1.000000e+00   Inf     Inf              2              
18:                        options ParamUty           NA    NA     Inf                             
19:              output_debug_info ParamLgl           NA    NA       2          FALSE              
20:                         subset ParamUty           NA    NA     Inf <NoDefault[3]>              
> rr42=resample(task_S42, learner, resampling, store_models = TRUE)

As explained in the mlr3book, I tried another method, but the same issue occurs:

> learner = lrn("classif.J48")
> pv = learner$param_set$values
> pv
named list()
> pv$C=0.35
> pv
$C
[1] 0.35

> learner$param_set$values = pv
Error in self$assert(xs) : 
  Assertion on 'xs' failed: The parameter 'C' can only be set if the following condition is met 'U = FALSE'. Instead the parameter value for 'U' is not set at all. Try setting 'U' to a value that satisfies the condition.
> pv$U=FALSE
> pv$R=FALSE
> pv
$C
[1] 0.35

$U
[1] FALSE

$R
[1] FALSE

> learner$param_set$values = pv
> rr42=resample(task_S42, learner, resampling, store_models = TRUE)
INFO  [21:19:08.301] [mlr3]  Applying learner 'classif.J48' on task 'S42' (iter 97/100) 
Error in RWeka::Weka_control(ctrl) : All arguments must be named.

Thanks for your help.

Cheers
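
Editorial note on the first error in this report ("The parameter 'C' can only be set if ... 'U = FALSE'"): the message indicates that the dependency is checked against values that were explicitly set, not against defaults. The sketch below reproduces that behaviour on a toy parameter set built with the paradox package. The parameter set is hypothetical and is not the real `classif.J48` definition; only the names `U` and `C`, the default `0.25`, and the value `0.35` are taken from the session above.

```r
library(paradox)

# Toy parameter set (hypothetical, not the real classif.J48 definition):
# C may only be set when U has been explicitly set to FALSE.
param_set = ps(
  U = p_lgl(default = FALSE),
  C = p_dbl(lower = 0, upper = 1, default = 0.25, depends = U == FALSE)
)

# Setting C alone is rejected: U has a default, but no value has been set.
only_c = tryCatch({
  param_set$values = list(C = 0.35)
  "accepted"
}, error = function(e) "rejected")

# Setting U explicitly satisfies the dependency, so C is accepted.
param_set$values = list(U = FALSE, C = 0.35)
```

This matches the workaround used in the report, where `U = FALSE` and `R = FALSE` were passed together with `C = 0.35`.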

be-marc commented 2 years ago

Thanks for reporting. Fixed in the latest mlr3extralearners version.
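
Editorial note: the thread does not say what the fix was. The call shown in the error, `RWeka::Weka_control(ctrl)`, passes `ctrl` positionally, and a function that collects its arguments through `...` sees a list passed that way as a single unnamed argument. The base R sketch below illustrates that general pitfall. `named_control` is a hypothetical stand-in written for this note, not the RWeka implementation, and whether this was the actual cause in mlr3extralearners is an assumption.

```r
# Hypothetical stand-in for a `...`-only constructor that insists on named
# arguments; it is NOT the RWeka implementation.
named_control = function(...) {
  args = list(...)
  if (length(args) > 0L && (is.null(names(args)) || !all(nzchar(names(args))))) {
    stop("All arguments must be named.")
  }
  args
}

# Hyperparameter values as set in the report above.
ctrl = list(U = FALSE, R = FALSE, C = 0.35)

# Passing the list positionally makes it one unnamed argument, so the check fails.
positional = tryCatch(named_control(ctrl), error = function(e) conditionMessage(e))

# do.call() spreads the list elements out as individually named arguments.
spread = do.call(named_control, ctrl)
```

To pick up the actual fix, update mlr3extralearners to its latest version and rerun the `resample()` call.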