mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

regr.randomForestSRC: parameter importance can't be set to "TRUE" #2345

Closed mareichhoff closed 6 years ago

mareichhoff commented 6 years ago

Hello Lars, Bernd and all the others,

I have the problem that importance = "TRUE" is not accepted when I try to tune. By the way, what is the difference between "FALSE" and "none"? FALSE is not explained in the randomForestSRC documentation. A few weeks ago it worked with importance="TRUE" (perhaps you changed something when updating the package).

Here is my "quite minimal" example and the session information afterwards:

library(mlr)
library(mlbench)
data("BostonHousing")

task <- bh.task 
measures <- mse

rdesc.inner <- makeResampleDesc("CV", iters = 2L)
rdesc.outer <- makeResampleDesc("CV", iters = 2L)

train.ids <- sample(500L, 333L)
test.ids <- setdiff(1:500,train.ids)

train <- BostonHousing[train.ids,]
test <- BostonHousing[test.ids,]

regr.task.nested <- makeRegrTask(id="bh.nested", data=train, target="medv")

ps = makeParamSet(makeDiscreteParam(id="ntree", 500L),
                  makeDiscreteParam(id="mtry", 3L),
                  makeDiscreteParam(id="nodesize", 3L),
                  makeDiscreteParam(id="importance", "TRUE"))

tune.ctrl <- makeTuneControlGrid()

wrap.tune.rf <- makeTuneWrapper(learner = "regr.randomForestSRC", 
                                resampling = rdesc.inner, 
                                measures = measures, 
                                par.set = ps, 
                                control = tune.ctrl)

bench <- benchmark(learners=list(wrap.tune.rf), 
                   tasks=regr.task.nested, 
                   resamplings=rdesc.outer, 
                   measures=measures, 
                   keep.pred=TRUE,
                   models=TRUE)

getLearnerParamSet("regr.randomForestSRC")

sessionInfo()

Output:

Task: bh.nested, Learner: regr.randomForestSRC.tuned
Resampling: cross-validation
Measures:             mse    
[Tune] Started tuning learner regr.randomForestSRC for parameter set:
               Type len Def Constr Req Tunable Trafo
ntree      discrete   -   -    500   -    TRUE     -
mtry       discrete   -   -      3   -    TRUE     -
nodesize   discrete   -   -      3   -    TRUE     -
importance discrete   -   -   TRUE   -    TRUE     -
With control class: TuneControlGrid
Imputation value: Inf
[Tune-x] Setting hyperpars failed: Error in setHyperPars2.Learner(learner, insert(par.vals, args)) : 
  TRUE is not feasible for parameter 'importance'!

[Tune-x] 1: ntree=500; mtry=3; nodesize=3; importance=TRUE
[Tune-y] 1: mse.test.mean=   NA; time: 0.0 min
[Tune] Result: ntree=500; mtry=3; nodesize=3; importance=TRUE : mse.test.mean=   NA
Error in setHyperPars2.Learner(learner, insert(par.vals, args)) : 
  TRUE is not feasible for parameter 'importance'!
R> 
R> getLearnerParamSet("regr.randomForestSRC")
                     Type  len       Def                                   Constr Req Tunable Trafo
ntree             integer    -      1000                                 1 to Inf   -    TRUE     -
bootstrap        discrete    -   by.root                     by.root,by.node,none   -    TRUE     -
mtry              integer    -         -                                 1 to Inf   -    TRUE     -
nodesize          integer    -         5                                 1 to Inf   -    TRUE     -
nodedepth         integer    -        -1                              -Inf to Inf   -    TRUE     -
splitrule        discrete    -       mse             mse,mse.unwt,mse.hvwt,random   -    TRUE     -
nsplit            integer    -         0                                 0 to Inf   Y    TRUE     -
split.null        logical    -     FALSE                                        -   -    TRUE     -
importance       discrete    -     FALSE FALSE,TRUE,none,permute,random,anti,p...   -   FALSE     -
na.action        discrete    - na.impute                        na.omit,na.impute   -    TRUE     -
nimpute           integer    -         1                                 1 to Inf   -    TRUE     -
proximity        discrete    -     FALSE                 inbag,oob,all,TRUE,FALSE   -   FALSE     -
sampsize          integer    -         -                                 1 to Inf   Y    TRUE     -
samptype         discrete    -       swr                                 swr,swor   Y    TRUE     -
xvar.wt     numericvector <NA>         -                                 0 to Inf   -    TRUE     -
forest            logical    -      TRUE                                        -   -   FALSE     -
var.used         discrete    -     FALSE                  FALSE,all.trees,by.tree   -   FALSE     -
split.depth      discrete    -     FALSE                  FALSE,all.trees,by.tree   -   FALSE     -
seed              integer    -         -                                -Inf to 0   -   FALSE     -
do.trace          logical    -     FALSE                                        -   -   FALSE     -
membership        logical    -      TRUE                                        -   -   FALSE     -
statistics        logical    -     FALSE                                        -   -   FALSE     -
tree.err          logical    -     FALSE                                        -   -   FALSE     -
R> 
R> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.3.0     stringr_1.3.1     dplyr_0.7.5       purrr_0.2.5       readr_1.1.1       tidyr_0.8.1       tibble_1.4.2      ggplot2_2.2.1    
 [9] tidyverse_1.2.1   data.table_1.11.4 FSelector_0.31    BBmisc_1.11       mlbench_2.1-1     parallelMap_1.4   mlr_2.12.1        ParamHelpers_1.11

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17          lubridate_1.7.4       lattice_0.20-35       RWeka_0.4-38          assertthat_0.2.0      digest_0.6.15        
 [7] psych_1.8.4           R6_2.2.2              cellranger_1.1.0      plyr_1.8.4            backports_1.1.2       httr_1.3.1           
[13] pillar_1.2.3          RWekajars_3.9.2-1     rlang_0.2.1           lazyeval_0.2.1        readxl_1.1.0          rstudioapi_0.7       
[19] Matrix_1.2-14         checkmate_1.8.5       splines_3.5.0         foreign_0.8-70        munsell_0.5.0         broom_0.4.4          
[25] compiler_3.5.0        modelr_0.1.2          pkgconfig_2.0.1       randomForestSRC_2.6.1 mnormt_1.5-5          tidyselect_0.2.4     
[31] randomForest_4.6-14   XML_3.98-1.11         crayon_1.3.4          grid_3.5.0            nlme_3.1-137          jsonlite_1.5         
[37] gtable_0.2.0          magrittr_1.5          scales_0.5.0          cli_1.0.0             stringi_1.2.3         reshape2_1.4.3       
[43] bindrcpp_0.2.2        xml2_1.2.0            fastmatch_1.1-0       tools_3.5.0           entropy_1.2.1         glue_1.2.0           
[49] hms_0.4.2             parallel_3.5.0        survival_2.42-3       colorspace_1.3-2      rvest_0.3.2           rJava_0.9-10         
[55] bindr_0.1.1           haven_1.1.2  

Thank you very much for your help!

mareichhoff commented 6 years ago

That's the reason why I wanted to use the development version: I thought that you might have corrected the error there already.

mareichhoff commented 6 years ago

By the way, when I set importance = "FALSE", an error message appears saying that FALSE is not a feasible parameter setting, although it should be allowed. "none" works. Perhaps "FALSE" should be removed, in line with the rfsrc function parameters of the original package?

pat-s commented 6 years ago

importance is not tunable, so you cannot set it in the param set:

getParamSet("regr.randomForestSRC")
                     Type  len       Def                                   Constr Req Tunable Trafo
ntree             integer    -      1000                                 1 to Inf   -    TRUE     -
bootstrap        discrete    -   by.root                     by.root,by.node,none   -    TRUE     -
mtry              integer    -         -                                 1 to Inf   -    TRUE     -
nodesize          integer    -         5                                 1 to Inf   -    TRUE     -
nodedepth         integer    -        -1                              -Inf to Inf   -    TRUE     -
splitrule        discrete    -       mse             mse,mse.unwt,mse.hvwt,random   -    TRUE     -
nsplit            integer    -         0                                 0 to Inf   Y    TRUE     -
split.null        logical    -     FALSE                                        -   -    TRUE     -
importance       discrete    -     FALSE FALSE,TRUE,none,permute,random,anti,p...   -   FALSE     -
na.action        discrete    - na.impute                        na.omit,na.impute   -    TRUE     -
nimpute           integer    -         1                                 1 to Inf   -    TRUE     -
proximity        discrete    -     FALSE                 inbag,oob,all,TRUE,FALSE   -   FALSE     -
sampsize          integer    -         -                                 1 to Inf   Y    TRUE     -
samptype         discrete    -       swr                                 swr,swor   Y    TRUE     -
xvar.wt     numericvector <NA>         -                                 0 to Inf   -    TRUE     -
forest            logical    -      TRUE                                        -   -   FALSE     -
var.used         discrete    -     FALSE                  FALSE,all.trees,by.tree   -   FALSE     -
split.depth      discrete    -     FALSE                  FALSE,all.trees,by.tree   -   FALSE     -
seed              integer    -         -                                -Inf to 0   -   FALSE     -
do.trace          logical    -     FALSE                                        -   -   FALSE     -
membership        logical    -      TRUE                                        -   -   FALSE     -
statistics        logical    -     FALSE                                        -   -   FALSE     -
tree.err          logical    -     FALSE                                        -   -   FALSE     -
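
The Tunable column can also be read programmatically instead of from the printed table. A minimal sketch, assuming the ParamSet returned by getParamSet() keeps its parameters in a `pars` list whose entries carry a `tunable` field (the variable names here are illustrative):

```r
library(mlr)

# Parameter set of the learner, as printed above
ps <- getParamSet("regr.randomForestSRC")

# Tunable flag of a single parameter
ps$pars$importance$tunable

# Ids of all parameters flagged as not tunable
non.tunable <- names(Filter(function(p) !isTRUE(p$tunable), ps$pars))
non.tunable
```

This only reports the flag; it says nothing about which values of a parameter are accepted.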

Instead, you need to declare it in the makeLearner() call and pass the created object on to makeTuneWrapper(). See the example below. Usage questions like this usually fit better on Stack Overflow than on GitHub. Happy to help :)

library(mlr)
#> Loading required package: ParamHelpers
library(mlbench)
data("BostonHousing")

task <- bh.task 
measures <- mse

rdesc.inner <- makeResampleDesc("CV", iters = 2L)
rdesc.outer <- makeResampleDesc("CV", iters = 2L)

train.ids <- sample(500L, 333L)
test.ids <- setdiff(1:500,train.ids)

train <- BostonHousing[train.ids,]
test <- BostonHousing[test.ids,]

regr.task.nested <- makeRegrTask(id="bh.nested", data=train, target="medv")

ps = makeParamSet(makeDiscreteParam(id="ntree", 500L),
                  makeDiscreteParam(id="mtry", 3L),
                  makeDiscreteParam(id="nodesize", 3L))

lrn = makeLearner("regr.randomForestSRC", importance = TRUE)

tune.ctrl <- makeTuneControlGrid()

wrap.tune.rf <- makeTuneWrapper(learner = lrn, 
                                resampling = rdesc.inner, 
                                measures = measures, 
                                par.set = ps, 
                                control = tune.ctrl)

bench <- benchmark(learners=list(wrap.tune.rf), 
                   tasks=regr.task.nested, 
                   resamplings=rdesc.outer, 
                   measures=measures, 
                   keep.pred=TRUE,
                   models=TRUE)
#> Task: bh.nested, Learner: regr.randomForestSRC.tuned
#> Resampling: cross-validation
#> Measures:             mse
#> [Tune] Started tuning learner regr.randomForestSRC for parameter set:
#>              Type len Def Constr Req Tunable Trafo
#> ntree    discrete   -   -    500   -    TRUE     -
#> mtry     discrete   -   -      3   -    TRUE     -
#> nodesize discrete   -   -      3   -    TRUE     -
#> With control class: TuneControlGrid
#> Imputation value: Inf
#> [Tune-x] 1: ntree=500; mtry=3; nodesize=3
#> [Tune-y] 1: mse.test.mean=24.3093263; time: 0.0 min
#> [Tune] Result: ntree=500; mtry=3; nodesize=3 : mse.test.mean=24.3093263
#> [Resample] iter 1:    13.9461887
#> [Tune] Started tuning learner regr.randomForestSRC for parameter set:
#>              Type len Def Constr Req Tunable Trafo
#> ntree    discrete   -   -    500   -    TRUE     -
#> mtry     discrete   -   -      3   -    TRUE     -
#> nodesize discrete   -   -      3   -    TRUE     -
#> With control class: TuneControlGrid
#> Imputation value: Inf
#> [Tune-x] 1: ntree=500; mtry=3; nodesize=3
#> [Tune-y] 1: mse.test.mean=17.7706289; time: 0.0 min
#> [Tune] Result: ntree=500; mtry=3; nodesize=3 : mse.test.mean=17.7706289
#> [Resample] iter 2:    19.9521484
#> 
#> Aggregated Result: mse.test.mean=16.9491685
#> 

Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).

mareichhoff commented 6 years ago

Wait a moment, Patrick! Thank you very much for your answer, but I still think I am right here. It's clear that the parameter is not tunable. But I can still set it to a fixed value in the makeParamSet. If I use the following, it works:

ps = makeParamSet(makeDiscreteParam(id="ntree", c(500,1000)),
                  makeDiscreteParam(id="mtry", c(3L,4L)),
                  makeDiscreteParam(id="nodesize", 3:5),
                  makeDiscreteParam(id="importance", "none"),
                  makeLogicalParam(id="do.trace", TRUE),
                  makeLogicalParam(id="membership", TRUE),
                  makeLogicalParam(id="statistics", TRUE),
                  makeLogicalParam(id="tree.err", TRUE))

Then this shouldn't work either! But it works fine! So the question remains: why is setting importance to "TRUE" in makeParamSet not possible, while "none" is possible?

What is the difference between "none" and "FALSE"?

mareichhoff commented 6 years ago

O.k., I added some lines to my original example (membership etc.), but that doesn't matter.

pat-s commented 6 years ago

Then this shouldn't work as well! But it works fine! So the question stays: Why is importance set to "TRUE" in makeParamSet not possible, but "none" is possible.

I don't know for sure, but I assume it's due to the way the arguments are parsed. Maybe this is indeed a bug, but I am not sure. One of the two behaviors should definitely not happen: either all argument options should be specifiable or none.

https://github.com/mlr-org/mlr/blob/c06af216e18bceb913959a8ff4217cbc75ae546d/R/RLearner_regr_randomForestSRC.R#L18-L20
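
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if the learner's parameter definition stores the values named "TRUE" and "FALSE" as logicals while "none" is stored as a string, then the character value "TRUE" coming from makeDiscreteParam() would not match any allowed value, whereas "none" would. The sketch below reproduces that situation with ParamHelpers alone; the parameter `p` is a hypothetical stand-in, not the learner's actual definition.

```r
library(ParamHelpers)

# Hypothetical stand-in: logical TRUE/FALSE mixed with a character option
p <- makeDiscreteParam("importance", values = list(FALSE, TRUE, "none"))

# Classes of the stored allowed values
sapply(p$values, class)

# Feasibility of a logical value versus character values
isFeasible(p, TRUE)
isFeasible(p, "TRUE")
isFeasible(p, "none")
```

For this stand-in, the logical TRUE and the string "none" should pass the check while the string "TRUE" should not, which would be consistent with the error reported above.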

Practical advice: I would only pass tunable arguments to the param set and set fixed ones in the learner.
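
That split can be sketched as follows: fixed settings go into makeLearner() via par.vals, and only the tunable parameters with their search ranges go into the param set. The ranges, the ntree value, and the grid resolution are illustrative assumptions, not recommendations from this thread.

```r
library(mlr)

# Fixed, non-tunable settings live on the learner
lrn <- makeLearner("regr.randomForestSRC",
                   par.vals = list(importance = "none", ntree = 100L))

# Only tunable parameters go into the search space
ps <- makeParamSet(
  makeIntegerParam("mtry", lower = 2L, upper = 4L),
  makeIntegerParam("nodesize", lower = 3L, upper = 5L)
)

# Grid with two points per parameter (lower and upper bound)
ctrl <- makeTuneControlGrid(resolution = 2L)
rdesc <- makeResampleDesc("CV", iters = 2L)

res <- tuneParams(lrn, task = bh.task, resampling = rdesc,
                  measures = mse, par.set = ps, control = ctrl,
                  show.info = FALSE)

# Fixed settings remain on the learner; tuned values are in res$x
getHyperPars(lrn)$importance
res$x
```

Keeping fixed values on the learner means the tuning log lists only the parameters that actually vary.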

What is the difference between "none" and "FALSE"?

I don't know. Isn't this documented in the help pages of the package?

mareichhoff commented 6 years ago

No, it's not documented! I looked there. I only find "none", "TRUE" and some others. But "FALSE" is not explained. That's why it's strange...

Strange. I opened a new issue, because this one has been closed. Thank you for your help!