mlr-org / ParamHelpers

Helpers for parameters in black-box optimization, tuning and machine learning.
https://paramhelpers.mlr-org.com

generateDesign does not visit all factor levels #95

Open danielhorn opened 8 years ago

danielhorn commented 8 years ago
library(ParamHelpers)

ps = makeParamSet(
  makeDiscreteParam("selected.learner", values = c("a", "b", "c")),

  makeIntegerParam("classif.randomForest.mtry", lower = 1L, upper = 1L,
                   requires = quote(selected.learner == "a")),
  # Same holds for nodesize: lower is always 1L, upper is 50% of n (number of observations)
  makeIntegerParam("classif.randomForest.nodesize", lower = 1L, upper = 1L,
                   requires = quote(selected.learner == "a")),

  makeNumericParam("classif.svm.cost", lower = -15L, upper = 15L, trafo = function(x) 2^x,
                   requires = quote(selected.learner == "b")),
  makeDiscreteParam("classif.svm.kernel", values = c("linear", "radial"),
                    requires = quote(selected.learner == "b")),
  makeNumericParam("classif.svm.gamma", lower = -15L, upper = 15L, trafo = function(x) 2^x,
                   requires = quote(selected.learner == "b" & classif.svm.kernel == "radial")),

  makeIntegerParam("classif.knn.k", lower = 1L, upper = 21L,
                   requires = quote(selected.learner == "c"))
)

set.seed(179)
des = generateDesign(20L, ps)
table(des$classif.svm.kernel)

This is bad: the factor level "radial" is never visited, while the factor level "linear" is visited 9 times. This should not happen, should it?

berndbischl commented 8 years ago
linear radial 
     9      0 
> des
    selected.learner classif.randomForest.mtry classif.randomForest.nodesize classif.svm.cost classif.svm.kernel classif.svm.gamma classif.knn.k
1                  a                         1                             1               NA               <NA>                NA            NA
4                  c                        NA                            NA               NA               <NA>                NA            17
5                  b                        NA                            NA            4.035             linear                NA            NA
6                  b                        NA                            NA           -5.020             linear                NA            NA
7                  b                        NA                            NA            5.634             linear                NA            NA
9                  c                        NA                            NA               NA               <NA>                NA             3
11                 c                        NA                            NA               NA               <NA>                NA             1
12                 b                        NA                            NA            7.371             linear                NA            NA
13                 c                        NA                            NA               NA               <NA>                NA            21
15                 b                        NA                            NA           -9.070             linear                NA            NA
16                 c                        NA                            NA               NA               <NA>                NA            16
17                 b                        NA                            NA            9.809             linear                NA            NA
19                 c                        NA                            NA               NA               <NA>                NA             6
20                 c                        NA                            NA               NA               <NA>                NA            13
171                c                        NA                            NA               NA               <NA>                NA             7
18                 b                        NA                            NA           -7.494             linear                NA            NA
191                b                        NA                            NA          -13.941             linear                NA            NA
181                c                        NA                            NA               NA               <NA>                NA             2
201                b                        NA                            NA          -14.846             linear                NA            NA
202                c                        NA                            NA               NA               <NA>                NA             5
> 
berndbischl commented 8 years ago

Evaluation during hangout call:

1) That we only have 1 row for selected.learner = "a" is OK and good. As the lower/upper limits only allow one value for each of its params (weirdly), all "a" rows are identical, so the duplicated "a" rows are removed and the design is then filled up with other design points.

2) That we only have kernel = linear is "bad luck", and happens because the algorithm to generate the design is suboptimal. We generate the LHS, then set invalid subordinate param values to NA. Apparently that happens for kernel = radial a lot more often here...
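
The mechanism in point 2 can be sketched in base R. This is only an illustration under simplifying assumptions, not the actual ParamHelpers code: `simulate_design` is a hypothetical helper, the LHS is approximated by giving each discrete column a balanced, independently shuffled set of levels, and the removal of duplicated rows and the refill step from point 1 are left out.

```r
# Hypothetical stand-in for the design step; NOT ParamHelpers internals.
simulate_design <- function(n, seed) {
  set.seed(seed)
  # LHS-style assumption: every level appears (almost) equally often in its
  # own column, but each column is shuffled independently of the others.
  learner <- sample(rep(c("a", "b", "c"), length.out = n))
  kernel  <- sample(rep(c("linear", "radial"), length.out = n))
  # Apply the requirement: the kernel is only active when the learner is "b",
  # so every other row gets NA for the subordinate parameter.
  kernel[learner != "b"] <- NA
  data.frame(selected.learner = learner, classif.svm.kernel = kernel,
             stringsAsFactors = FALSE)
}

des <- simulate_design(20L, seed = 1)
# Cross-tabulate to see which kernel levels survive for learner "b".
print(table(des$selected.learner, des$classif.svm.kernel, useNA = "ifany"))

# Estimate how often at least one kernel level disappears completely
# under these simplified assumptions.
missing_level <- vapply(seq_len(1000), function(s) {
  d <- simulate_design(20L, seed = s)
  k <- d$classif.svm.kernel[!is.na(d$classif.svm.kernel)]
  length(unique(k)) < 2L
}, logical(1))
print(mean(missing_level))
```

The point of the sketch is that a level of the subordinate param is balanced over the whole design, but nothing guarantees balance within the rows where its requirement holds. The estimated fraction only describes this toy model and says nothing definite about how likely the design reported above is.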