mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

Number of columns for task in trafo #309

Closed MislavSag closed 2 years ago

MislavSag commented 2 years ago

Hi,

I am using a transformation (trafo) for my ranger learner. The problem is that I need the number of features (columns) in the task to transform my param_set. For example:

  # random forest
  learner = lrn("classif.ranger", predict_type = "prob", predict_sets = c("train", "test"))
  learner_params = ParamSet$new(
    params = list(
      ParamInt$new("max.depth", lower = 2, upper = 10, tags = "ranger"),
      ParamDbl$new("mtry", lower = 0.1, upper = 0.9, default = 0.5)
    ))
  learner_params$trafo = function(x, param_set) { # This is slight modification (simplification) from mlr automl package
    if ("mtry" %in% names(x)) {
      proposed_mtry = as.integer(length(task$feature_names)^x[["mtry"]]) ### HERE THE PROBLEM ###
      x[["mtry"]] = max(1, proposed_mtry)
    }
    x
  }
  rf = make_auto_tuner(learner, learner_params)

On the line marked HERE THE PROBLEM I would like to use the number of features. Right now I use task$feature_names, but this works only if I have a single task. Is there a cleverer way to get the number of columns here?

be-marc commented 2 years ago

Hey, we have a working solution in https://github.com/mlr-org/paradox/pull/323. To use it, you need to install the expression_params branch with devtools::install_github("mlr-org/paradox@expression_params", force = TRUE). It seems we are not happy with the syntax yet, so the PR has not been merged, but you can still use it. Your example would look like this:

library(mlr3verse)
library(paradox) # @expression_params

learner = lrn("classif.ranger", predict_type = "prob",
  max.depth = to_tune(2, 10),
  mtry = to_tune(p_dbl(0.1, 0.9, 
    trafo = function(x) {
      ContextPV(function(task) {
        max(1, round(length(task$feature_names) * x))
      }, x)
    }
  ))
)

learner$param_set$context_available = "task" 

at = auto_tuner(
  method = "random_search",
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 100)

at$train(tsk("iris"))

I will try to post here again once a working solution is merged into the main branch.

MislavSag commented 2 years ago

I will wait for it to be merged into the main branch, just for stability. Thanks!

be-marc commented 2 years ago

We have another solution now. You can tune mtry.ratio for classif.ranger. See https://mlr3learners.mlr-org.com/reference/mlr_learners_classif.ranger.html.
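
For reference, a minimal sketch of what tuning mtry.ratio could look like, adapted from the example above. It assumes a recent mlr3tuning, where auto_tuner() takes tuner = tnr("random_search"); older versions, like the one used earlier in this thread, take method = "random_search" instead. The search ranges and the small term_evals are illustrative only. The learner derives mtry from the ratio and the number of features of whichever task it is trained on, so no trafo and no reference to a specific task is needed.

```r
library(mlr3verse)

# mtry.ratio is a fraction of the task's features; the learner converts it
# to an integer mtry at training time, so the search space is task-independent.
learner = lrn("classif.ranger", predict_type = "prob",
  max.depth = to_tune(2, 10),
  mtry.ratio = to_tune(0.1, 0.9)
)

# Assumption: recent mlr3tuning API (tuner = ...). On older versions,
# replace the tuner argument with method = "random_search".
at = auto_tuner(
  tuner = tnr("random_search"),
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10)

# The same auto tuner can be trained on tasks with different numbers of features.
at$train(tsk("iris"))
at$tuning_result
```

The same at object could then be cloned and trained on another task, for example tsk("sonar"), without changing the search space.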