Closed nipnipj closed 1 month ago
The way to do this would be with an "extra_trafo" in the search space. You can either define it explicitly for a given search_space, or use to_tune() with a ParamSet that has an extra_trafo that results in one dimension. The extra_trafo can create any kind of object, not necessarily a scalar, that is then assigned to the hyperparameter in question.
Say we have the "mtcars" Task with features am and carb, among others. A TuneToken that searches over two dimensions and results in something that could set the mutation hyperparameter could look as follows:
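# mlr3 and mlr3pipelines provide lrn(), po() and %>>%, which are used further down
library("mlr3")
library("mlr3pipelines")
library("paradox")
tt <- to_tune(ps(
  am.dg = p_int(1, 3),
  carb.dg = p_int(1, 3),
  .extra_trafo = function(x) {
    list(
      # the following is what `mutation` will ultimately be set to
      output = list(
        am = ~ am ^ x$am.dg,
        carb = ~ carb ^ x$carb.dg
      )
    )
  }
))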
It is important that .extra_trafo returns a named list with one element here, but the name of that element is ignored.
(I am using exponentiation instead of splines here because you specifically asked about PipeOpMutate, which can only generate single columns. To create splines, you could use PipeOpModelMatrix instead.)
We can now build the following pipeline:
glrn <- po("mutate", id = "mutate1", mutation = tt) %>>% lrn("regr.lm")
The search space for this pipeline now looks like this:
glrn$param_set$search_space()
#> <ParamSet(2)>
#>         id    class lower upper nlevels        default  value
#>     <char>   <char> <num> <num>   <num>         <list> <list>
#> 1:   am.dg ParamInt     1     3       3 <NoDefault[0]>
#> 2: carb.dg ParamInt     1     3       3 <NoDefault[0]>
#> Trafo is set.
and it creates samples like the following:
generate_design_random(glrn$param_set$search_space(), 1)$transpose()[[1]]
#> $mutate1.mutation
#> $mutate1.mutation$am
#> ~am^x$am.dg
#> <environment: 0x55d1d3407610>
#>
#> $mutate1.mutation$carb
#> ~carb^x$carb.dg
#> <environment: 0x55d1d3407610>
This is the value that mutate1.mutation would be set to during optimization: the mutation that happens is determined by the formula, and the specific values of carb.dg and am.dg are stored inside the attached "environment", which gets created (implicitly) in the extra_trafo call.
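The environment mechanism can be reproduced in base R without any mlr3 packages. The following is only a minimal sketch: make_formula is a hypothetical stand-in for the .extra_trafo function above and is not part of any package.

```r
# Hypothetical stand-in for the .extra_trafo above: the formula is created
# inside a function call, so it captures that call's environment (and with it `x`).
make_formula <- function(x) {
  ~ am ^ x$am.dg
}

f <- make_formula(list(am.dg = 2L, carb.dg = 1L))

# The printed formula still shows the symbol `x$am.dg`, not the number ...
print(f)

# ... but the value can be looked up in the formula's environment.
stored <- environment(f)$x$am.dg

# Evaluate the right-hand side with am = 3; `x` is found in the formula's
# environment. f[[2]] is the expression to the right of `~`.
result <- eval(f[[2]], list(am = 3), environment(f))
```

Here stored holds the exponent that was passed in, and result is am raised to that exponent, which is what the mutate step computes from such a formula.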
Note that we could also have set other hyperparameters in the pipeline to TuneTokens, and the search_space() would have been augmented appropriately.
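The other route mentioned at the start, defining the search space explicitly instead of going through to_tune(), might look roughly like the sketch below. It assumes the PipeOp id "mutate1" used above, so that the hyperparameter is called mutate1.mutation. Unlike in the TuneToken case, the name of the list element returned by .extra_trafo matters here, because it has to match that hyperparameter id.

```r
library("paradox")

# Explicit search space, as an alternative to the TuneToken `tt`.
# Assumption: the PipeOp id is "mutate1", so the hyperparameter is
# called "mutate1.mutation" in the GraphLearner's ParamSet.
search_space <- ps(
  am.dg = p_int(1, 3),
  carb.dg = p_int(1, 3),
  .extra_trafo = function(x, param_set) {
    list(
      mutate1.mutation = list(
        am = ~ am ^ x$am.dg,
        carb = ~ carb ^ x$carb.dg
      )
    )
  }
)

# All 3 x 3 grid points, transformed into hyperparameter settings.
configs <- generate_design_grid(search_space, resolution = 3)$transpose()
n_configs <- length(configs)
first_names <- names(configs[[1]])
```

Such an object would be handed to tune() through its search_space argument, with the mutate PipeOp constructed without a TuneToken.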
Tuning this pipeline with mlr3tuning:
library("mlr3tuning")
tr <- tune(tnr("grid_search"), tsk("mtcars"), glrn, rsmp("cv"))
tr
#> <TuningInstanceBatchSingleCrit>
#> * State: Optimized
#> * Objective: <ObjectiveTuningBatch:mutate1.regr.lm_on_mtcars>
#> * Search Space:
#>         id    class lower upper nlevels
#>     <char>   <char> <num> <num>   <num>
#> 1:   am.dg ParamInt     1     3       3
#> 2: carb.dg ParamInt     1     3       3
#> * Terminator: <TerminatorNone>
#> * Result:
#>    am.dg carb.dg regr.mse
#>    <int>   <int>    <num>
#> 1:     2       1 12.17455
#> * Archive:
#>    am.dg carb.dg regr.mse
#>    <int>   <int>    <num>
#> 1:     1       3 12.38879
#> 2:     2       3 12.38879
#> 3:     3       3 12.38879
#> 4:     2       1 12.17455
#> 5:     2       2 12.47607
#> 6:     1       2 12.47607
#> 7:     3       1 12.17455
#> 8:     3       2 12.47607
#> 9:     1       1 12.17455
As we can see, the result has am.dg set to 2 and carb.dg set to 1. We can also see the specific hyperparameter value that was set:
tr$result$x_domain
#> [[1]]
#> [[1]]$mutate1.mutation
#> [[1]]$mutate1.mutation$am
#> ~am^x$am.dg
#> <environment: 0x55d1d0ca20c0>
#>
#> [[1]]$mutate1.mutation$carb
#> ~carb^x$carb.dg
#> <environment: 0x55d1d0ca20c0>
The values of carb.dg and am.dg are hidden inside the environment of these formulae:
tr$result$x_domain[[1]]$mutate1.mutation$am
#> ~am^x$am.dg
#> <environment: 0x55d1d0ca20c0>
environment(tr$result$x_domain[[1]]$mutate1.mutation$am) |> as.list()
#> $x
#> $x$am.dg
#> [1] 2
#>
#> $x$carb.dg
#> [1] 1
We can see what these hyperparameters do to the task by assigning them to the glrn and using the PipeOp:
glrn$param_set$set_values(.values = tr$result_learner_param_vals)
dummy <- as_task_regr(data.frame(am = 2, carb = 2, target = 1), target = "target")
mutated <- glrn$pipeops$mutate1$train(list(dummy))[[1]]
mutated$data()
#>    target    am  carb
#>     <num> <num> <num>
#> 1:      1     4     2
Here, "am" was squared, while "carb" was not.
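For the spline case raised in the question, the same environment trick carries over to a formula that calls splines::ns(). The following base R sketch only illustrates the mechanism with model.matrix(). The helper make_spline_formula and the parameter name carb.df are hypothetical, and the sketch does not involve mlr3pipelines itself.

```r
# Hypothetical helper playing the role of an .extra_trafo: the returned
# formula carries `x` (and therefore the tuned df) in its environment.
make_spline_formula <- function(x) {
  ~ splines::ns(carb, df = x$carb.df)
}

f <- make_spline_formula(list(carb.df = 3L))

# model.matrix() resolves `carb` in the data and `x` in the formula's environment.
mm <- model.matrix(f, data = mtcars)

# Intercept column plus one column per spline basis function.
n_cols <- ncol(mm)
```

If PipeOpModelMatrix passes its formula hyperparameter on to model.matrix() with the environment intact (an assumption that is not verified here), a .extra_trafo returning such a formula would make df tunable in the same way as the exponents above.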
(Sorry for the late reply; you probably don't have this problem any more, but it may help others searching the archives.)
Quick question. Are parameters of functions used in the "mutate pipeop" tuneable? For example, the df parameter in splines::ns(). Is it possible to add a df parameter to ParamSet$new()?