Closed zzawadz closed 5 years ago
I think the most straightforward way to handle parameters would be to traverse the entire graph, gather the parameters, and prefix each name with the id of its node. For example, the `scale` parameter in the `scaler` node becomes `scaler:scale`. The user will then be able to set all parameter values by passing a standard `list("scaler:scale" = value, ...)`, as with other learners. `trainGraphs` will then be responsible for setting the proper values on the specific nodes.
See the example below (it only creates the new parameters list):
op1 = PipeOpScaler$new("myscaler")
op2 = PipeOpPCA$new()
op1$set_next(list(op2))
lrn = mlr_learners$get("classif.rpart")
op3 = PipeOpLearner$new(learner = lrn)
op2$set_next(list(op3))
pipeline_gather_params(op1)
# ParamSet: parset
# Parameters:
# myscaler:center [logical] (Default: TRUE)
# myscaler:scale [logical] (Default: TRUE)
# classif.rpart:minsplit [integer] (Default: 20): {1, ..., Inf}
# classif.rpart:cp [numeric] (Default: 0.01): [0, 1]
# classif.rpart:maxcompete [integer] (Default: 4): {0, ..., Inf}
# classif.rpart:maxsurrogate [integer] (Default: 5): {0, ..., Inf}
# classif.rpart:maxdepth [integer] (Default: 30): {1, ..., 30}
# classif.rpart:xval [integer] (Default: 10): {0, ..., Inf}
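The prefixing idea, and the reverse step of routing values back to their nodes, can be sketched in base R. This is only an illustration under stated assumptions: the nodes are plain lists rather than real `PipeOp` objects, and `gather_params()` and `set_params()` are hypothetical helpers, not the `pipeline_gather_params()` used above.

```r
# Hypothetical stand-ins for pipeline nodes: an id plus a named list of
# parameter values (not real PipeOp objects).
nodes <- list(
  list(id = "myscaler", params = list(center = TRUE, scale = TRUE)),
  list(id = "pca", params = list()),
  list(id = "classif.rpart", params = list(minsplit = 20L, cp = 0.01))
)

# Gather all parameters into one flat list, prefixing each name with "<id>:".
gather_params <- function(nodes, sep = ":") {
  out <- list()
  for (node in nodes) {
    for (name in names(node$params)) {
      out[[paste(node$id, name, sep = sep)]] <- node$params[[name]]
    }
  }
  out
}

# Route user-supplied "<id>:<param>" values back to the owning nodes.
# The name is split at the first occurrence of the separator.
set_params <- function(nodes, values, sep = ":") {
  ids <- vapply(nodes, function(n) n$id, character(1))
  for (full_name in names(values)) {
    pos <- as.integer(regexpr(sep, full_name, fixed = TRUE))
    if (pos < 1) stop("Missing separator in parameter name: ", full_name)
    id <- substr(full_name, 1, pos - 1)
    param <- substr(full_name, pos + nchar(sep), nchar(full_name))
    idx <- match(id, ids)
    if (is.na(idx)) stop("Unknown node id: ", id)
    nodes[[idx]]$params[[param]] <- values[[full_name]]
  }
  nodes
}

flat <- gather_params(nodes)
nodes <- set_params(nodes, list("myscaler:scale" = FALSE, "classif.rpart:cp" = 0.05))
```

Splitting at the first separator keeps the parameter part intact even if a parameter name itself contained the separator; the node id is the part that has to stay free of it.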
Thanks, this already looks really promising.
Yes, I think this is what Bernd and I also came up with.
We thought about which separator (e.g. `:`) to use, and I think `:` is indeed the most sensible for now.
If we now require unique ids for every Node in the Graph, this would almost guarantee that we don't get any naming clashes.
I think we can again look at how this is done in the `mlr` wrappers / multiplexer.
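A check along these lines could enforce the id requirement. It is a rough sketch: `validate_ids()` is a hypothetical helper, and the second check (ids must not contain the separator) is an added assumption about where the remaining clashes could come from, not something settled in this thread.

```r
# Hypothetical helper: fail early if node ids would make prefixed
# parameter names ambiguous.
validate_ids <- function(ids, sep = ":") {
  dup <- unique(ids[duplicated(ids)])
  if (length(dup) > 0) {
    stop("Duplicated node ids: ", paste(dup, collapse = ", "))
  }
  # Assumption: an id containing the separator could not be split back
  # unambiguously from "<id><sep><param>".
  bad <- ids[grepl(sep, ids, fixed = TRUE)]
  if (length(bad) > 0) {
    stop("Node ids must not contain '", sep, "': ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

validate_ids(c("myscaler", "pca", "classif.rpart"))

# Duplicated ids are reported as an error; capture the message for inspection.
res <- tryCatch(
  validate_ids(c("scaler", "scaler")),
  error = function(e) conditionMessage(e)
)
```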
`pipeline_gather_params()`: If we have a Graph / Pipeline class, I think this is what should be done automatically when we initialize it.
Adding / dropping an op should then also make sure that the ParamSet is refreshed.
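One way to keep the gathered parameters in sync is to recompute them inside every method that changes the set of nodes. The sketch below uses a plain environment as a stand-in for a Graph class; `make_graph()`, `add()`, `drop()`, and `param_set` are hypothetical names, and the flat list is a placeholder for a real ParamSet.

```r
# Hypothetical environment-based graph: every structural change
# triggers a refresh of the flattened parameter list.
make_graph <- function() {
  self <- new.env()
  self$nodes <- list()
  self$param_set <- list()

  # Rebuild the flat "<id>:<param>" list from the current nodes.
  refresh <- function() {
    flat <- list()
    for (node in self$nodes) {
      for (name in names(node$params)) {
        flat[[paste(node$id, name, sep = ":")]] <- node$params[[name]]
      }
    }
    self$param_set <- flat
  }

  self$add <- function(node) {
    self$nodes[[node$id]] <- node
    refresh()
    invisible(self)
  }

  self$drop <- function(id) {
    self$nodes[[id]] <- NULL
    refresh()
    invisible(self)
  }

  self
}

g <- make_graph()
g$add(list(id = "myscaler", params = list(center = TRUE, scale = TRUE)))
g$add(list(id = "classif.rpart", params = list(cp = 0.01)))
names_before <- names(g$param_set)
g$drop("myscaler")
names_after <- names(g$param_set)
```

Recomputing on every change is simple but does the full traversal each time; computing the list lazily on access would be an alternative design.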
I think I will get around to working on this on Monday!
I think that unique ids are a must-have, and this should be checked when the `PipeLearner` is created. It will be useful for overloading the `[[` operator.
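With unique ids, `[[` can look a node up by id. A minimal S3 sketch, assuming a hypothetical `pipe_graph` class built from plain lists rather than the actual `PipeLearner`:

```r
# Hypothetical constructor: checks that ids are unique and stores the
# nodes under their ids.
new_pipe_graph <- function(nodes) {
  ids <- vapply(nodes, function(n) n$id, character(1))
  if (anyDuplicated(ids) > 0) stop("Node ids must be unique")
  names(nodes) <- ids
  structure(list(nodes = nodes), class = "pipe_graph")
}

# S3 method for `[[`: return the node with the given id (or position).
# unclass() avoids dispatching back into this method.
`[[.pipe_graph` <- function(x, i) {
  unclass(x)$nodes[[i]]
}

g <- new_pipe_graph(list(
  list(id = "myscaler", params = list(scale = TRUE)),
  list(id = "classif.rpart", params = list(cp = 0.01))
))

scaler_node <- g[["myscaler"]]
scaler_node$params$scale
```

Without the uniqueness check, a lookup by id would silently return only the first matching node.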
We have `GraphLearner` now.
I think that after it is created, the graph should be stored inside a `PipeLearner` class. As we discussed, it will be required that the last node is a Learner, so its parameters like `task_type` and `predict_types` will be copied to `PipeLearner`. Other parameters like `packages` will be gathered during the initialization of the object.

The object will be created by passing the first node of the graph, or by passing a list of PipeOps: `PipelineLearner$new(list(op1, op2, op3))`. The `train` method will call the `trainGraph` function.

I'm still thinking about how to manage the parameters for each node.
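The initialization described above could look roughly like the following base R sketch. Everything here is a stand-in: `make_pipe_learner()` and the `is_learner` flag are hypothetical, the ops are plain lists, and the field values are illustrative only.

```r
# Hypothetical initializer: the last op must be a learner; its task_type
# and predict_types are copied, and packages are gathered from all ops.
make_pipe_learner <- function(ops) {
  last <- ops[[length(ops)]]
  if (!isTRUE(last$is_learner)) stop("The last node must be a learner")
  list(
    graph = ops,
    task_type = last$task_type,
    predict_types = last$predict_types,
    packages = unique(unlist(lapply(ops, function(op) op$packages)))
  )
}

# Illustrative ops; the values are assumptions, not taken from real objects.
op1 <- list(id = "myscaler", packages = character(0))
op2 <- list(id = "pca", packages = "stats")
op3 <- list(
  id = "classif.rpart", is_learner = TRUE, packages = "rpart",
  task_type = "classif", predict_types = c("response", "prob")
)

pl <- make_pipe_learner(list(op1, op2, op3))
pl$packages
```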