MathieuMarauri closed this issue 4 years ago
Is this resolved? How did you solve the problem in the end?
I can integrate the measure using the following code.
library("mlr")
library("stringi")
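# function to measure the MVU (mean number of variables used per prediction)
mvu_func = function(task, model, pred, feats, extra.args) {
  # convert the fitted model to a party object unless it already is one
  tree = model$learner.model
  if (!inherits(tree, "party")) tree = partykit::as.party(tree)
  # decision rule (path from the root) of every terminal node
  rls = partykit:::.list.rules.party(tree)
  # rule of the terminal node that each fitted observation falls into
  rule = rls[as.character(predict(tree, type = "node"))]
  # count the variable names appearing in each rule
  vu = stringi::stri_count_regex(rule, paste0("(", paste(names(getTaskData(task)), collapse = "|"), ")"))
  return(mean(vu))
}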
# generate the measure object
mvu = makeMeasure(
  id = "mvu", name = "Mean Variables Used",
  properties = c("classif", "classif.multi", "regr", "multilabel", "surv", "cluster", "req.model", "req.task"),
  minimize = TRUE, best = 1, worst = Inf,
  fun = mvu_func,
  note = "Only available for decision trees (objects that can be converted to a party object)"
)
model = train("classif.rpart", iris.task)
pred = predict(model, iris.task)
performance(pred, model = model, task = iris.task, measure = mvu)
This code does not work with the implementation of ctree in mlr. The mlr implementation is from the party package and the code partykit::as.party(model$learner.model) does not work for this package. It works for the partykit::ctree function.
Also note that this measure could in theory be computed for every type of model, but the way to compute it would be completely different (e.g. for regression it is the number of predictors used in the model).
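For the regression case mentioned above, one possible reading is that every prediction of a linear model uses all of its predictors, so the mean number of variables used equals the number of terms in the model. The following is only a sketch under that assumption, using base R and the built-in mtcars data; the object names are illustrative and not part of mlr.

```r
# Fit a linear model with two predictors on a built-in dataset.
fit = lm(mpg ~ wt + hp, data = mtcars)

# Every prediction uses all model terms, so the per-prediction
# average is simply the number of term labels (intercept excluded).
n_predictors = length(attr(terms(fit), "term.labels"))
n_predictors  # 2
```

Interaction terms or factor predictors would need their own counting convention, which this sketch does not try to settle.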
My problem is not so much with mlr but with finding a mvu_func that would work for every possible tree and ideally for every model. As it is not related to mlr, I closed the issue.
Anyway, if you know a way to compute such a performance measure (average number of predictors used to make predictions), I would love to be pointed in the right direction.
Cheers, Mathieu
Thanks -- I'm not aware of anything that does this in general. As a crude approximation, you could save the model and check the size of the saved file though.
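The file-size approximation suggested above could be sketched as follows in base R. The helper name model_size_bytes is hypothetical, and the result depends on everything stored in the object (for example any training data kept inside it), so it is at best a rough proxy for model complexity.

```r
# Hypothetical helper: size in bytes of a model serialized with saveRDS().
model_size_bytes = function(object) {
  path = tempfile(fileext = ".rds")
  on.exit(unlink(path))  # remove the temporary file afterwards
  saveRDS(object, path)
  file.size(path)
}

# Compare two models fitted on the same built-in dataset.
small = lm(mpg ~ wt, data = mtcars)
large = lm(mpg ~ ., data = mtcars)
model_size_bytes(small)
model_size_bytes(large)
```

With mlr, the same helper could be applied to model$learner.model of a trained model.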
Hello,
First, thank you for the nice work. I am enjoying working with mlr and I look forward to using mlr3.
I have a really specific use case where I want a performance measure that is the mean number of variables used for each prediction. In the context of decision trees this can be used to measure the simplicity of the tree (see this paper, section 2.4.2 on Fast and Frugal decision trees).
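To make the measure concrete, here is a small worked example on a made-up tree. The rules, node ids, and observations are all hypothetical, and the sketch counts distinct variables per rule, which is one possible convention (counting every occurrence of a variable along the path is another).

```r
# Hypothetical rules for a three-leaf tree, keyed by terminal node id.
rules = c(
  "2" = "Petal.Length <= 2.45",
  "4" = "Petal.Length > 2.45 & Petal.Width <= 1.75",
  "5" = "Petal.Length > 2.45 & Petal.Width > 1.75"
)
# Hypothetical terminal node reached by each of five observations.
nodes = c("2", "2", "4", "5", "5")
features = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Number of distinct features mentioned in one rule.
# fixed = TRUE matches names literally, so "." is not a regex wildcard.
n_vars = function(rule) {
  sum(vapply(features, function(f) grepl(f, rule, fixed = TRUE), logical(1)))
}

vars_per_rule = vapply(rules, n_vars, integer(1))
mean_vars_used = mean(vars_per_rule[nodes])
mean_vars_used  # (1 + 1 + 2 + 2 + 2) / 5 = 1.6
```

Two observations reach a leaf that tests one variable and three reach leaves that test two, which gives the average of 1.6.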
I can produce such a performance measure on trees generated by mlr by retrieving the learner.model object, but I cannot integrate the measure in mlr using the tutorial, as I need the prediction type to be "node" as shown here.
Did I miss something and predict.type = "node" is possible, or is there a way to make it possible? Is what I am trying to achieve too specific to be integrated into mlr?
Thank you, Mathieu