Closed notiv closed 5 years ago
Hmm, it sounds like you would need to change the xgboost integration to make that happen, i.e. have a custom learner.
Hi Lars, thanks for the quick response.
I did that, but encountered two issues (you can see the details in the SO post):
a) The parameters are not being passed to the predictLearner.classif.xgboost.c function of the custom learner.
b) The checkPredictLearnerOutput does not allow the predictLearner.classif.xgboost.c function to return anything other than the standard output structure [0,1,p]. In this case I would need to return a matrix or data.frame.
Any hints on how I could work around these issues (the second is of course the more important one)? Are there any learners that provide similar functionality, so that I can take a look at how I could modify my code?
a) You need to set predcontrib when you're creating the learner.
b) What exactly does predcontrib do? It sounds like getFeatureImportanceLearner() would be a more suitable place for it.
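As a sketch of (a): assuming the custom learner has been registered as classif.xgboost.c, the parameter can be fixed at creation time via par.vals. One likely cause of issue (a) above is that a parameter only reaches predictLearner if its ParamSet entry is declared with when = "predict" (the learner and parameter names here are illustrative, not a confirmed fix):

```r
library(mlr)

# In the custom learner's ParamSet, the parameter must be declared for
# predict time, otherwise mlr will not forward it to predictLearner:
#   makeLogicalLearnerParam(id = "predcontrib", default = FALSE, when = "predict")

# Then set it when creating the learner:
lrn <- makeLearner(
  "classif.xgboost.c",        # the custom learner from the SO post
  predict.type = "prob",
  par.vals = list(predcontrib = TRUE)
)
```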
a) Ok, I get that. I thought it would suffice that we have the ellipsis (...) in the call.
b) Let me elaborate: the usual call, also in the default xgboost learner, is as follows:
p = predict(m, newdata = data.matrix(.newdata), ...)
If we want to calculate the contribution of each feature to the score of each single observation, we call the function as follows:
contrib = predict(m, newdata = data.matrix(.newdata), predcontrib = TRUE, ...)
contrib in this case is a large matrix that contains the individual contribution of each feature to each scored observation, i.e. if we score 100 observations and we have 20 features, the matrix will have dimension 100x20.
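For anyone reading along, here is a minimal self-contained sketch of that behaviour with plain xgboost (no mlr involved). One detail worth noting: xgboost appends a trailing BIAS column, so the matrix actually has n_features + 1 columns:

```r
library(xgboost)

X <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species == "setosa")

m <- xgboost(data = X, label = y, nrounds = 5,
             objective = "binary:logistic", verbose = 0)

p       <- predict(m, X)                      # usual scores, one per row
contrib <- predict(m, X, predcontrib = TRUE)  # per-feature contributions

dim(contrib)  # 150 x 5: four feature columns plus the trailing BIAS column
# Per observation, the contributions sum to the raw (logit-scale) prediction.
```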
mlr doesn't currently have an interface for b), so this would be a major change that would touch much more than just the learner.
I was afraid so, but didn't want to give up hope... :-)
Is there a way to perform all "wrapper" tasks without executing the predict part and then run a "custom" predict using predcontrib = TRUE? What do you think, would mlrCPO be mature enough to replace my wrapper-within-a-wrapper preprocessing? Or maybe any other workaround / idea?
I don't think that there are any easy workarounds here -- although you could extract the model after mlr is done and work with that directly.
Ok, thanks Lars!
Getting the "predcontrib" param down to the xgboost predict function is simple, and mlr allows this. The problem is returning the fine-grained information that xgboost then returns; that does not work.
We have written iml for model-agnostic interpretations, did you look into that? https://github.com/christophM/iml
I'll check this out, thanks for the hint Bernd! (xgboost implements Shapley values.)
Just in case someone tries to solve the same problem:
As a hint, once we have the wrapped model, we can do the following:
xgb_unwrapped <- mlr::getLearnerModel(wrapped_model, more.unwrap = TRUE)
data_after_preproc <- raw_data %>>% retrafo(wrapped_model)
predictions_w_contributions <- predict(xgb_unwrapped, data_after_preproc, predcontrib = TRUE)
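To put those three lines in context, here is a hedged end-to-end sketch of the workaround, assuming a simple mlrCPO preprocessing pipeline in front of the default xgboost learner (the choice of cpoScale and pid.task is purely illustrative):

```r
library(mlr)
library(mlrCPO)
library(xgboost)

# Attach a preprocessing CPO to a plain xgboost learner and train as usual.
lrn  <- cpoScale() %>>% makeLearner("classif.xgboost", predict.type = "prob")
task <- pid.task  # any binary classification task
wrapped_model <- train(lrn, task)

# 1. Unwrap the raw xgb.Booster from all mlr/CPO wrappers.
xgb_unwrapped <- mlr::getLearnerModel(wrapped_model, more.unwrap = TRUE)

# 2. Re-apply the fitted preprocessing to the data via the stored retrafo.
raw_data <- getTaskData(task, target.extra = TRUE)$data
data_after_preproc <- raw_data %>>% retrafo(wrapped_model)

# 3. Call xgboost's own predict with predcontrib = TRUE.
predictions_w_contributions <- predict(
  xgb_unwrapped,
  data.matrix(data_after_preproc),
  predcontrib = TRUE
)
```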
I gave it a try on Stack Overflow first, and the suggestion there was to ask here instead:
The latest version of xgboost (0.7) allows for the interpretation of predictions by setting the predcontrib parameter of the predict function to TRUE. This works well if one uses the xgboost package directly, but doesn't work when using xgboost within mlr (with wrappers, CV etc.).

Long story short: Is there a way (a work-around would suffice) to pass this parameter to the predict function of the learner and then return both the predictions and the contributions of each feature to each single score?

Or otherwise: Is there a way to unwrap the final, tuned model and "at the right moment" use the predict function of xgboost with the predcontrib parameter? This doesn't need to be in the predictLearner of a possibly modified xgboost learner, but could be done in a completely separate function.

P.S. I can imagine that this use case will become more prevalent in the future with the development of packages like lime.