mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

S3 internal method: getBaggedPredictions / generalize se estimation for bagged learners #1277

Closed jakob-r closed 5 years ago

jakob-r commented 8 years ago

For all learners that use internal bagging, such as ranger, randomForest, extraTrees, and xgboost(?), we should have at least an internal S3 function like the following (really simple for randomForest):

getBaggedPredictions.regr.randomForest.model = function(.model, .newdata) {
  m = getLearnerModel(.model)
  # predict.all = TRUE returns a list; $individual is the per-tree prediction matrix
  predict(m, newdata = .newdata, predict.all = TRUE)$individual
}

It returns a matrix with nrow = number of observations and ncol = number of bagged predictors. In predictLearner it can then be used, if .learner$predict.type == "se", to call an SE method like the jackknife or a simple standard error (more complex things like bootstrapStandardError need further thought):

getSePrediction(.learner, .model, .newdata) will be a function that simply switches on .learner$par.vals$se.method to call a generic function like getSeJackknifePrediction(.learner, .model, .newdata), which can simply rely on getBaggedPredictions(.model, .newdata) to calculate its jackknife SE estimate.
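A minimal sketch of how that dispatch could look, using base R only. Everything here is an assumption for illustration: the mock model class regr.mock.model, the helper getSeSdPrediction, and the se.method values "sd" and "jackknife" are hypothetical and not existing mlr API. The jackknife shown is a plain leave-one-predictor-out jackknife of the ensemble mean, used as a stand-in; it is not the jackknife-after-bootstrap estimator, which would additionally need in-bag information.

```r
# Generic from the proposal; each bagged learner would supply its own method.
getBaggedPredictions = function(.model, .newdata) UseMethod("getBaggedPredictions")

# Hypothetical mock model: each "bagged predictor" is a slope applied to x.
getBaggedPredictions.regr.mock.model = function(.model, .newdata) {
  # rows = observations, columns = bagged predictors
  outer(.newdata$x, .model$slopes)
}

# Simple spread: standard deviation across the bagged predictions per observation.
getSeSdPrediction = function(.learner, .model, .newdata) {
  apply(getBaggedPredictions(.model, .newdata), 1L, sd)
}

# Stand-in jackknife: drop one predictor at a time and look at how the
# ensemble mean moves (NOT jackknife-after-bootstrap).
getSeJackknifePrediction = function(.learner, .model, .newdata) {
  p = getBaggedPredictions(.model, .newdata)
  B = ncol(p)
  loo = (rowSums(p) - p) / (B - 1)  # column b = ensemble mean without predictor b
  sqrt((B - 1) / B * rowSums((loo - rowMeans(loo))^2))
}

# Switch on the learner's se.method, as described above.
getSePrediction = function(.learner, .model, .newdata) {
  method = .learner$par.vals$se.method
  switch(method,
    sd = getSeSdPrediction(.learner, .model, .newdata),
    jackknife = getSeJackknifePrediction(.learner, .model, .newdata),
    stop("Unknown se.method: ", method)
  )
}

# Usage with mock objects standing in for a learner and a fitted model.
model = structure(list(slopes = c(1, 2, 3, 4)), class = "regr.mock.model")
newdata = data.frame(x = c(0, 1, 2))
learner = list(par.vals = list(se.method = "jackknife"))
se.jack = getSePrediction(learner, model, newdata)
learner$par.vals$se.method = "sd"
se.sd = getSePrediction(learner, model, newdata)
```

With this separation, a new SE method only needs the prediction matrix, and a new learner only needs a getBaggedPredictions method.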

Side remark: Theoretically, you could solve this without the S3 getBaggedPredictions and just generate the desired matrix within the code of predictLearner.regr.randomForest, but I think this kind of separation is handy if you want to do some fancy stuff later.

(I am quite certain that there are related issues.)

pat-s commented 5 years ago

I do not see us implementing this in mlr at this stage. Good case for mlr3.