mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

Feature Request: Prediction Bounds #1272

Closed SteveBronder closed 8 years ago

SteveBronder commented 8 years ago

caret has a parameter in trainControl() for specifying prediction bounds:

predictionBounds
a logical or numeric vector of length 2 (regression only). If logical, the predictions can be constrained to be within the limit of the training set outcomes. For example, a value of c(TRUE, FALSE) would only constrain the lower end of predictions. If numeric, specific bounds can be used. For example, if c(10, NA), values below 10 would be predicted as 10 (with no constraint in the upper side).

This can be very useful when working with financial data, where you know values cannot be negative. Would it be possible to implement this in mlr?
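For reference, this is roughly how it looks in caret (a minimal sketch; the data frame df and the lm method are just placeholders):

library(caret)
# floor predictions at 0, leave the upper end unconstrained
ctrl = trainControl(method = "cv", number = 5, predictionBounds = c(0, NA))
fit = train(price ~ ., data = df, method = "lm", trControl = ctrl)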

My first thought was to put it in makeRegrTask(), since that is where we specify that the prediction task has given bounds. The process would then be:

  1. When a new model is trained, attach prediction.bounds to the trained model so that predict() is aware of the bounds.
  2. From the trained model, makePrediction.TaskDescRegr() gets the prediction bounds and does the 'clean up' there.

Any thoughts on this? Maybe it could be passed like a preprocessing function?
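A hypothetical sketch of that flow (the prediction.bounds argument and the clipping behaviour are proposed here, not existing mlr API):

task = makeRegrTask(id = "prices", data = df, target = "price",
                    prediction.bounds = c(0, NA))  # proposed argument
mod = train(makeLearner("regr.lm"), task)  # bounds travel with the trained model
pred = predict(mod, newdata = new.df)      # responses below 0 would be set to 0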

larskotthoff commented 8 years ago

Related to #1121.

This is trivial to do yourself though; all you need is an ifelse on the predictions.
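For instance, with a lower bound of 0 (a minimal sketch, assuming a regression Prediction object pred):

pred$data$response = ifelse(pred$data$response < 0, 0, pred$data$response)
# or, equivalently, as a vectorized cap:
pred$data$response = pmax(pred$data$response, 0)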

zmjones commented 8 years ago

Also, there are plenty of learners that won't extrapolate outside the empirical support of the target anyway (a random forest, for example, averages training outcomes, so its predictions stay within the observed range).

SteveBronder commented 8 years ago

@larskotthoff I think #1121 is a much more complicated version of what I'm requesting. During prediction, if a predicted value is -1.5 but we know the target variable cannot be less than 0, we would just check for and correct the values that fall outside our bounds.

@zmjones this is true, but there are others that can! When I was using caret on real-world data, I found that setting prediction bounds explicitly gave me a much more reasonable model. Plus, if anyone ever uses this in a business setting, it is probably something the user would have to set themselves anyway.

I think adding predict.bounds to the regression task and to makePrediction() could be useful.

Am I able to have multiple forks of mlr on GitHub? If so, I can add this and make a pull request.

zmjones commented 8 years ago

Sure, I can see the utility. No, I don't think you can have multiple forks. I would just create another branch locally, pull from the master branch here, push to a new branch on your forked repo, and then issue a PR.
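Roughly like this (a sketch; the remote and branch names are placeholders):

git remote add upstream https://github.com/mlr-org/mlr.git
git checkout -b prediction-bounds   # new local branch
git pull upstream master            # sync with the main repo
git push origin prediction-bounds   # push to your fork, then open a PR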

larskotthoff commented 8 years ago

You can have a fork per account, so you could just register another account.

SteveBronder commented 8 years ago

@larskotthoff @zmjones

I'm going to close this for now; once the work in my current fork is implemented, I will add this.

berndbischl commented 8 years ago

Just adding a little note here, as the discussion seems to be over:

I do see @Stevo15025's point, and it's not a simple ifelse: you basically need to have this either as an option in the learner or as a wrapper.

berndbischl commented 8 years ago

PS: although I would guess that doing something similar to isotonic regression would still be a better approach than hard capping.
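For example, base R's isoreg() can fit a monotone correction of the raw predictions rather than capping them (a rough sketch of the idea, not an mlr API; pred.train, y.train, and pred.new are placeholders):

iso = isoreg(pred.train, y.train)        # monotone fit of targets on predictions
calibrate = as.stepfun(iso)              # turn the fit into a reusable mapping
pred.new.adjusted = calibrate(pred.new)  # apply the correction to new predictions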

SteveBronder commented 8 years ago

@berndbischl We could also do it in a pre-processing function, if we allowed pre-processing schemes to do things to the predictions :-P

Also, as a random aside, allowing pre-processing schemes to act on predictions would let me add the Lambert W transforms.

Just as an example, it could look something like this:

makePreprocWrapperPostPred = function(learner, par1 = foo, par2 = bar, par3 = doo) {
  trainfun = function(data, target, args = list(par1, par2)) {
    # preprocess the training data here
    list(data = data, control = args)
  }
  predictfun = function(data, target, args, control) {
    # preprocess the prediction data here
    data
  }
  post.pred.fun = function(prediction, args, par3, control) {
    # do stuff post prediction here, e.g. cap responses at the bounds
    prediction
  }
  makePreprocWrapper(
    learner,
    train = trainfun,
    predict = predictfun,
    post.predict = post.pred.fun,  # the proposed new argument
    par.set = makeParamSet(
      # parameter definitions here
    ),
    par.vals = list(par1 = par1, par2 = par2, par3 = par3)
  )
}

berndbischl commented 8 years ago

Shouldn't we separate this, and simply let the user add a "post processor" for the predictions?
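Something like this, perhaps (a sketch of the idea, not existing mlr API; the post-processor is just a function the user applies to the Prediction):

floor.preds = function(pred, lower = 0) {
  # cap regression responses at the lower bound
  pred$data$response = pmax(pred$data$response, lower)
  pred
}
pred = floor.preds(predict(mod, task = regr.task))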