imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

Quantile Regression #207

Closed cargomoose closed 6 years ago

cargomoose commented 7 years ago

Could you please add quantile regression capability, something already offered by extraTrees and randomUniformForest? An existing forest model can be used to do quantile regression at any quantile, with the quantiles supplied to the "predict" function along with a specific "quantile" type.

PhilippPro commented 7 years ago

This is already implemented in my quantregRanger package: https://github.com/PhilippPro/quantregRanger

It is not on CRAN yet; I need to write some tests first.

cargomoose commented 7 years ago

Thanks Philipp,

I notice from looking at the documentation that in quantregRanger you need to set the quantiles in the call that builds the model, not the predict function. You are thus building a separate model for each quantile, perhaps by using the quantile regression error function as a target.

I was hoping for an implementation that works like quantregForest, where a single regression forest is built the usual way, and then any quantiles can be derived from the bootstrapped ensemble, specified in the predict function.

This has a number of advantages, including a much faster solution when many quantiles are required.

Is this something that you might also add?

best regards,

Dr. Eugene Dubossarsky Principal Trainer Presciient http://presciient.com/ 0414573322
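
To make the single-forest idea above concrete, here is a minimal base R sketch. It assumes the usual quantile regression forest picture, in which the fitted forest assigns each training response a weight for a given query point and any quantile is then read off the weighted empirical distribution. The helper `weighted_quantile()` and the toy weights are hypothetical and only illustrate the idea; this is not the code of quantregForest, quantregRanger, or ranger.

```r
# Hypothetical helper (not part of any package): weighted empirical quantiles.
# y     : training responses
# w     : non-negative weights the forest would assign for ONE query point
# probs : any number of quantile levels in (0, 1]
weighted_quantile <- function(y, w, probs) {
  stopifnot(length(y) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(y)
  y_sorted <- y[ord]
  cdf <- cumsum(w[ord]) / sum(w)  # weighted empirical CDF at each sorted y
  # For each level, take the smallest y whose CDF value reaches it
  # (small tolerance guards against floating-point round-off).
  vapply(probs, function(p) y_sorted[which(cdf >= p - 1e-12)[1]], numeric(1))
}

# Toy data: five training responses and made-up weights for one query point.
y <- c(10, 20, 30, 40, 50)
w <- c(0.1, 0.2, 0.4, 0.2, 0.1)

# The weights are computed once; asking for more quantiles costs almost nothing.
weighted_quantile(y, w, probs = c(0.1, 0.5, 0.9))
weighted_quantile(y, w, probs = seq(0.05, 0.95, by = 0.05))
```

The point of the sketch is that the expensive part (growing the forest and deriving the weights) does not depend on the quantile levels, which is why specifying them at prediction time is cheap.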

PhilippPro commented 7 years ago

Dear Eugene,

It was not well documented, so I pushed a new version to https://github.com/PhilippPro/quantregRanger

Only one model is built in the model building process. The different quantiles can be specified when predicting. The quantiles in the model building process were used for some kind of variable importance, but this is currently not working, so nothing has to be set here.

So the current workflow is similar to quantregForest:

library(quantregRanger)
y = rnorm(150)
x = cbind(y + rnorm(150), rnorm(150))
data = data.frame(x,y)
mod = quantregRanger(y ~ ., data = data, params.ranger = list(mtry = 2))
predict(mod, data = data[1:5, ], quantiles = c(0.1, 0.5, 0.9))

Best regards, Philipp

cargomoose commented 7 years ago

Hi Philipp,

This is terrific news! I will certainly be trying it out. Is this something that the regular ranger package might incorporate at some point?

best regards,

Dr. Eugene Dubossarsky Principal Trainer Presciient http://presciient.com/ 0414573322

PhilippPro commented 7 years ago

This is a question that only @mnwright can answer. ;)

cargomoose commented 7 years ago

Then I hope you've already asked him! You have my support :)

mnwright commented 7 years ago

Yes, if @PhilippPro is fine with that, I'd like to have it! I'll have a look at the package to see what has to be changed.

PhilippPro commented 7 years ago

Yeah, no problem. If you want, you can list me as a contributor, or advertise the tuneRF package if you like it.

There are some unused parts that I took over from quantregForest (a big part of the code comes from there), for example the variable importance calculation.

By the way, if you have some time and are interested, you can still join our paper about the hyperparameters (first part) and the tuning (second part). I can send you a link to the draft if you want.

I am also currently thinking about functions that create performance plots as a function of the hyperparameters, similar to the OOBCurve package but for other parameters, using the out-of-bag observations. For example, people could see whether the mtry curve is increasing or decreasing. Nothing big, but maybe a nice feature for users to find their optimal parameters. Maybe I will put it in a package called "RFtools" or something similar.

mnwright commented 7 years ago

> Yeah, no problem. If you want, you can list me as a contributor, or advertise the tuneRF package if you like it.

Okay, will do. Thanks!

> By the way, if you have some time and are interested, you can still join our paper about the hyperparameters (first part) and the tuning (second part). I can send you a link to the draft if you want.

Sure, I'm happy to join if I can help.

> I am also currently thinking about functions that create performance plots as a function of the hyperparameters, similar to the OOBCurve package but for other parameters, using the out-of-bag observations. For example, people could see whether the mtry curve is increasing or decreasing. Nothing big, but maybe a nice feature for users to find their optimal parameters. Maybe I will put it in a package called "RFtools" or something similar.

I see you have already added a function to OOBCurve. Why not keep it there?

PhilippPro commented 7 years ago

> > Yeah, no problem. If you want, you can list me as a contributor, or advertise the tuneRF package if you like it.
>
> Okay, will do. Thanks!
>
> > By the way, if you have some time and are interested, you can still join our paper about the hyperparameters (first part) and the tuning (second part). I can send you a link to the draft if you want.
>
> Sure, I'm happy to join if I can help.

I'll write you a PM.

> > I am also currently thinking about functions that create performance plots as a function of the hyperparameters, similar to the OOBCurve package but for other parameters, using the out-of-bag observations. For example, people could see whether the mtry curve is increasing or decreasing. Nothing big, but maybe a nice feature for users to find their optimal parameters. Maybe I will put it in a package called "RFtools" or something similar.
>
> I see you have already added a function to OOBCurve. Why not keep it there?

Yes, that is possibly a good place. (y)

mnwright commented 6 years ago

I've included quantile regression in ranger, see imbs-hl/ranger#247. Try this example:

# devtools::install_github("imbs-hl/ranger", ref = "myquantreg")
library(ranger)
rf <- ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE)
pred <- predict(rf, mtcars[27:32, ], type = "quantiles", quantiles = c(0.2, 0.8))
pred$predictions

This version uses a new efficient implementation from quantregForest, see lorismichel/quantregForest#3 for a description.
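
As a small follow-up sketch to the example above, the quantile predictions can be turned into per-observation prediction intervals. It reuses only the calls shown in this thread (`ranger(..., quantreg = TRUE)` and `predict(..., type = "quantiles", quantiles = ...)`); the seed, the train/test split, and the `intervals` table are illustrative assumptions.

```r
library(ranger)

set.seed(1)  # forests are random; fix the seed for repeatable output
train <- mtcars[1:26, ]
test  <- mtcars[27:32, ]

rf <- ranger(mpg ~ ., data = train, quantreg = TRUE)

# One fitted forest, queried with two different sets of quantile levels.
pred_2 <- predict(rf, test, type = "quantiles", quantiles = c(0.2, 0.8))
pred_3 <- predict(rf, test, type = "quantiles", quantiles = c(0.1, 0.5, 0.9))

# Illustrative table: observed value plus the 20%-80% interval and its width.
# Columns of $predictions follow the order of the requested quantiles.
intervals <- data.frame(
  observed = test$mpg,
  lower    = pred_2$predictions[, 1],
  upper    = pred_2$predictions[, 2]
)
intervals$width <- intervals$upper - intervals$lower
intervals
```

Note that the forest is fitted once and only the `predict()` call changes between the two sets of quantiles.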

thengl commented 6 years ago

That looks very useful, Marvin. Is there also a way to specify the quantile probabilities, e.g. probs = c(0.159, 0.500, 0.841) instead of c(0.1, 0.5, 0.9)?

mnwright commented 6 years ago

Yes, I've edited my comment above.
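
For reference, a sketch of what that looks like with the probabilities asked about. The values 0.159, 0.5, and 0.841 are presumably the normal-distribution levels for the mean minus and plus one standard deviation, which is an assumption about the intent of the question; the code below derives them with `pnorm()` and passes them through the `quantiles` argument shown above.

```r
library(ranger)

# Levels corresponding to -1, 0 and +1 standard deviations of a normal
# distribution; rounded to three digits these are 0.159, 0.500, 0.841.
probs <- pnorm(c(-1, 0, 1))
round(probs, 3)

set.seed(1)  # illustrative seed
rf <- ranger(mpg ~ ., data = mtcars[1:26, ], quantreg = TRUE)

# Arbitrary levels go straight into the quantiles argument at prediction time.
pred <- predict(rf, mtcars[27:32, ], type = "quantiles", quantiles = probs)
pred$predictions
```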

kysolvik commented 6 years ago

Hi,

Thanks for adding this Marvin!

I added 'quantiles' and 'se' to the prediction type value error message in predict.R. https://github.com/imbs-hl/ranger/pull/262

cargomoose commented 4 years ago

Hi Philipp and Marvin,

I am making quite a bit more use of the quantile regression in ranger, with thanks again to you for implementing it and making it part of ranger.

However, it has become apparent that the computational overhead is huge, and I was wondering whether this could be improved. For a dataset of about 40,000 records and 50 fields, a normal ranger run takes a few seconds. With the quantile regression option selected, the run takes 4 hours. With 67,000 cases it takes at least 12 hours, probably more; I have always shut it down before it completed. These timings are for the initial ranger call with the quantile regression option selected. Without it, but with keep.inbag and write.forest selected, the call takes only a few seconds.

Is there any way to speed this up?

Also, I was wondering whether it might be possible to have out-of-bag quantile regression estimates on the training set.

best regards,

Eugene.

mnwright commented 4 years ago

Try the current version from GitHub. We have made major improvements for quantreg = TRUE, keep.inbag = TRUE since the last CRAN release, see #475. If that doesn't help, please post a reproducible example.
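
A reproducible timing comparison could look roughly like the sketch below. The simulated data, the sizes `n` and `p`, and `num.trees = 100` are assumptions chosen so that it runs quickly; they are much smaller than the 40,000 x 50 dataset described above and should be scaled up to reproduce the reported slowdown.

```r
library(ranger)

set.seed(123)
n <- 2000  # assumed number of rows, far below the 40,000 reported above
p <- 20    # assumed number of predictors
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n))  # columns V1..V20
dat$y <- dat$V1 + rnorm(n)

# Same call with and without quantreg, so only that option differs.
t_plain <- system.time(
  ranger(y ~ ., data = dat, num.trees = 100, keep.inbag = TRUE)
)
t_quant <- system.time(
  ranger(y ~ ., data = dat, num.trees = 100, quantreg = TRUE, keep.inbag = TRUE)
)

# Elapsed wall-clock seconds for both fits.
timings <- c(plain = t_plain[["elapsed"]], quantreg = t_quant[["elapsed"]])
timings

# Worth including in a report, since the behaviour depends on the version.
packageVersion("ranger")
```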

> Also, I was wondering whether it might be possible to have out-of-bag quantile regression estimates on the training set.

That's possible. Just run with quantreg = TRUE, keep.inbag = TRUE and predict() without data argument.
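
A minimal sketch of that out-of-bag workflow, following the description above. The use of `mtcars`, the seed, and the coverage check at the end are illustrative assumptions.

```r
library(ranger)

set.seed(42)  # illustrative seed
rf <- ranger(mpg ~ ., data = mtcars, quantreg = TRUE, keep.inbag = TRUE)

# No data argument: quantiles are returned for the training rows themselves,
# computed out-of-bag.
oob <- predict(rf, type = "quantiles", quantiles = c(0.1, 0.5, 0.9))
head(oob$predictions)

# Illustrative check: share of training responses that fall inside the
# out-of-bag 10%-90% interval.
inside <- mtcars$mpg >= oob$predictions[, 1] & mtcars$mpg <= oob$predictions[, 3]
mean(inside)
```

Because each row is predicted out-of-bag, the resulting interval coverage is a more honest estimate than predicting on the training data with the full forest.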