grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

Implementation of (fast) quantile regression #1076

Open MaxTailt opened 2 years ago

MaxTailt commented 2 years ago

Hi all,

I have seen that the R packages ranger and quantregForest currently use a fast version of the quantile computation, see: https://github.com/imbs-hl/ranger/issues/207 and https://github.com/lorismichel/quantregForest/issues/3

I wonder whether Meinshausen's quantile regression forest algorithm (and generalized random forests) uses this fast implementation in the "grf" package. I know that most of the grf contributors are also ranger contributors, but I want to be sure, as I am not familiar with C/C++ routines.

I am working on adapting forest methods to extreme quantile regression, and I would like to run a proper version of Meinshausen's algorithm in order to assess the sensitivity of the fast implementation for extreme rainfall prediction.

Thanks in advance,

Max

jtibshirani commented 2 years ago

Hello @MaxTailt, grf's quantile_forest method does not actually implement Meinshausen's quantile regression forest algorithm. A major difference is that grf makes splits that are sensitive to quantiles, whereas Meinshausen's method uses standard CART splits. The grf paper gives more details on the difference in section 5: https://arxiv.org/pdf/1610.01271.pdf.

I haven't taken a close look at the performance optimization in lorismichel/quantregForest#3 to see whether it could apply to grf. Currently we don't have any similar optimization.
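
For readers comparing the two approaches, here is a minimal Python sketch (not code from grf, ranger, or quantregForest). It contrasts the weighted quantile from Meinshausen's method with a one-draw-per-leaf shortcut, which is how the fast version in the linked issues appears to work; that reading is an assumption, not something confirmed in this thread. All function names and the toy leaf assignments are hypothetical, and every query leaf is assumed to contain at least one training point.

```python
import math
import random


def forest_weights(train_leaves, query_leaves):
    """Average over trees of 1/|leaf| for training points sharing the query's leaf."""
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, q in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == q]
        for i in members:
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_quantile(y, weights, tau):
    """Smallest response whose weighted empirical CDF reaches tau."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    cum = 0.0
    for i in order:
        cum += weights[i]
        if cum >= tau - 1e-12:  # small tolerance for float round-off
            return y[i]
    return y[order[-1]]


def sampled_quantile(y, train_leaves, query_leaves, tau, rng):
    """Assumed shortcut: one random response per tree, then a plain quantile over trees."""
    draws = []
    for leaves, q in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == q]
        draws.append(y[rng.choice(members)])
    draws.sort()
    k = max(0, min(len(draws) - 1, math.ceil(tau * len(draws)) - 1))
    return draws[k]


# Hypothetical toy forest: 2 trees, 6 training points.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_leaves = [
    [0, 0, 0, 1, 1, 1],  # leaf id of each training point in tree 0
    [0, 0, 1, 1, 2, 2],  # leaf id of each training point in tree 1
]
query_leaves = [0, 1]  # leaf reached by the query point in each tree

w = forest_weights(train_leaves, query_leaves)
exact_median = weighted_quantile(y, w, 0.5)
exact_q90 = weighted_quantile(y, w, 0.9)
approx_median = sampled_quantile(y, train_leaves, query_leaves, 0.5, random.Random(0))
print(exact_median, exact_q90, approx_median)
```

The weighted version uses every training response in the matching leaves, while the shortcut only sees one draw per tree. For quantiles far in the tail, the shortcut's estimate is limited by the number of trees, which may matter for the extreme-quantile use case raised above.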

MaxTailt commented 2 years ago

Hello @jtibshirani, thanks for your response. I understand the difference between grf splits and standard CART splits. But I read in the docs that grf's quantile_forest method corresponds to Meinshausen's algorithm when the option regression.splitting=TRUE is set.

So what happens to the performance optimization with regression.splitting=TRUE?

Thanks,

Max