lorismichel / quantregForest

R package - Quantile Regression Forests, a tree-based ensemble method for estimation of conditional quantiles (Meinshausen, 2006).

Changes in v1.3 #3

Open mnwright opened 6 years ago

mnwright commented 6 years ago

Hi,

There seems to be a major re-design in version 1.3. If I understand your code correctly, you don't compute the weights (as described in the paper) anymore but just sample one random training observation per node. The quantiles are now computed on the distribution over all trees.

Is this correct? What are the implications of the changes?

Thanks!

lorismichel commented 6 years ago

Dear Marvin,

Sorry for the delayed response. Here is an answer to your question from the author of the package, Prof. Nicolai Meinshausen:

The change allows faster computation, for technical reasons, in most settings we checked. Note that the limit for a large number of trees is exactly the same, because the weights correspond to the probability that a particular sample will be retained. To match the same accuracy, we might need a few more trees in the new version, but on the other hand we only have to retain a single observation per node rather than a whole weight vector. The change helps especially in settings where memory becomes a constraining factor.
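The equivalence described above can be illustrated with a small toy sketch in Python. This is not the package's code: the "trees" here are hypothetical random partitions of the interval [0, 1], and all function names (`make_tree`, `fit_sampled`, `quantile_by_weights`, `quantile_by_sampling`) are invented for illustration. The weight-based estimate gives every training observation in the query's leaf a weight of 1/(leaf size) per tree. The sampling-based estimate keeps one randomly chosen response per leaf and takes the quantile of the retained values over all trees.

```python
import bisect
import random


def make_tree(rng, n_cuts=8):
    """Toy 'tree' (an assumption, not a real CART): random cut points on [0, 1]."""
    return sorted(rng.random() for _ in range(n_cuts))


def leaf_id(tree, x):
    """Index of the interval (leaf) that x falls into."""
    return bisect.bisect_right(tree, x)


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches the fraction q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= q * total:
            return value
    return pairs[-1][0]


def quantile_by_weights(trees, xs, ys, x0, q):
    """Weight-based estimate: each leaf member gets weight 1/|leaf| per tree."""
    weights = [0.0] * len(xs)
    for tree in trees:
        target = leaf_id(tree, x0)
        members = [i for i, x in enumerate(xs) if leaf_id(tree, x) == target]
        for i in members:
            weights[i] += 1.0 / len(members)
    return weighted_quantile(ys, weights, q)


def fit_sampled(tree, xs, ys, rng):
    """Sampling-based fit: retain a single random response per leaf."""
    by_leaf = {}
    for x, y in zip(xs, ys):
        by_leaf.setdefault(leaf_id(tree, x), []).append(y)
    return {leaf: rng.choice(values) for leaf, values in by_leaf.items()}


def quantile_by_sampling(trees, stored, x0, q):
    """Quantile of the retained responses over all trees."""
    draws = []
    for tree, retained in zip(trees, stored):
        leaf = leaf_id(tree, x0)
        if leaf in retained:  # skip trees whose leaf holds no training data
            draws.append(retained[leaf])
    return weighted_quantile(draws, [1.0] * len(draws), q)


rng = random.Random(0)
xs = [rng.random() for _ in range(500)]
ys = [x + rng.gauss(0.0, 0.1) for x in xs]  # synthetic data, for illustration
trees = [make_tree(rng) for _ in range(1000)]
stored = [fit_sampled(tree, xs, ys, rng) for tree in trees]

q_weights = quantile_by_weights(trees, xs, ys, 0.5, 0.5)
q_sampled = quantile_by_sampling(trees, stored, 0.5, 0.5)
print(q_weights, q_sampled)
```

In this sketch the sampling variant stores one number per leaf, whereas the weight variant needs the leaf membership of every training observation at prediction time. With enough trees the two median estimates should be close, because the probability that an observation is retained in a leaf equals its per-tree weight.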

mnwright commented 6 years ago

This is great news! I've already re-implemented the new approach in ranger, see imbs-hl/ranger#247. In the tests I've run so far, the results are as good as with the old version, but it's much faster.

Since the implementation in ranger is based on your code, we might add one of you as a contributor if you like.

See also some discussions in imbs-hl/ranger#207 and imbs-hl/ranger#136.