imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

Trees as first class citizens #490

Open talegari opened 4 years ago

talegari commented 4 years ago

Hi Marvin,

I am trying to do these things:

  1. Combine selected trees from multiple forests. The simplest case is combining two forests. (Partially addressed here)

  2. Prune a tree (or all or some trees in a forest) with cost complexity pruning or simply to a predefined depth.

  3. Grow some existing trees to a greater depth on new data. This would be part of continued learning on newer datasets, to handle data drift as the model ages. Ref: https://github.com/mlr-org/mlr3learners/issues/7

IMHO, this requires substantial new API on existing ranger class. Does this fit your vision of what ranger is supposed to do?

Regards, Srikanth

mnwright commented 4 years ago

IMHO, this requires substantial new API on existing ranger class.

I agree. We would need something like a Tree class accessible from R. It might not only be a new API but a major re-design.

Does this fit your vision of what ranger is supposed to do?

I'm not sure. Currently, the trees are stored as three simple vectors (child nodes, split variables, split values). I like this simplicity. We might also run into problems maintaining both R and C++ versions without too much overhead. On the other hand, as long as computation speed and memory usage are not affected, I'm not generally against a re-design.
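
To make the trade-off concrete, here is a minimal Python sketch of a tree held as parallel vectors, along with depth-based truncation (the simpler variant of item 2). It is an illustration under stated assumptions, not ranger's actual code or API. The names `Tree`, `predict` and `prune_to_depth` are hypothetical, and the conventions (root is node 0, a child ID of 0 marks a terminal node, a terminal node's split-value slot holds its prediction, values less than or equal to the split go left) are assumed for this example. Because a vector layout like this keeps no prediction for internal nodes, the sketch fills a newly created leaf with the unweighted mean of the leaf values below it. That is a placeholder rule; a faithful version would need per-node sample counts or the training data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tree:
    """Hypothetical flat tree: one entry per node in each vector."""
    left: List[int]         # left child ID, 0 = terminal node (assumed convention)
    right: List[int]        # right child ID, 0 = terminal node (assumed convention)
    split_var: List[int]    # index of the feature used at the node
    split_val: List[float]  # threshold, or the prediction at a terminal node


def is_leaf(tree: Tree, node: int) -> bool:
    return tree.left[node] == 0 and tree.right[node] == 0


def predict(tree: Tree, x: Sequence[float]) -> float:
    """Walk from the root (node 0) down to a terminal node."""
    node = 0
    while not is_leaf(tree, node):
        go_left = x[tree.split_var[node]] <= tree.split_val[node]
        node = tree.left[node] if go_left else tree.right[node]
    return tree.split_val[node]


def mean_leaf_value(tree: Tree, node: int) -> float:
    """Unweighted mean of the leaf predictions below `node` (placeholder rule)."""
    total, count, stack = 0.0, 0, [node]
    while stack:
        n = stack.pop()
        if is_leaf(tree, n):
            total += tree.split_val[n]
            count += 1
        else:
            stack.extend((tree.left[n], tree.right[n]))
    return total / count


def prune_to_depth(tree: Tree, max_depth: int) -> Tree:
    """Return a copy in which every node at depth `max_depth` is terminal.

    Nodes below the cut stay in the vectors but become unreachable;
    compacting them away is left out of this sketch.
    """
    pruned = Tree(list(tree.left), list(tree.right),
                  list(tree.split_var), list(tree.split_val))
    stack = [(0, 0)]  # (node ID, depth)
    while stack:
        node, depth = stack.pop()
        if is_leaf(tree, node):
            continue
        if depth >= max_depth:
            pruned.left[node] = 0
            pruned.right[node] = 0
            pruned.split_var[node] = 0
            pruned.split_val[node] = mean_leaf_value(tree, node)
        else:
            stack.append((tree.left[node], depth + 1))
            stack.append((tree.right[node], depth + 1))
    return pruned


# Toy tree: node 0 splits on feature 0 at 0.5; node 2 splits on feature 1 at 2.0.
tree = Tree(
    left=[1, 0, 3, 0, 0],
    right=[2, 0, 4, 0, 0],
    split_var=[0, 0, 1, 0, 0],
    split_val=[0.5, 1.0, 2.0, 3.0, 5.0],
)
shallow = prune_to_depth(tree, 1)
```

In this toy example, `predict(tree, [0.9, 3.0])` follows the right branch twice and returns the leaf value 5.0, while `predict(shallow, [0.9, 3.0])` stops at node 2 and returns the placeholder mean 4.0. Truncation is easy to express on flat vectors, but every operation has to know the layout conventions, which is the kind of bookkeeping a dedicated Tree class would hide.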

brshallo commented 2 years ago

I'd be interested in item 2 as potentially useful in reviewing ways to reduce the size of ranger model objects, as discussed here: https://stackoverflow.com/questions/72961569/review-performance-of-smaller-model-subsets-of-a-large-random-forest-model?noredirect=1#comment128880412_72961569