imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

Create a list of forks, extensions etc. #724

Open mnwright opened 6 months ago

mnwright commented 6 months ago

Include things like:

Could also link to packages that heavily rely on ranger, such as:

Anything missing?

stephematician commented 2 months ago

Well, I'd never want to take away from the mighty ranger! But I did eventually release literanger which:

  1. Is about 15-20% faster in training and about twice as fast (100% faster) in prediction.
  2. In particular, addresses #304.
  3. Offers efficient, compact serialization of trained models via cereal.
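For readers unfamiliar with cereal: it produces compact binary archives directly from a type's member data, with no per-field names or text overhead. As a rough stdlib-only sketch of that idea (the `Tree` layout and field names below are invented for illustration and are not literanger's actual node representation):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

// A tree stored as flat per-node arrays (a common compact layout;
// field names invented for this sketch).
struct Tree {
    std::vector<int>    split_var;    // splitting variable per node
    std::vector<double> split_value;  // split threshold per node
};

// Write a vector's length followed by its raw bytes -- the kind of
// compact, nameless binary encoding a cereal BinaryOutputArchive emits.
template <typename T>
void write_vec(std::ostream& os, const std::vector<T>& v) {
    std::size_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    os.write(reinterpret_cast<const char*>(v.data()), n * sizeof(T));
}

template <typename T>
void read_vec(std::istream& is, std::vector<T>& v) {
    std::size_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    v.resize(n);
    is.read(reinterpret_cast<char*>(v.data()), n * sizeof(T));
}

void save(std::ostream& os, const Tree& t) {
    write_vec(os, t.split_var);
    write_vec(os, t.split_value);
}

void load(std::istream& is, Tree& t) {
    read_vec(is, t.split_var);
    read_vec(is, t.split_value);
}
```

In cereal itself this boils down to giving each type a `serialize()` template method and streaming it through a binary archive, which is why the archives stay small compared with R's default serialization of the full object graph.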

It only supports some of ranger's features. I originally intended to pull it into ranger itself, but I never got that far. I've made sure to include the correct license and attribution as best I can (let me know if I haven't).

I'm not sure how much future there is for literanger, but at least it is also available as a backend for multiple imputation with random forests in MICE (see https://github.com/amices/mice/pull/648).

As always, props to you @mnwright for this package, it was a real inspiration and I learnt a lot in refactoring.

stephematician commented 2 months ago

I suppose, along with #304, it also goes some way towards the 'C++ API' that some other issues have asked for (e.g. #644).
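To make concrete what a standalone C++ API might mean here: a header one could call without going through R. Everything below is hypothetical, a sketch of a possible interface shape, with invented names (`Data`, `Forest`, `Options`) and stubbed-out bodies rather than anything from ranger or literanger:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flat data container: column-major predictor matrix plus response.
struct Data {
    std::size_t n_rows, n_cols;
    std::vector<double> x;  // column-major, n_rows * n_cols values
    std::vector<double> y;
};

class Forest {
public:
    // Hypothetical options mirroring the kind of arguments ranger exposes in R.
    struct Options {
        std::size_t num_trees = 500;
        std::size_t mtry = 0;        // 0 = library-chosen default
        std::size_t min_node_size = 5;
    };

    explicit Forest(Options opts) : opts_(opts) {}

    // A real implementation would grow the trees here; this stub only
    // records the data dimensions so the sketch compiles and runs.
    void train(const Data& data) { trained_cols_ = data.n_cols; }

    // Stub prediction: one value per row.
    std::vector<double> predict(const Data& data) const {
        return std::vector<double>(data.n_rows, 0.0);
    }

private:
    Options opts_;
    std::size_t trained_cols_ = 0;
};
```

The design question raised in #644 and similar issues is essentially whether the core can expose something like this cleanly, without the R bindings in the way.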

mnwright commented 2 months ago

Nice! The C++ API in particular has been on my list for a long time, but I never got to it and probably won't soon. As for the speedup, is there anything specific we could merge back into ranger?

In the long run, I think it would be best to converge back to a single package. I've often thought about starting a ranger v2 from scratch 😆; maybe that would be a good starting point (but it probably won't happen soon).

Btw: I think for large, high-dimensional data there is no speedup (at least not in training; I haven't tried prediction yet).

stephematician commented 1 month ago

@mnwright - if you're using the CRAN version, then training will be about the same speed as (if not slightly slower than) ranger. I only recently fixed some performance issues with the help of a profiler. It's hard to say whether these could be translated into ranger; some of the speedup seems to come from better inlining.
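To illustrate the kind of inlining effect I mean (a generic sketch, not literanger's actual code): a best-split scan calls a tiny impurity helper millions of times, and whether the compiler can inline that helper dominates the loop's cost. Keeping the helper visible in the same translation unit or header lets it inline; routing it through a function pointer or a cross-TU call usually blocks that.

```cpp
#include <cstddef>
#include <vector>

// Small hot helper the optimizer can inline into the scan below.
// Decrease in sum of squared errors from a split into (left, right):
// sum_l^2/n_l + sum_r^2/n_r, where larger is better.
inline double sse_decrease(double sum_l, std::size_t n_l,
                           double sum_r, std::size_t n_r) {
    return sum_l * sum_l / n_l + sum_r * sum_r / n_r;
}

// Scan responses sorted by one candidate variable for the best split point;
// returns the index after which to split.
std::size_t best_split(const std::vector<double>& y) {
    double total = 0.0;
    for (double v : y) total += v;

    double sum_l = 0.0, best = -1.0;
    std::size_t best_i = 0;
    for (std::size_t i = 0; i + 1 < y.size(); ++i) {
        sum_l += y[i];
        double d = sse_decrease(sum_l, i + 1,
                                total - sum_l, y.size() - i - 1);
        if (d > best) { best = d; best_i = i; }
    }
    return best_i;
}
```

The refactors that helped were mostly of this flavour: making the hottest few-line helpers inlineable rather than changing the algorithm itself.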

I'm also currently in the throes of a resubmission to CRAN, as there are some issues with the serialization package (Rcereal) and its maintainer seems AWOL :(