google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0
498 stars, 53 forks

(General inquiry) library differentiators #90

Open LukeWood opened 7 months ago

LukeWood commented 7 months ago

Hello! Great work on yggdrasil - love the reference in the name.

I’ve been playing around with the library a bit and I was curious what its biggest feature differentiators are compared with XGBoost/LightGBM/others!

Would you say it’s the ability to leverage the TensorFlow ecosystem? Anything in particular that you’ve found you “just get for free” by leveraging TF (maybe hardware acceleration is easy?).

Just generally curious, and hoping to learn more! Everything looks very cool and I love the idea to make decision forests in TF more streamlined.

cheers!

rlcauvin commented 7 months ago

One powerful use of the library is combining decision forest models with other Keras models. This documentation describes how you can "stack" a decision forest model on top of a pre-trained neural network model or combine several models (including a decision forest model) into an "ensemble" that averages their predictions.
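To make the "ensemble" idea concrete, here is a minimal, framework-free sketch of averaging the predictions of several models. The two stand-in models (`nn_like` and `forest_like`) are hypothetical placeholders, not Keras or YDF models; in a real pipeline they would be a trained neural network and a trained decision forest.

```python
def average_ensemble(models):
    """Return a predictor that averages the outputs of the given models."""
    def predict(features):
        predictions = [model(features) for model in models]
        return sum(predictions) / len(predictions)
    return predict


# Hypothetical stand-ins that each return a probability for the positive class.
def nn_like(features):
    return 0.75 if features[0] >= 0 else 0.25


def forest_like(features):
    return 1.0 if features[1] > 0.5 else 0.0


ensemble = average_ensemble([nn_like, forest_like])
print(ensemble([1.0, 1.0]))   # both stand-ins lean positive
print(ensemble([-1.0, 0.0]))  # both stand-ins lean negative
```

"Stacking" differs in that the decision forest is trained on the outputs (or embeddings) of the other model rather than averaged with it.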

achoum commented 6 months ago

Hi Luke,

Tl;dr:

As @rlcauvin mentioned, one of the values of YDF is its tight integration with the TensorFlow (TF) ecosystem. YDF models can run in TF Serving or be imported into TensorFlow.js, which makes them easy to use if you already have a TensorFlow pipeline.

Moreover, YDF is composable and aims to work well with other ML tools. For example, YDF models can be combined with other models using TensorFlow, Keras 2, or Keras 3 with the TF backend. Work is also underway to integrate with JAX and some other surfaces.

This modularity notably allows for the creation of hybrid neural network + decision forest models that can sometimes perform better than non-hybrid ones. For instance, the excellent sample efficiency of decision forests makes them suitable for merging signals from multiple models in complex pipelines. Fine-tuning decision forests alongside neural networks is another advanced technique being explored.

Regarding its unique features, YDF includes exact distributed training, oblique splits, example distance, and support for uplift modeling.
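For readers unfamiliar with oblique splits: a classic split tests a single feature against a threshold, while an oblique split tests a weighted combination of features. The sketch below is a conceptual illustration in plain Python on a made-up toy dataset; it is not how YDF implements or searches for oblique splits.

```python
# Toy 2-D dataset (made up): the label is 1 exactly when x0 > x1.
POINTS = [(0.1, 0.9), (0.9, 0.1), (0.2, 0.3), (0.3, 0.2), (0.7, 0.8), (0.8, 0.7)]
LABELS = [0, 1, 0, 1, 0, 1]


def project(weights, point):
    """Weighted sum of the features of one point."""
    return sum(w * x for w, x in zip(weights, point))


def split_accuracy(weights, threshold):
    """Accuracy of the rule: predict 1 when weights . x >= threshold."""
    correct = sum(
        int(project(weights, p) >= threshold) == y
        for p, y in zip(POINTS, LABELS)
    )
    return correct / len(POINTS)


def best_accuracy(weight_candidates):
    """Best accuracy over the candidate directions, using each observed
    projection as a candidate threshold."""
    return max(
        split_accuracy(w, project(w, p))
        for w in weight_candidates
        for p in POINTS
    )


# Axis-aligned splits look at one feature (in either direction).
AXIS_ALIGNED = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# One oblique direction: x0 - x1.
OBLIQUE = [(1, -1)]

axis_acc = best_accuracy(AXIS_ALIGNED)
oblique_acc = best_accuracy(OBLIQUE)
print(f"best axis-aligned split accuracy: {axis_acc:.3f}")
print(f"best oblique split accuracy:      {oblique_acc:.3f}")
```

On this data a single oblique split separates the classes perfectly, whereas no single axis-aligned split does; a tree restricted to axis-aligned splits would need several levels to approximate the same diagonal boundary.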

Finally, YDF simplifies development and productionization. Model evaluation and model understanding, two critical steps in productionizing decision forests, are particularly easy with YDF. For example, calling "model.evaluate(dataset)" in Colab creates an interactive view with all the relevant metrics. Other methods like "model.analyze", "model.benchmark", "model.to_cpp", and "model.describe" further simplify the developer's life. Other features that aim to simplify users' work and reduce the likelihood of mistakes are described in the YDF paper.
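As a rough sketch of how these methods fit together with the `ydf` Python package: the helper name `train_and_inspect`, the choice of `GradientBoostedTreesLearner`, and the use of default arguments everywhere are assumptions made for illustration, and the exact return types may vary between YDF versions.

```python
def train_and_inspect(train_df, test_df, label):
    """Train a YDF model and call the inspection methods mentioned above.

    `train_df` and `test_df` are assumed to be pandas DataFrames that both
    contain the column named by `label`.
    """
    # ydf is a third-party package (pip install ydf). It is imported here so
    # that this sketch can be loaded even when ydf is not installed.
    import ydf

    # Assumption: a gradient boosted trees learner with default settings.
    model = ydf.GradientBoostedTreesLearner(label=label).train(train_df)

    return {
        "evaluation": model.evaluate(test_df),   # metrics on held-out data
        "analysis": model.analyze(test_df),      # model understanding report
        "benchmark": model.benchmark(test_df),   # inference speed measurement
        "description": model.describe(),         # summary of the model
        "cpp_source": model.to_cpp(),            # generated C++ serving code
    }
```

In a notebook such as Colab, displaying the `"evaluation"` or `"analysis"` entries is what produces the interactive views described above.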