google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0

Running quick Scorer Extended Model #57

Closed: kanchanchy closed this issue 1 year ago

kanchanchy commented 1 year ago

Could you please let me know how to run the quick scorer extended model? There is a test file quick_scorer_extended_test.cc, but it creates a toy model on a toy dataset. I want something similar to the examples available in examples/beginner_cc, but that example does not show how to run the quick scorer algorithm.

I need to train a classification model with GradientBoostedTrees on a CSV dataset and convert the trained model to a GradientBoostedTreesBinaryClassificationQuickScorerExtended model for fast inference. How should I update examples/beginner_cc to do this? Can anyone guide me?

rstz commented 1 year ago

Hi, it seems you have already found the solution, but for reference: the code at examples/beginner_cc essentially does what you want. You'll have to adapt the training configuration (set the learner to GRADIENT_BOOSTED_TREES, change the label to match your dataset, ...). YDF automatically chooses the fastest engine for your task when you run:

  // Compile the model into an engine for fast inference.
  const auto engine = model->BuildFastEngine().value();

If your CPU supports AVX2 instructions (i.e. it's a reasonably recent x86 CPU; arm64 machines are not yet supported), GradientBoostedTreesBinaryClassificationQuickScorerExtended will be chosen.
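To find out ahead of time whether a machine meets the AVX2 condition above, a small standalone check can help. The sketch below is not part of YDF and is not how YDF itself selects an engine. It assumes a GCC or Clang compiler on an x86 target and uses the compiler builtin `__builtin_cpu_supports`. The helper name `HasAvx2` is hypothetical.

```cpp
// Standalone AVX2 check (illustrative helper, not a YDF API).
// Assumption: GCC or Clang on an x86 target. On any other compiler or
// target (for example arm64) the function simply returns false.
bool HasAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  // Initialize the compiler's CPU feature data before querying it.
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}
```

Calling `HasAvx2()` from a small `main` and printing the result on the target machine indicates whether the AVX2 condition is met. Which engine is actually used is still decided by `BuildFastEngine()`.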

For benchmarking, you can also have a look at the CLI benchmarking tool at yggdrasil_decision_forests/cli/benchmark_inference.cc . Let us know if you have further questions.

kanchanchy commented 1 year ago

Thanks @rstz. I figured this out after some debugging. You are correct.