microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

How exactly are LightGBM predictions obtained? #3571

Closed maksymiuks closed 3 years ago

maksymiuks commented 3 years ago

Hi

First of all, I'd like to say that I appreciate your work on this package; I consider it a great tool. I'm working on a dedicated R interface for tree ensemble models that allows SHAP values to be calculated quickly using C++ code via Rcpp, and LightGBM is one of the packages in its scope. For that, I need to know how exactly the committee of trees is aggregated. From my inspection of the code, I have a hunch that the final prediction is a sum of the predictions of all trees, with some intercept subtracted. Am I correct? If so, how can I find that intercept? I wasn't able to locate it in the model object. My goal is to obtain a plain sum of the predictions of all trees.

Best regards,
Szymon Maksymiuk

guolinke commented 3 years ago

Except for multi-class tasks, the prediction is the sum of the predictions of all trees. For some tasks, like binary classification, there can be a transformation after the sum, such as a sigmoid. For multi-class, you need to sum the predictions by class first (trees are organized as tree[i * K + j], where i is the iteration, j is the class id, and K is the number of classes), and then apply a softmax to get the class probabilities.
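
A minimal sketch of this aggregation using the Python package (the data, parameters, and use of `scipy.special` for the link functions are my own assumptions for illustration; `start_iteration` in `Booster.predict()` needs LightGBM >= 3.0):

```python
import numpy as np
import lightgbm as lgb
from scipy.special import expit, softmax

# --- Binary case: probability = sigmoid(sum of per-tree raw outputs) ---
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
bst = lgb.train({"objective": "binary", "verbose": -1},
                lgb.Dataset(X, label=y), num_boost_round=20)

raw = bst.predict(X, raw_score=True)
# Sum the raw output of each iteration separately.
per_tree_sum = sum(
    bst.predict(X, raw_score=True, start_iteration=i, num_iteration=1)
    for i in range(bst.current_iteration())
)
print(np.allclose(raw, per_tree_sum))           # the raw score is a plain sum over trees
print(np.allclose(expit(raw), bst.predict(X)))  # probability = sigmoid of the sum (default sigmoid=1)

# --- Multi-class case: K trees per iteration, organized as tree[i * K + j] ---
y3 = rng.integers(0, 3, size=500)
bst3 = lgb.train({"objective": "multiclass", "num_class": 3, "verbose": -1},
                 lgb.Dataset(X, label=y3), num_boost_round=20)
raw3 = bst3.predict(X, raw_score=True)  # shape (n_samples, 3): per-class sums of tree outputs
print(np.allclose(softmax(raw3, axis=1), bst3.predict(X)))  # probabilities = softmax over class sums
```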

btrotta commented 3 years ago

I'm not sure if I'm understanding your question correctly, but I think by "intercept" you mean something like the baseline constant prediction? E.g. for a binary classification problem where the training labels are 90% ones and 10% zeros, we would start with a constant prediction derived from that 0.9 average (for binary log loss, the corresponding log-odds) and then add trees to improve the accuracy. This is indeed how LightGBM works, and this constant value is added to the leaf values of the first tree. So if you use Booster.save_model() (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html?highlight=save%20model#lightgbm.Booster.save_model), the leaf values of the first tree already include this baseline value.

The relevant part of the C++ code is in TrainOneIter: https://github.com/microsoft/LightGBM/blob/f38f118ce2fc6451b75363689fc06e011d69cf33/src/boosting/gbdt.cpp#L350. In the first iteration (when gradients and hessians are nullptr), it calls BoostFromAverage, which calculates the constant initial prediction. It then fits a tree to the error from that constant prediction, and later calls AddBias to add the constant to the individual leaf values.
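
A small sketch of how one might verify that the baseline is folded into the first tree, assuming the Python package, binary log loss, and the default boost_from_average behaviour (the toy data is invented for illustration):

```python
import numpy as np
import lightgbm as lgb

# Toy binary data with roughly 90% positive labels (made-up example).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.9).astype(int)

bst = lgb.train({"objective": "binary", "verbose": -1},
                lgb.Dataset(X, label=y), num_boost_round=10)

# Raw score using only the first tree: its leaf values already include the
# boost-from-average constant (for binary log loss, the log-odds of the label mean).
raw_first_tree = bst.predict(X, raw_score=True, num_iteration=1)
baseline = np.log(y.mean() / (1 - y.mean()))
print(raw_first_tree.mean(), baseline)  # the two should be of similar magnitude

# The same constant is visible in the leaf values of the first tree in the dump.
first_tree = bst.dump_model()["tree_info"][0]["tree_structure"]
```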

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.