Closed maksymiuks closed 3 years ago
Except for multi-class tasks, the prediction is the sum of the predictions of all trees. For some tasks, like binary classification, there may be a transformation after the sum, such as a sigmoid. For multi-class, you need to sum the predictions by class first (trees are organized as tree[i * K + j], where i is the iteration, j is the class id, and K is the number of classes), and then apply softmax to get the class probabilities.
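The multi-class aggregation described above can be sketched in plain Python. This is a minimal illustration of the tree[i * K + j] layout, with made-up per-tree raw scores; it is not LightGBM's actual implementation:

```python
import math

def multiclass_predict(tree_outputs, K):
    """Aggregate per-tree raw outputs into class probabilities.

    Trees are laid out as tree[i * K + j]: iteration i, class j,
    K classes. `tree_outputs` is the flat list of raw scores each
    tree produced for one sample (hypothetical values here).
    """
    # Sum raw scores per class across iterations.
    raw = [0.0] * K
    for idx, out in enumerate(tree_outputs):
        raw[idx % K] += out
    # Softmax over the per-class sums gives probabilities.
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# Example: 2 iterations x 3 classes -> 6 trees.
probs = multiclass_predict([0.2, -0.1, 0.4, 0.1, 0.0, 0.3], K=3)
print(probs)
```

Here class 2 accumulates the largest raw sum (0.4 + 0.3), so it receives the highest probability after softmax.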
I'm not sure I'm understanding your question correctly, but I think by "intercept" you mean something like the baseline constant prediction. E.g. for a binary prediction problem where the training labels are 90% ones and 10% zeros, we would start with a constant prediction of 0.9 and then add trees to improve the accuracy. This is indeed how LightGBM works, and this constant value is added to the leaf values of the first tree. So if you use Booster.save_model() (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html?highlight=save%20model#lightgbm.Booster.save_model), the leaf values of the first tree include this baseline value.
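One way to check this is to inspect the text file that save_model() writes, where each tree section lists its leaf values on a leaf_value= line. The snippet below is a minimal sketch: the model_txt fragment is a hypothetical, abbreviated excerpt for illustration, not a complete model file:

```python
# Hypothetical fragment of a saved LightGBM model file (first tree only).
model_txt = """Tree=0
num_leaves=3
leaf_value=2.1 2.6 3.4
"""

def first_tree_leaf_values(text):
    # The baseline constant is already folded into these leaf values,
    # so there is no separate intercept to look for in the file.
    for line in text.splitlines():
        if line.startswith("leaf_value="):
            return [float(v) for v in line.split("=", 1)[1].split()]
    return []

print(first_tree_leaf_values(model_txt))  # [2.1, 2.6, 3.4]
```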
The relevant part of the C++ code is in TrainOneIter (https://github.com/microsoft/LightGBM/blob/f38f118ce2fc6451b75363689fc06e011d69cf33/src/boosting/gbdt.cpp#L350). In the first iteration (when gradients and hessians are nullptr), it calls BoostFromAverage, which calculates the constant initial prediction. It then fits the tree to the error from that constant prediction, and finally calls AddBias to add the constant to the individual leaf values.
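The BoostFromAverage / AddBias flow can be sketched numerically. This is a minimal illustration assuming squared-error loss (where the optimal constant is the label mean) and made-up leaf values; the function names only mirror the C++ ones:

```python
def boost_from_average(labels):
    # For squared error, the optimal constant initial prediction
    # is simply the mean of the training labels.
    return sum(labels) / len(labels)

def add_bias(leaf_values, bias):
    # Mirrors AddBias: fold the constant into each leaf of the
    # first tree, so later prediction is a plain sum over trees.
    return [v + bias for v in leaf_values]

labels = [1.0, 2.0, 3.0, 4.0]
init = boost_from_average(labels)        # 2.5
first_tree_leaves = [-0.5, 0.5]          # hypothetical fitted leaf values
first_tree_leaves = add_bias(first_tree_leaves, init)
print(first_tree_leaves)                 # [2.0, 3.0]
```

After this step the model needs no separate intercept term: summing the leaf outputs of all trees reproduces the full prediction.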
Hi,

First of all, I'd like to say how much I appreciate your work on this package; I consider it a great tool. That's why I'm working on a dedicated R interface for tree ensemble models that calculates SHAP values quickly using C++ code via Rcpp, and LightGBM is one of the packages in my scope of interest. For that, I need to know exactly how the committee of trees is aggregated. From my inspection of the code, I have a hunch that the final prediction is the sum of the predictions of all trees, with some intercept subtracted. Am I correct? If so, how can I find that intercept? I wasn't able to locate it in the model object. My goal is to obtain a plain sum of the predictions of all trees.

Best regards,
Szymon Maksymiuk