Open trendelkampschroer opened 11 months ago
@jameslamb thanks a lot for updating the issue title and triaging the issue. I don't think this is merely a usage question; I think it is a bug. Compare e.g. https://github.com/shap/shap/blob/4fa04f89e00b54ac649a86b755873c953c208e3f/shap/explainers/_tree.py#L405 in the SHAP package, where pred_contrib=True is used to compute SHAP values. For a random forest model the computed values will be wrong, in the sense that the sum of the expectation and the SHAP values will not equal the prediction.
The documentation at https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.predict also suggests that I can get the actual SHAP values for a random forest model using pred_contrib=True.
A possibly related issue is also documented here, https://github.com/shap/shap/issues/669.
Description
For a random forest model, the contributions returned with pred_contrib=True are not averaged across the individual trees.
Below you can see that the contributions (plus expectation) sum to the raw prediction (the sum of the predictions from the trees in the random forest) but not to the average of the predictions from those trees, which is what the model actually predicts.
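To make the arithmetic concrete, here is a minimal sketch in plain Python that does not call LightGBM at all. The per-tree numbers are made up for illustration, and the layout (one row per tree, feature contributions followed by the expected value in the last column) is an assumption that mirrors the shape of a pred_contrib=True result. It only shows why summing contributions over trees matches the raw prediction, while dividing by the number of trees is needed to match the averaged random forest prediction.

```python
# Hypothetical per-tree outputs for a single sample (NOT produced by LightGBM).
# Each row is one tree: [contrib_feature_0, contrib_feature_1, expected_value].
per_tree = [
    [0.5, -0.25, 1.0],
    [0.25, 0.5, 1.5],
    [-0.5, 0.25, 2.0],
    [0.75, -0.5, 1.5],
]
n_trees = len(per_tree)

# Within one tree, contributions plus expected value equal that tree's prediction.
tree_predictions = [sum(row) for row in per_tree]

# Raw prediction: sum over trees. Random forest prediction: average over trees.
raw_prediction = sum(tree_predictions)
rf_prediction = raw_prediction / n_trees

# Contributions summed over trees (the behaviour described in this issue).
summed_contrib = [sum(col) for col in zip(*per_tree)]

# Contributions averaged over trees (what would match the averaged prediction).
averaged_contrib = [c / n_trees for c in summed_contrib]

print("sum of summed contributions:  ", sum(summed_contrib))
print("raw prediction (sum of trees):", raw_prediction)
print("sum of averaged contributions:", sum(averaged_contrib))
print("rf prediction (mean of trees):", rf_prediction)
```

Under these assumptions, dividing the summed contributions by the number of trees is enough to restore the additivity property against the averaged prediction.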
Reproducible example
Environment info
LightGBM version or commit hash:
Command(s) you used to install LightGBM