TLDR: This PR fixes #250, adds tests with xgb models, and uncovers a bug/inconsistency between `shap` and `xgboost.sklearn.XGBClassifier` that is not present in `shapiq`.
Bugfix of #250.
The bug that the baseline prediction was not properly set stems from the fact that xgboost models (note: the models, not the individual boosters) carry a `model.base_score` and/or `model.intercept_` attribute that stores the empty prediction of the model (as log-odds). This base_score/intercept is now added to the values of the xgb model.
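The relationship between the log-odds intercept and the baseline prediction can be sketched as follows. This is an illustrative toy, not the actual shapiq implementation; `empty_prediction` and the leaf means are hypothetical names used only to show why the intercept must be included.

```python
import math

def empty_prediction(tree_leaf_means, base_score_logodds):
    """Baseline (no features known) of a boosted-tree classifier.

    For a log-odds model, the empty prediction is the sum of the
    trees' expected leaf values PLUS the model-level intercept
    (base_score). Omitting the intercept shifts the baseline,
    which is the bug described above.
    """
    return sum(tree_leaf_means) + base_score_logodds

def to_probability(logodds):
    """Convert a log-odds value to a probability via the sigmoid."""
    return 1.0 / (1.0 + math.exp(-logodds))

# Two toy trees with expected leaf values 0.2 and -0.1, and a
# base_score of 0.0 log-odds (i.e. a prior probability of 0.5).
raw = empty_prediction([0.2, -0.1], 0.0)
print(to_probability(raw))  # sigmoid(0.1)
```

With a non-zero `base_score`, dropping the intercept would make the baseline probability wrong even though every per-tree value is correct.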
Uncovers a bug in `shap` (not in `shapiq`)
The test `test_tree_explainer.test_xgboost_shap_error` uncovers some inconsistencies with `shap`: it shows that the shapiq implementation is correct while the shap implementation is doing something weird. For some instances (e.g. the one used in this test), the SHAP values differ from the shapiq values. However, when we round the thresholds of the xgboost trees in shapiq, the computed explanations match. This behavior is strange, since rounding the thresholds makes the model *less* faithful to the original, yet only then do the explanations match.
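One plausible mechanism for why rounding the thresholds changes the explanations at all: a borderline feature value can fall on different sides of a split depending on whether the (float32-precision) threshold is rounded. The snippet below is a hypothetical single-split tree constructed purely to illustrate this effect; the threshold value is made up.

```python
def tree_predict(x, threshold, left_value, right_value):
    """Evaluate a single-split decision stump: go left if x < threshold."""
    return left_value if x < threshold else right_value

# xgboost stores thresholds in float32, so a nominal 0.3 may be
# serialized as something like 0.30000001192092896 (illustrative value).
threshold = 0.30000001192092896
x = 0.3  # a borderline instance value

unrounded = tree_predict(x, threshold, -1.0, 1.0)
rounded = tree_predict(x, round(threshold, 4), -1.0, 1.0)
print(unrounded)  # x < 0.30000001... is True  -> left leaf, -1.0
print(rounded)    # x < 0.3 is False           -> right leaf, 1.0
```

If the two implementations effectively evaluate splits at different precisions for such borderline instances, their tree traversals (and hence their attributions) would diverge exactly the way the test observes.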