stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

RE: shap values for ngboost model_output=1 #291

Open AliSamiiXOM opened 2 years ago

AliSamiiXOM commented 2 years ago

Hi, and thanks for the great work! I am having trouble understanding what the SHAP values for model_output=1 represent. Here is a sample notebook:

https://github.com/AliSamiiXOM/ngboost_question/blob/main/shap_with_ngboost.ipynb.

I expected the sum of the SHAP values over all features to equal the target variable minus its expected value. This holds for the mean output (model_output=0), but, as the last cell of the linked notebook shows, it does not hold for the scale output (model_output=1). This is probably a question rather than a bug or issue, but asking it here may still be helpful for future reference.
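For reference, the additivity property in question can be sketched on a toy linear model, where SHAP values have a closed form under a feature-independence assumption: phi_i = w_i * (x_i - E[x_i]). Everything below (the helper `linear_shap`, the weights, the random data) is made up for illustration and is not taken from the notebook. The sketch only shows that the base value plus the summed attributions reproduces the model output for each row.

```python
import numpy as np

def linear_shap(weights, X):
    """Closed-form SHAP values for f(x) = x @ weights + bias,
    assuming independent features (hypothetical helper)."""
    return (X - X.mean(axis=0)) * weights

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # toy feature matrix
weights = np.array([0.5, -1.0, 2.0])  # toy model coefficients
bias = 0.3

predictions = X @ weights + bias
base_value = predictions.mean()       # expected model output over the data
shap_values = linear_shap(weights, X)

# Additivity: base value + sum of attributions equals the model output per row
additivity_gap = np.abs(shap_values.sum(axis=1) + base_value - predictions).max()
```

Note that the quantity being reconstructed is whatever the explainer treats as the model output, which need not be on the same scale as the target.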

Penna88 commented 9 months ago

Hi AliSamiiXOM,

I ran into the same issue when I wanted to interpret the SHAP values for an NGBoost model with a Gamma distribution.

Short Answer:

Shapley values are calculated for params[0] or params[1], depending on the value you pass as model_output.

The problem is that params[0] and params[1] mean different things depending on the distribution. For example, I opened the folder "ngboost/ngboost/distns" and checked gamma.py. The relation between alpha, beta and params is visible at rows 39 and 40 (copied below for convenience):

self.alpha = np.exp(params[0])
self.beta = np.exp(params[1])
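As a small numeric sketch of what this log link implies: if the SHAP values are additive in params[0] = log(alpha), then exponentiating turns each attribution into a multiplicative factor on the baseline alpha. The numbers below are invented for illustration and do not come from any fitted model.

```python
import numpy as np

# Hypothetical base value and SHAP values for one row, in log(alpha) space
base_value = 0.7
shap_row = np.array([0.20, -0.05, 0.40])

# Additive reconstruction of the raw parameter, then the link from gamma.py
log_alpha = base_value + shap_row.sum()
alpha = np.exp(log_alpha)

# Equivalent multiplicative view: baseline alpha scaled by one factor per feature
baseline_alpha = np.exp(base_value)
feature_factors = np.exp(shap_row)
alpha_from_factors = baseline_alpha * feature_factors.prod()
```

Read this way, a SHAP value of 0.20 scales alpha by roughly exp(0.20) relative to the baseline, rather than adding 0.20 to it.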

After applying np.exp, the meaning of the Shapley values becomes clear. The code I used for checking is below.

# SHAP analysis
# Assumes ngb_model (NGBoost fitted with a Gamma distribution) and x_test already exist
import numpy as np
import shap

# Parameter to explain: 0 -> params[0] (log alpha), 1 -> params[1] (log beta)
model_out = 0

# Build the explainer and compute the SHAP values
explainer = shap.TreeExplainer(ngb_model, model_output=model_out)
explanation = explainer(x_test)
explanation.base_values = explanation.base_values.reshape(-1)

# Get predictions
predicted = ngb_model.predict(x_test)  # mean of the predictive distribution
pred_params = ngb_model.pred_dist(x_test).params
pred_alpha = pred_params['alpha']
pred_beta = pred_params['beta']

# Check the properties of the Explanation object
assert explanation.values.shape == x_test.shape
assert explanation.base_values.shape == (len(x_test),)

# SHAP values are additive in log space, so exponentiate before comparing
reconstructed = np.exp(explanation.values.sum(1) + explanation.base_values)
expected = pred_alpha if model_out == 0 else pred_beta
assert np.abs(reconstructed - expected).max() < 1e-5

Hope it helps.