shap / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap.readthedocs.io
MIT License

Colorbar feature scale mismatch between dependence plot and summary plot #528

Open luisneumann opened 5 years ago

luisneumann commented 5 years ago

Hi, first of all, thanks for the great package. I am unsure whether this is an issue or whether I am misunderstanding the colorbar on the summary plot.

I have a CatBoost classifier with a SHAP explainer, set up as shown below:

```python
import catboost as cb
import shap

# X_train, X_test, y_train, y_test and categorical_features_indices are defined elsewhere
model = cb.CatBoostClassifier(iterations=100, depth=8, learning_rate=0.1, loss_function='Logloss')
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_test, y_test), plot=False)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(cb.Pool(X_train, y_train, cat_features=categorical_features_indices))
```
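Before slicing columns out of `shap_values`, it may help to confirm its layout. For a binary Logloss model this is expected to be one row per sample and one column per feature, but some model/version combinations return a list with one array per class. The helper below, `as_2d_shap`, is hypothetical (not part of shap) and is only a sketch under that assumption: it takes the last (positive-class) array if it receives a list and rejects any shape that does not match the data.

```python
import numpy as np
import pandas as pd


def as_2d_shap(shap_values, X: pd.DataFrame) -> np.ndarray:
    """Return SHAP values as an (n_samples, n_features) array aligned with X.

    Assumption: a binary model yields either a 2-D array, or a list with one
    array per class, in which case the last (positive-class) array is used.
    """
    if isinstance(shap_values, list):
        shap_values = shap_values[-1]
    arr = np.asarray(shap_values)
    if arr.shape != X.shape:
        raise ValueError(f"SHAP shape {arr.shape} does not match data shape {X.shape}")
    return arr
```

Usage in this setting would be `shap_values = as_2d_shap(shap_values, X_train)`, so that a later `shap_values[:, idx]` is guaranteed to refer to the same feature as `X_train.columns[idx]`.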

I created a dependence plot for a single variable, using filtered dataframes so that there was only one variable to scale the colorbar by:

```python
var_to_plot = 'TestVar'  # an int64 variable
idx = X_train.columns.get_loc(var_to_plot)
shap_plot = shap_values[:, idx]  # SHAP values of TestVar only
Xtrain_plot = X_train[var_to_plot]
shap.dependence_plot(var_to_plot, shap_plot[:, None], Xtrain_plot.to_frame())
```
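The single-feature inputs in the call above can be built in one step while checking that the SHAP column and the data column stay row-aligned. This is a sketch with a hypothetical helper name, `single_feature_inputs`; `X[[name]]` is used as an equivalent of `X[name].to_frame()`.

```python
import numpy as np
import pandas as pd


def single_feature_inputs(shap_values: np.ndarray, X: pd.DataFrame, name: str):
    """Slice one feature's SHAP column and data column, keeping them aligned.

    Returns an (n_samples, 1) SHAP array and a one-column DataFrame.
    """
    idx = X.columns.get_loc(name)            # integer position of the feature
    shap_col = shap_values[:, idx][:, None]  # (n,) -> (n, 1)
    X_col = X[[name]]                        # one-column DataFrame, same row order
    assert shap_col.shape[0] == len(X_col)
    return shap_col, X_col
```

With it, the plot call would read `shap.dependence_plot(var_to_plot, *single_feature_inputs(shap_values, X_train, var_to_plot))`.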

The dependence plot produced by the line below is identical to the one produced by the code above:

```python
shap.dependence_plot(var_to_plot, shap_values, X_train, interaction_index="TestVar")
```

However, the two plots disagree: the summary plot suggests that the largest SHAP values (around 0.8) occur at the highest values of TestVar, while the dependence plot shows SHAP values of around 0.8 at the lower values of TestVar. The color scales of the two plots seem inconsistent.
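One way to settle which end of TestVar carries the large SHAP values, without relying on either colorbar, is to compute it directly from the arrays. The helper below, `where_are_large_shap`, is hypothetical (not part of shap); it is a sketch that reports the median feature value among the top few percent of SHAP values, the overall median, and a Spearman-style rank correlation. Ties are broken by sort order rather than averaged, which is a simplification for an integer feature with many repeated values.

```python
import numpy as np


def where_are_large_shap(feature_values, shap_col, top_frac=0.05):
    """Summarize which feature values coincide with the largest SHAP values."""
    x = np.asarray(feature_values, dtype=float)
    s = np.asarray(shap_col, dtype=float).ravel()
    k = max(1, int(round(top_frac * len(s))))
    top = np.argsort(s)[-k:]          # indices of the k largest SHAP values
    rx = np.argsort(np.argsort(x))    # ranks of feature values (ties broken by order)
    rs = np.argsort(np.argsort(s))    # ranks of SHAP values
    rho = np.corrcoef(rx, rs)[0, 1]   # positive: high feature values -> high SHAP
    return {
        "median_x_top_shap": float(np.median(x[top])),
        "median_x_all": float(np.median(x)),
        "rank_corr": float(rho),
    }
```

Calling `where_are_large_shap(X_train[var_to_plot], shap_values[:, idx])` should indicate which of the two plots reflects the data: a `median_x_top_shap` well below `median_x_all` and a negative `rank_corr` would mean the largest SHAP values sit at low TestVar values.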


For reference, I am on SHAP 0.28.5 and matplotlib 2.1.1.

detrin commented 1 year ago

@luisneumann Interesting, thanks for raising this issue. Could you provide a minimal example that reproduces the error? Please prepare the example in a separate environment, and share the output of `pip freeze` and your Python version. My guess is that there is some misunderstanding about the meaning of those two plots, but maybe I am wrong.
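Alongside the full `pip freeze` output, a short script along these lines could collect the version information requested. It is a sketch: the package list is an assumption, and `importlib.metadata` requires Python 3.8 or newer.

```python
import sys
from importlib import metadata


def environment_report(packages=("shap", "catboost", "matplotlib", "numpy", "pandas")):
    """Collect the Python version and installed versions of the given packages."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```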