Open luisneumann opened 5 years ago
@luisneumann Interesting, thanks for raising an issue. Could you provide a minimal example that reproduces the error? Please also show the output of pip freeze, prepare the example in a separate environment, and share your Python version. My guess is that there is some misunderstanding about the meaning of those two plots, but maybe I am wrong.
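The environment details requested here could be collected with a short script like the one below. This is only a sketch: the helper name `report_versions` and the package list are assumptions for illustration, and it relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package list is an assumption; adjust it to the environment in question.
    print("python", platform.python_version())
    for name, version in report_versions(
        ["shap", "catboost", "matplotlib", "numpy", "pandas"]
    ).items():
        print(name, version or "not installed")
```

Pasting this output alongside pip freeze makes it easier to rebuild the same environment.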
Hi, first of all, thanks for the great package. I am unsure whether this is an issue or whether I am misunderstanding the colorbar on the summary plot.
I have a classifier built with CatBoost, with a SHAP explainer as shown below:
import catboost as cb
import shap
model = cb.CatBoostClassifier(iterations=100, depth=8, learning_rate=0.1, loss_function='Logloss')
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_test, y_test), plot=False)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(cb.Pool(X_train, y_train, cat_features=categorical_features_indices))
I create a dependence plot for a single variable, using dataframes filtered down to that variable so that it is the only one available to scale the colorbar by:
var_to_plot = 'TestVar'  # (an int64 variable)
idx = X_train.columns.get_loc(var_to_plot)
shap_plot = shap_values[:, idx]
Xtrain_plot = X_train[var_to_plot]
shap.dependence_plot(var_to_plot, shap_plot[:, None], Xtrain_plot.to_frame())
The dependence plot produced by the line below is identical to the one produced by the line above:
shap.dependence_plot(var_to_plot, shap_values, X_train, interaction_index="TestVar")
However, when I compare the dependence plot to the summary plot, the summary plot suggests that the largest SHAP values of around 0.8 occur for the highest values of TestVar, whereas the dependence plot shows that SHAP values of around 0.8 occur for the lower values of TestVar. The color scales of the two plots seem inconsistent with each other.
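One way to check which plot reflects the data, without relying on either color scale, is to compare the feature values and SHAP values numerically. The sketch below is an illustration only: the helper names `spearman` and `mean_feature_in_shap_tails` are hypothetical, and the arrays are synthetic stand-ins for `X_train[var_to_plot].to_numpy()` and `shap_values[:, idx]`.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation (ties are ranked by order of appearance)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def mean_feature_in_shap_tails(feature, shap_col, q=0.9):
    """Mean feature value among the highest and the lowest SHAP values.

    Returns (mean over samples with SHAP >= q-quantile,
             mean over samples with SHAP <= (1 - q)-quantile).
    """
    feature = np.asarray(feature, dtype=float)
    shap_col = np.asarray(shap_col, dtype=float)
    high = shap_col >= np.quantile(shap_col, q)
    low = shap_col <= np.quantile(shap_col, 1.0 - q)
    return float(feature[high].mean()), float(feature[low].mean())


# Synthetic stand-in data: SHAP values decrease as the feature increases.
feature = np.arange(100)
shap_col = 0.8 - 0.01 * feature

rho = spearman(feature, shap_col)
mean_at_high_shap, mean_at_low_shap = mean_feature_in_shap_tails(feature, shap_col)
print(rho, mean_at_high_shap, mean_at_low_shap)
```

A negative correlation, together with a lower mean feature value in the high-SHAP tail, would mean that large SHAP values go with low values of the variable, which would agree with the dependence plot rather than the summary plot colors.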
For reference, I am on SHAP 0.28.5 and matplotlib 2.1.1