Problem Description
Two features can have the same `mean_abs_shap` value as computed by `calculate_shap_importance`, but the underlying SHAP values being averaged can be very different: e.g., one feature's SHAP values can be very coherent with small variance, while the other's can have very high variance. When building an ML model, I would argue we prefer features that are consistently important. My proposal is a small adjustment to the `mean_abs_shap` calculation that accounts for the variance of the underlying SHAP values.
Desired Outcome
Update the calculation of `mean_abs_shap` in `calculate_shap_importance` to account for the standard deviation of the underlying SHAP values.
For example: `shap_abs_mean = np.mean(np.abs(shap_values), axis=0) - np.std(np.abs(shap_values), axis=0) / 2.0`
This adjustment would slightly penalise features with high variance in their SHAP values in the feature-importance ranking produced by `calculate_shap_importance`.
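To make the motivation concrete, here is a small synthetic sketch (the data and variable names are mine, not from the library): two features whose mean |SHAP| is essentially the same, but whose variance differs sharply, so only the adjusted score separates them.

```python
import numpy as np

rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=1000)

# Feature 0: consistently important, |SHAP| tightly clustered around 1.0.
f0 = signs * rng.normal(1.0, 0.05, size=1000)
# Feature 1: same expected mean |SHAP| of 1.0, but half the rows contribute
# ~0.1 and the other half ~1.9, i.e. very high variance.
f1 = signs * np.where(rng.random(1000) < 0.5, 0.1, 1.9)
shap_values = np.column_stack([f0, f1])

# Current ranking metric: mean of absolute SHAP values per feature.
plain = np.mean(np.abs(shap_values), axis=0)
# Proposed adjustment: subtract half the std of the absolute SHAP values.
adjusted = plain - np.std(np.abs(shap_values), axis=0) / 2.0

# `plain` ranks the two features as a near tie; `adjusted` clearly
# prefers the consistently important feature 0.
```

With the plain mean the two features are interchangeable in the ranking; the variance penalty breaks the tie in favour of the consistent one.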
Solution Outline
In `shap_helpers.py`, a one-line adjustment: `shap_abs_mean = np.mean(np.abs(shap_values), axis=0) - np.std(np.abs(shap_values), axis=0) / 2.0`
Question: should a parameter be added to `fit` to control turning this adjustment on/off?
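If an opt-in switch is preferred, one possible shape is a multiplier rather than a boolean. This is a hypothetical sketch (the function name and signature are mine, not the library's actual API): `variance_penalty=0.0` reproduces the current behaviour, and `0.5` gives the adjustment proposed above.

```python
import numpy as np

def mean_abs_shap(shap_values, variance_penalty=0.0):
    """Mean |SHAP| per feature, optionally penalised by its std.

    variance_penalty=0.0 is the current behaviour; 0.5 subtracts half
    the standard deviation of the absolute SHAP values, as proposed.
    """
    abs_shap = np.abs(shap_values)
    return np.mean(abs_shap, axis=0) - variance_penalty * np.std(abs_shap, axis=0)
```

Exposing the multiplier keeps the default unchanged for existing users while letting others tune how strongly variance is penalised.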
Happy to submit a PR for this proposal. LMK if this would be of interest.