ing-bank / probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.
https://ing-bank.github.io/probatus
MIT License

Return SHAP values after ShapRFECV function call #150

Open SamyakHM opened 3 years ago

SamyakHM commented 3 years ago

Problem Description: Currently, it is not possible to store or retrieve the SHAP values for individual features before they are eliminated to produce a reduced feature set. This limits the analysis of SHAP values across multiple runs.

Desired Outcome: The SHAP values computed after every run should be available as a dataframe to analyze and manipulate. This would give an overview of how SHAP values stack up for different feature groups without automatic elimination.

Solution Outline: No particular requirement.

Matgrb commented 3 years ago

I think this would be a great addition, and relatively easy to implement.

In line 558 of feature_elimination.py:

self._report_current_results(
    round_number=round_number,
    current_features_set=current_features_set,
    features_to_remove=features_to_remove,
    train_metric_mean=np.round(np.mean(scores_train), 3),
    train_metric_std=np.round(np.std(scores_train), 3),
    val_metric_mean=np.round(np.mean(scores_val), 3),
    val_metric_std=np.round(np.std(scores_val), 3),
)

One can pass shap_importance_df and store the SHAP values there as well, possibly as a dict or in some other structure.
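To make the idea concrete, here is a minimal sketch of a per-round report that also keeps the SHAP importances. It is not probatus code: the EliminationReport class, its method signature, and the shap_importance argument are all hypothetical, and plain dicts are used instead of shap_importance_df (presumably a DataFrame, judging by its name) so that the sketch has no dependencies. The numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class EliminationReport:
    """Hypothetical stand-in for the report kept during feature elimination."""

    rounds: list = field(default_factory=list)

    def report_current_results(
        self, round_number, current_features_set, features_to_remove, shap_importance
    ):
        # Copy the inputs so that later changes made by the caller
        # cannot alter what was recorded for this round.
        self.rounds.append(
            {
                "round_number": round_number,
                "features_set": list(current_features_set),
                "eliminated_features": list(features_to_remove),
                "shap_importance": dict(shap_importance),
            }
        )


# Illustrative values only: feature name -> mean absolute SHAP value.
report = EliminationReport()
report.report_current_results(
    round_number=1,
    current_features_set=["f1", "f2", "f3"],
    features_to_remove=["f3"],
    shap_importance={"f1": 0.42, "f2": 0.17, "f3": 0.03},
)
report.report_current_results(
    round_number=2,
    current_features_set=["f1", "f2"],
    features_to_remove=["f2"],
    shap_importance={"f1": 0.45, "f2": 0.20},
)
```

Storing the importances next to the round number and the feature set keeps everything needed for later analysis in one place, including the values of features that were eliminated in that round.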

Another improvement would be to write a small function to retrieve shap values from the results for a given number of features e.g. get_reduced_feature_set_shap_values.
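A rough sketch of what such a helper could look like is shown below. Only the function name comes from the suggestion above. The signature, the layout of the stored rounds (a list of dicts with "features_set" and "shap_importance" keys), and the example numbers are assumptions made for illustration, not the library's actual data model.

```python
def get_reduced_feature_set_shap_values(rounds, num_features):
    """Return the stored SHAP importances for the round that used num_features features."""
    for entry in rounds:
        if len(entry["features_set"]) == num_features:
            return entry["shap_importance"]
    # No round matched: tell the caller which feature counts are available.
    available = sorted(len(entry["features_set"]) for entry in rounds)
    raise ValueError(
        f"No round with {num_features} features; available counts: {available}"
    )


# Hypothetical stored results, one entry per elimination round.
rounds = [
    {
        "features_set": ["f1", "f2", "f3"],
        "shap_importance": {"f1": 0.42, "f2": 0.17, "f3": 0.03},
    },
    {
        "features_set": ["f1", "f2"],
        "shap_importance": {"f1": 0.45, "f2": 0.20},
    },
]

shap_for_two = get_reduced_feature_set_shap_values(rounds, num_features=2)
```

Raising an error for a feature count that was never evaluated, rather than returning None, makes a mistyped num_features visible immediately.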

Would anyone like to implement this?