SAGE (Shapley Additive Global importancE) is a game-theoretic approach for understanding black-box machine learning models. It quantifies each feature's importance based on how much predictive power it contributes, and it accounts for complex feature interactions using the Shapley value.
SAGE was introduced in this paper (NeurIPS 2020), but if you're new to Shapley values, you may want to start by reading this blog post.
The easiest way to get started is to install the `sage-importance` package with pip:

```bash
pip install sage-importance
```
Alternatively, you can clone the repository and install the package in your Python environment as follows:

```bash
git clone https://github.com/iancovert/sage.git
cd sage
pip install .
```
SAGE is model-agnostic, so you can use it with any kind of machine learning model (linear models, GBMs, neural networks, etc.). All you need to do is set up an imputer to handle held-out features, and then estimate the Shapley values:
```python
import sage

# Get data
x, y = ...
feature_names = ...

# Get model
model = ...

# Set up an imputer to handle missing features
imputer = sage.MarginalImputer(model, x[:128])

# Set up an estimator
estimator = sage.PermutationEstimator(imputer, 'mse')

# Calculate SAGE values
sage_values = estimator(x, y)
sage_values.plot(feature_names)
```
The result is a plot of each feature's SAGE value.
Our implementation includes several features that make estimating the Shapley values easier.
Check out the example notebooks in the repository (e.g., Bank, Airbnb) to get started.
If you want to replicate the experiments described in our paper, see this separate repository.
This repository provides some flexibility in how explanations are generated. The original SAGE paper proposes marginalizing out missing features using their conditional distribution; since this is challenging to implement in practice, several approximations are available. The simplest, used by `MarginalImputer` above, samples held-out features from their marginal distribution using a background dataset.
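As a rough sketch of how marginal imputation works (a from-scratch toy, not the `sage` API; the model and background data below are invented for illustration):

```python
import statistics

# Toy model and background data, invented for illustration (not the sage API).
def model(x):
    return 3.0 * x[0] + 1.0 * x[1]

background = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]  # samples from the data distribution

def marginal_impute(x, present):
    """Predict with only the features in `present` known: fill each missing
    feature with background values and average the model's predictions."""
    preds = []
    for b in background:
        filled = [x[i] if i in present else b[i] for i in range(len(x))]
        preds.append(model(filled))
    return statistics.mean(preds)

full = marginal_impute([1.0, 2.0], {0, 1})  # ordinary prediction: 5.0
partial = marginal_impute([1.0, 2.0], {0})  # feature 1 averaged over the background
```

With both features present, the ordinary prediction is recovered; with feature 1 held out, its contribution is replaced by an average over the background sample.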
Two types of explanations can be calculated, both based on Shapley values: SAGE values, which quantify each feature's contribution to the model's predictive performance, and Shapley Effects (Owen, 2014), which quantify each feature's contribution to the model's output variability.
Shapley values are computationally costly to calculate exactly, so we provide several estimation approaches:

- **Permutation sampling** (`PermutationEstimator`): the approach described in the original paper.
- **KernelSAGE** (`KernelEstimator`): a linear regression-based estimator. It's described in this paper, and the Bank notebook shows an example.
- **Iterated sampling** (`IteratedEstimator`): this enables faster convergence for features with low variance, but it can result in wider confidence intervals.
- **Sign estimation** (`SignEstimator`): a faster approach that estimates only the sign of each SAGE value, and the Bank notebook shows an example.

The results from each approach should be identical (see Consistency), but there may be differences in convergence speed. Permutation sampling is a good approach to start with. KernelSAGE may converge a bit faster, but its uncertainty is spread more evenly among the features rather than being highest for the more important ones.
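To build intuition for permutation sampling, here is a from-scratch toy sketch (not the `sage` implementation; the two-feature model and dataset are invented for illustration). Each sampled permutation is walked through one feature at a time, and the resulting loss reduction is credited to the feature that was just added:

```python
import random
import statistics

# Toy setup, invented for illustration (not the sage implementation): a model
# that depends only on feature 0, with targets that match it exactly.
def model(x):
    return 2.0 * x[0] + 0.0 * x[1]

data = [([1.0, 5.0], 2.0), ([2.0, -3.0], 4.0),
        ([3.0, 0.5], 6.0), ([-1.0, 2.0], -2.0)]
background = [x for x, _ in data]  # background sample for marginal imputation

def impute_predict(x, present):
    # Marginal imputation: average predictions over background values
    # for the features outside `present`.
    preds = []
    for b in background:
        filled = [x[i] if i in present else b[i] for i in range(len(x))]
        preds.append(model(filled))
    return statistics.mean(preds)

def sage_permutation(n_perms=200, seed=0):
    rng = random.Random(seed)
    d = len(data[0][0])
    totals = [0.0] * d
    for _ in range(n_perms):
        x, y = rng.choice(data)
        perm = rng.sample(range(d), d)  # a random feature ordering
        present = set()
        prev_loss = (impute_predict(x, present) - y) ** 2  # squared-error loss
        for i in perm:
            present.add(i)
            loss = (impute_predict(x, present) - y) ** 2
            totals[i] += prev_loss - loss  # credit the loss reduction to feature i
            prev_loss = loss
    return [t / n_perms for t in totals]

values = sage_permutation()
```

Because the toy model ignores the second feature, its estimated SAGE value comes out to exactly zero, while the first feature receives all of the credit for reducing the loss.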
Rather than removing features individually, you can specify groups of features to be removed jointly. This will likely speed up convergence because there are fewer feature subsets. See the Airbnb notebook for an example.
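As a from-scratch toy sketch of grouped removal (not the `sage` API; the model, data, and groups below are invented for illustration), the "players" become groups of feature indices, and a whole group is added at each step of a sampled permutation:

```python
import random
import statistics

# Toy grouped-removal sketch, invented for illustration (not the sage API).
# Four features in two groups; the model only uses the first group.
def model(x):
    return x[0] + x[1]

data = [([1.0, 1.0, 7.0, -2.0], 2.0), ([2.0, 2.0, 0.0, 5.0], 4.0),
        ([3.0, 0.0, 1.0, 1.0], 3.0), ([0.0, 4.0, -3.0, 2.0], 4.0)]
background = [x for x, _ in data]
groups = [[0, 1], [2, 3]]  # the "players" are groups, not single features

def impute_predict(x, present):
    # Marginal imputation over a background sample.
    preds = []
    for b in background:
        filled = [x[i] if i in present else b[i] for i in range(len(x))]
        preds.append(model(filled))
    return statistics.mean(preds)

def grouped_sage(n_perms=100, seed=0):
    rng = random.Random(seed)
    totals = [0.0] * len(groups)
    for _ in range(n_perms):
        x, y = rng.choice(data)
        order = rng.sample(range(len(groups)), len(groups))  # permute groups
        present = set()
        prev_loss = (impute_predict(x, present) - y) ** 2
        for g in order:
            present.update(groups[g])  # add the whole group at once
            loss = (impute_predict(x, present) - y) ** 2
            totals[g] += prev_loss - loss  # credit the loss reduction to group g
            prev_loss = loss
    return [t / n_perms for t in totals]

values = grouped_sage()
```

With two groups, each permutation has just two steps instead of four, which is why grouping tends to speed up convergence.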
Ian Covert, Scott Lundberg, Su-In Lee. "Understanding Global Feature Contributions With Additive Importance Measures." NeurIPS 2020
Ian Covert, Scott Lundberg, Su-In Lee. "Explaining by Removing: A Unified Framework for Model Explanation." JMLR 2021
Ian Covert, Su-In Lee. "Improving KernelSHAP: Practical Shapley Value Estimation via Linear Regression." AISTATS 2021
Art Owen. "Sobol' Indices and Shapley Value." SIAM/ASA Journal on Uncertainty Quantification 2014