[KDD question] Shap values vs Sobol indices?

ngbrenda commented 3 years ago

How do Shapley values compare to Sobol indices from uncertainty quantification (UQ)? Shapley values attribute the difference f(x) - E[f(x)] to the individual features, while Sobol indices decompose the output variance into fractions contributed by the inputs.
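
For reference, the two quantities being compared can be written as follows. This is a sketch using standard textbook definitions, not notation taken from the SHAP package: $d$ is the number of features, $v_x(S)$ is an assumed value function (the expected model output when the features in $S$ are fixed to their values in $x$), and $S_i$ is the first-order Sobol index of input $i$.

$$
\phi_i(x) = \sum_{S \subseteq \{1,\dots,d\} \setminus \{i\}} \frac{|S|!\,(d-|S|-1)!}{d!}\,\bigl[v_x(S \cup \{i\}) - v_x(S)\bigr],
\qquad
\sum_{i=1}^{d} \phi_i(x) = f(x) - E[f(X)]
$$

$$
S_i = \frac{\operatorname{Var}\bigl(E[f(X) \mid X_i]\bigr)}{\operatorname{Var}\bigl(f(X)\bigr)}
$$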

slundberg commented 3 years ago

Great question. There is a close connection, but there are also important differences.

One issue is that Sobol indices are a global measure of a feature's importance (the share of output variance it explains), while Shapley values as applied in SHAP give local explanations of each prediction. (You could of course combine many local explanations of the model's variance to get a global summary, but the reverse is not true.)
The second issue is that Sobol indices rely on a regular (observational) conditional expectation, while the current SHAP package deals primarily with interventional expectations (see https://arxiv.org/abs/1910.13413 for details). This means Sobol indices can highlight features that the model does not use but that are correlated with features it does use.
For a more in-depth review of the connections between Sobol indices and Shapley values, check out Art Owen's work: http://finmath.stanford.edu/~owen/reports/sobolshapley.pdf
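
To make the two points above concrete, here is a minimal sketch in plain Python. It does not use the SHAP package. The toy `model`, the helper names `interventional_shapley` and `first_order_index`, and the synthetic data are all assumptions made up for illustration. The model ignores feature 2, which is strongly correlated with feature 1. The exact interventional Shapley value of feature 2 comes out as zero for every point, while a binned estimate of Var(E[f(X) | X_2]) / Var(f(X)), which uses a regular conditional expectation, assigns it a large share of the variance.

```python
import itertools
import math
import random


def model(x):
    # Toy model (assumption for illustration): uses x[0] and x[1], ignores x[2].
    return x[0] + 2.0 * x[1]


def interventional_shapley(f, x, background):
    """Exact Shapley values with v(S) = mean over background b of f(x_S, b_notS)."""
    d = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            hybrid = [x[j] if j in subset else b[j] for j in range(d)]
            total += f(hybrid)
        return total / len(background)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for subset in itertools.combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def first_order_index(xs, ys, n_bins=50):
    """Binned estimate of Var(E[Y | X]) / Var(Y) using equal-count bins of X."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    bounds = [b * n // n_bins for b in range(n_bins + 1)]
    between = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        idx = order[lo:hi]
        bin_mean = sum(ys[i] for i in idx) / len(idx)
        between += len(idx) * (bin_mean - mean_y) ** 2
    return between / n / var_y


rng = random.Random(0)
data = []
for _ in range(5000):
    x0 = rng.gauss(0.0, 1.0)
    x1 = rng.gauss(0.0, 1.0)
    x2 = x1 + 0.1 * rng.gauss(0.0, 1.0)  # correlated with x1, unused by the model
    data.append([x0, x1, x2])
outputs = [model(x) for x in data]

background = data[:200]
phi_a = interventional_shapley(model, data[0], background)  # local: one prediction
phi_b = interventional_shapley(model, data[1], background)  # local: another prediction
index_x2 = first_order_index([x[2] for x in data], outputs)  # global: whole dataset

print("Shapley values, point A:", phi_a)
print("Shapley values, point B:", phi_b)
print("First-order variance share of unused feature 2:", index_x2)
```

The Shapley values differ from point to point (local), whereas the variance share is a single number for the whole input distribution (global).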

tupui commented 2 years ago

Another important distinction is that Sobol' indices normally assume that the inputs are independent of each other. There are tricks for handling dependent inputs (see, e.g., the work of Sergei Kucherenko), but they are not simple.
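
A small sketch of why independence matters, under assumptions chosen purely for illustration: f(x) = x1 + x2 with standard Gaussian inputs of correlation `rho`, and a hypothetical helper `first_order_index` that estimates Var(E[f | X_i]) / Var(f) by binning. In this setup each first-order index works out analytically to (1 + rho) / 2, so the two indices sum to 1 only when rho = 0. With correlated inputs they no longer partition the variance.

```python
import math
import random


def first_order_index(xs, ys, n_bins=50):
    """Binned estimate of Var(E[Y | X]) / Var(Y) using equal-count bins of X."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    bounds = [b * n // n_bins for b in range(n_bins + 1)]
    between = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        idx = order[lo:hi]
        bin_mean = sum(ys[i] for i in idx) / len(idx)
        between += len(idx) * (bin_mean - mean_y) ** 2
    return between / n / var_y


def indices_for_correlation(rho, n=20000, seed=0):
    """Estimate both first-order indices of f(x) = x1 + x2 for Gaussian inputs."""
    rng = random.Random(seed)
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(x1, x2) = rho
        x1s.append(x1)
        x2s.append(x2)
        ys.append(x1 + x2)
    return first_order_index(x1s, ys), first_order_index(x2s, ys)


independent = indices_for_correlation(0.0)
correlated = indices_for_correlation(0.8)

print("rho = 0.0: indices", independent, "sum", sum(independent))
print("rho = 0.8: indices", correlated, "sum", sum(correlated))
```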

github-actions[bot] commented 1 month ago

This issue has been inactive for two years, so it's been automatically marked as 'stale'.

We value your input! If this issue is still relevant, please leave a comment below. This will remove the 'stale' label and keep it open.

If there's no activity in the next 90 days the issue will be closed.

[KDD question] Shap values vs Sobol indices? #1382