h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

SHAP broken on DRF binomial classification with sorting (and possibly even without it) #15657

Open tomasfryda opened 1 year ago

tomasfryda commented 1 year ago

The problem is caused by transformation of the contributions which are computed for $P(Y=0)$ but we are actually interested in contributions to $P(Y=1)$.

The transformation is done by subtracting from a fraction of one as follows:

// in ScoreContributionsSortingTaskDRF::doModelSpecificComputation
float featurePlusBiasRatio = (float)1 / (_output.nfeatures() + 1); // + 1 for bias term
contribs[i] = featurePlusBiasRatio - (contribs[i] / _output._ntrees);

This is incorrect because it violates the dummy property: a feature that is not used should have a contribution of exactly 0, but since we subtract a raw contribution of zero from featurePlusBiasRatio, we get a non-zero value. (See Steps to reproduce.)
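To make the violation concrete, here is a minimal Python sketch (hypothetical numbers mirroring the Java logic above: 7 predictors plus the bias term, one tree) showing that an unused feature ends up with a non-zero value:

```python
# Hypothetical setup: 7 features + bias term, 1 tree.
nfeatures = 7
ntrees = 1
feature_plus_bias_ratio = 1.0 / (nfeatures + 1)  # 0.125

# Raw TreeSHAP contribution of a feature the tree never splits on:
raw_contrib_unused = 0.0

# The transformation in ScoreContributionsSortingTaskDRF:
transformed = feature_plus_bias_ratio - (raw_contrib_unused / ntrees)

# An unused feature should contribute exactly 0, but it gets 0.125:
print(transformed)  # 0.125
```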

A better solution is the one already present in ScoreContributionsTaskDRF::addContribToNewChunk:

float featurePlusBiasRatio = (float)1 / (_output._varimp.numberOfUsedVariables() + 1); // + 1 for bias term
nc[i].addNum(contribs[i] != 0 ? (featurePlusBiasRatio - (contribs[i] / _output._ntrees)) : 0);
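For illustration, a Python sketch of that guarded transformation (hypothetical numbers: one used variable, one tree; `transform` is just an illustrative helper, not an h2o API): the `!= 0` guard keeps unused features at exactly zero.

```python
used_vars = 1   # stand-in for _output._varimp.numberOfUsedVariables()
ntrees = 1
feature_plus_bias_ratio = 1.0 / (used_vars + 1)  # 0.5

def transform(raw):
    # Mirror of the ternary above: leave unused features (raw == 0) at zero.
    return feature_plus_bias_ratio - (raw / ntrees) if raw != 0 else 0.0

print(transform(0.0))             # 0.0 -> dummy property holds
print(round(transform(-0.2), 6))  # 0.7
```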

However, I'm afraid that might not be the correct solution either.

I'm not sure if it's applicable here, but in my PR where I add SHAP with a background sample, I have to use a different approach:

Let $x$ denote a data point that I want to know the contributions $\phi_i$ for, and $b$ a background data point that I use to calculate the contributions for $x$.

In other words:

$$f(x) = \sum_{i\in\text{features}} \phi_i + f(b)$$

$$\sum_{i\in\text{features}} \phi_i = f(x) - f(b)$$

Since $f(x)$ corresponds to $P(Y=0)$ and we want $P(Y=1)$ we need to transform it somehow:

$$P(Y=1|x) = g(x) = 1-P(Y=0|x) = 1 - f(x)$$

Let's denote by $\tilde\phi_i$ the contributions for $P(Y=1)$; then

$$\sum_{i\in\text{features}} \tilde\phi_i = g(x) - g(b) = 1-f(x) - 1 +f(b) = f(b)-f(x)$$

So we can simply swap the data point and the background data point to get the contributions for $P(Y=1)$. This has the nice property that if contribution(a|x,b) shifts the response from f(b) to f(x), then contribution(a|b,x) moves the response by the same magnitude in the opposite direction (from f(x) to f(b)), which does not happen with the simple 1/n - contrib transformation.
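A quick numeric check of that antisymmetry property, as a Python sketch with hypothetical probabilities and a single feature carrying the whole shift:

```python
# Hypothetical single-feature example: f maps a point to P(Y=0).
f_x = 0.3   # P(Y=0|x)
f_b = 0.45  # P(Y=0|b)
n = 1       # one feature, so it carries the whole shift

# Contribution of the only feature, explaining x against background b:
contrib_x_b = f_x - f_b   # -0.15
# ...and with the roles of x and b swapped:
contrib_b_x = f_b - f_x   # 0.15

# Swapping x and b negates the contribution, as required:
print(contrib_x_b == -contrib_b_x)  # True

# The simple "1/n - contrib" transformation does NOT have this property:
t_x_b = 1.0 / n - contrib_x_b  # 1.15
t_b_x = 1.0 / n - contrib_b_x  # 0.85
print(t_x_b == -t_b_x)  # False
```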

Then I have to do a little magic to get the correct bias term, and for output_format="original" (which I implement using one-hot encoding) I have to move the contribution from the categorical level of $b$ to the categorical level of $x$ to make the SHAP values less confusing (and to satisfy the dummy property if we assume we can censor some categorical levels).

The same explanation, but from the code, in case I forgot to write something here:

/* Sum of contributions + biasTerm (contribs[contribs.length-1]) gives us prediction for P(Y==0) but the user is
interested in knowing P(Y==1) = 1 - P(Y==0).

Since SHAP should satisfy the dummy property - if a feature is not used it should not have any contribution, we 
cannot just do 1/nfeatures - (contribs[i]/ntrees).

In the contribs array we have the contributions and the BiasTerm; the difference between the two is that the BiasTerm
should correspond to the prediction of the background data point, hence it does not satisfy the dummy property and
should always be involved in the conversion.

Another property that should be satisfied is the following:
 Let's denote contribution of feature a of data point x on background data point b as contribution(a|x,b), then
 contribution(a|x,b) == - contribution(a|b,x). In other words, if contribution(a|x,b) shifts the response
 from f(b) to f(x), then contribution(a|b,x) should move the response by the same magnitude but in the opposite
 direction (from f(x) to f(b)).

This leads to more involved transformation than just subtracting the contributions from fraction of one:

Let
  f(x) = P(Y=0|x) = sum(contr(x|b)) + P(Y=0|b) = sum(contr(x|b)) + f(b)
  sum(contr(x|b)) = P(Y=0|x) - P(Y=0|b) = f(x) - f(b)
but that gives us contributions for P(Y=0|x) and we want P(Y=1|x), so let
  g(x) = P(Y=1|x) = 1 - P(Y=0|x) = 1 - sum(contr(x|b)) - P(Y=0|b) = 1 - sum(contr(x|b)) - f(b)
  sum(contr(b|x)) = 1 - f(x) - 1 + f(b) = f(b) - f(x) = g(x) - g(b)
From the last line we can see that we can calculate the contributions for the P(Y=1|x) by switching the
data point (x) with the background data point (b).

Then we will also have to recompute the bias term as the bias we get from the interventionalTreeSHAP would
correspond to f(x) but we want g(b).
  bias term == g(b) = P(Y=1|b) = 1 - P(Y=0|b) = 1 - sum(contr(b|x)) - f(x)
*/

double sum = 0;
for (int i = 0; i < contribs.length-1; i++) {
  contribs[i] = (contribs[i] / _output._ntrees);
  sum += contribs[i];
}

contribs[contribs.length-1] = 1 - (contribs[contribs.length-1]/_output._ntrees) - sum;
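As a sanity check, here is a Python sketch of the same transformation with hypothetical numbers (one feature carrying the whole shift, raw values scaled the way the Java code expects): the recomputed bias term equals g(b), and the contributions plus bias reconstruct g(x).

```python
ntrees = 1
f_x = 0.3   # P(Y=0|x)
f_b = 0.45  # P(Y=0|b)

# After swapping x and b, interventional TreeSHAP returns contr(b|x)
# (here a single feature carrying the whole shift) and a bias of f(x):
contribs = [(f_b - f_x) * ntrees, f_x * ntrees]  # [contribution, BiasTerm]

# The transformation from the snippet above:
s = 0.0
for i in range(len(contribs) - 1):
    contribs[i] = contribs[i] / ntrees
    s += contribs[i]
contribs[-1] = 1 - (contribs[-1] / ntrees) - s  # bias term becomes g(b)

g_b = 1 - f_b  # 0.55
g_x = 1 - f_x  # 0.7
print(abs(contribs[-1] - g_b) < 1e-9)   # True: bias == g(b)
print(abs(sum(contribs) - g_x) < 1e-9)  # True: contributions + bias == g(x)
```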

Steps to reproduce the behavior (with working code on a sample dataset, if possible):

library(h2o)
h2o.connect()
df <- as.h2o(attitude)
df$result <- as.factor(df$rating > 50)
mod <- h2o.randomForest(y="result", training_frame = df, ntrees = 1, max_depth = 1)

h2o.predict_contributions(mod, df, top_n = ncol(df))

#   top_feature_1 top_value_1 top_feature_2 top_value_2 top_feature_3 top_value_3 top_feature_4 top_value_4 top_feature_5 top_value_5 top_feature_6 top_value_6
# 1        rating       0.125    privileges       0.125      learning       0.125        raises       0.125      critical       0.125       advance       0.125
# 2    complaints       0.325        rating       0.125    privileges       0.125      learning       0.125        raises       0.125      critical       0.125
# 3    complaints       0.325        rating       0.125    privileges       0.125      learning       0.125        raises       0.125      critical       0.125
# 4    complaints       0.325        rating       0.125    privileges       0.125      learning       0.125        raises       0.125      critical       0.125
# 5    complaints       0.325        rating       0.125    privileges       0.125      learning       0.125        raises       0.125      critical       0.125
# 6        rating       0.125    privileges       0.125      learning       0.125        raises       0.125      critical       0.125       advance       0.125
# top_feature_7 top_value_7 BiasTerm
# 1    complaints      -0.425   -0.075
# 2       advance       0.125   -0.075
# 3       advance       0.125   -0.075
# 4       advance       0.125   -0.075
# 5       advance       0.125   -0.075
# 6    complaints      -0.425   -0.075
# 
# [30 rows x 15 columns] 

h2o.predict_contributions(mod, df)
#   rating  complaints privileges learning raises critical advance BiasTerm
# 1      0 -0.05000001          0        0      0        0       0      0.3
# 2      0  0.70000005          0        0      0        0       0      0.3
# 3      0  0.70000005          0        0      0        0       0      0.3
# 4      0  0.70000005          0        0      0        0       0      0.3
# 5      0  0.70000005          0        0      0        0       0      0.3
# 6      0 -0.05000001          0        0      0        0       0      0.3
# 
# [30 rows x 8 columns] 
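The two outputs line up with the two code paths. With seven predictors, the sorted path uses featurePlusBiasRatio = 1/(7+1) = 0.125, which is exactly the value every unused feature receives above; the unsorted path uses 1/(numberOfUsedVariables+1) = 1/2. Taking the raw contribution of complaints as roughly -0.2 (recovered from the outputs, ntrees = 1), both printed values follow:

```python
raw = -0.2  # approximate raw TreeSHAP contribution of "complaints" (ntrees = 1)

sorted_ratio = 1.0 / (7 + 1)    # all 7 predictors + bias term
unsorted_ratio = 1.0 / (1 + 1)  # 1 used variable + bias term

print(abs((sorted_ratio - raw) - 0.325) < 1e-9)  # True: matches top_value_1
print(abs((unsorted_ratio - raw) - 0.7) < 1e-9)  # True: matches the complaints column
```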
valenad1 commented 1 year ago

Good one, I fixed only the binomial version that time.

valenad1 commented 1 year ago

Related to this: https://github.com/h2oai/h2o-3/issues/6851

valenad1 commented 1 year ago

Do I understand correctly that the Shapley values for DRF are totally wrong if one of the features is not used?

tomasfryda commented 1 year ago

On a walk with my dog, I thought about it a little more and I think I used a far too complicated approach - I got confused by every contribution being subtracted from a constant.

I will test it tomorrow more thoroughly but I think this is how it should be done:

for (int i = 0; i < contribs.length-1; i++) {
  contribs[i] = -(contribs[i] / _output._ntrees);
}
contribs[contribs.length-1] = 1 - (contribs[contribs.length-1]/_output._ntrees);

Why?

Let me show an example for the SHAP with the background sample (but I think it's transferable to the original TreeSHAP):

$$P(Y=0|x) = 0.3$$

$$P(Y=0|b) = 0.45$$

$$ \sum\phi_i =P(Y=0|x) - P(Y=0|b) = -0.15$$

Here ^^^ we call $P(Y=0|b)$ the "bias term", and it should be obvious that it corresponds to the prediction for the background sample.

If we rewrite it we can see that the sum "lifts" the prediction from the background sample prediction to the prediction for the point that we calculate the SHAP for.

$$P(Y=0|x) = \sum\phi_i + P(Y=0|b)$$

Now if we are interested in the contributions to P(Y=1|x) we can calculate those probabilities like:

$$P(Y=1|x) = 1-P(Y=0|x) = 0.7$$

$$P(Y=1|b) = 1- P(Y=0|b) = 0.55$$

Now the important part comes in:

$$P(Y=1|x) - P(Y=1|b) = 1- P(Y=0|x) - 1 + P(Y=0|b) = - (P(Y=0|x) - P(Y=0|b)) = - \sum\phi_i = 0.15$$

So the contributions for $P(Y=1|x)$ sum up to the negative value of contributions for $P(Y=0|x)$.

And the second important thing is that the bias term should now correspond to $P(Y=1|b) = 1 - P(Y=0|b)$.

So I think the only place where we should subtract from a constant is the bias term.
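A numeric sanity check of this simpler transformation, as a Python sketch with made-up raw contributions: negating the contributions and complementing only the bias term reconstructs $P(Y=1|x)$ while preserving the dummy property.

```python
ntrees = 1
# Hypothetical raw TreeSHAP output for P(Y=0|x): contributions + BiasTerm.
# The third feature is unused (raw contribution 0).
contribs = [0.1, -0.25, 0.0, 0.45]  # last entry is the BiasTerm

f_x = sum(contribs)  # P(Y=0|x) = 0.3

# The proposed fix: negate the contributions, complement only the bias term.
for i in range(len(contribs) - 1):
    contribs[i] = -(contribs[i] / ntrees)
contribs[-1] = 1 - (contribs[-1] / ntrees)

g_x = sum(contribs)  # should equal P(Y=1|x) = 1 - f(x)
print(abs(g_x - (1 - f_x)) < 1e-9)  # True
# The unused feature still contributes 0 -> dummy property holds:
print(contribs[2] == 0.0)  # True
```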

Visually, it corresponds roughly to the following figure (image "shap" not reproduced here).

To sum it up, I think all binomial DRF SHAP values are calculated wrong. And when we fix this, there will still be https://github.com/h2oai/h2o-3/issues/7385, which I think is caused by wrongly calculated weights.