iancovert / sage

For calculating global feature importance using Shapley values.
MIT License

All negative SAGE values in classification #20

Closed KarelZe closed 1 year ago

KarelZe commented 1 year ago

Dear @iancovert,

I perform binary classification using CatBoost and am trying to determine global feature importances with GroupedMarginalImputer and PermutationEstimator, but I only obtain negative SAGE values.

My test set X_test is relatively large, with shape (9861576, 15). I provide the imputer with a random sample X_importance of the test set with shape (512, 15), similar to the notebook. My labels are [-1, 1], if relevant.

My code

    # ...
    # wrap the model in a callable, since CatBoost needs a Pool that includes the categoricals
    def call_catboost(X):
        if feature_str == "ml":
            X = pd.DataFrame(X, columns=X_importance.columns)
            # restore the integer dtype of the categorical columns
            X[cat_features] = X.iloc[:, cat_idx].astype(int)
            # pass the categorical feature indices via a Pool
            return clf.predict_proba(Pool(X, cat_features=cat_idx))
        else:
            return clf.predict_proba(X)  # <- used here

    # group-based marginal imputation + importance estimation in terms of cross-entropy
    imputer = GroupedMarginalImputer(call_catboost, X_importance, groups)
    estimator = PermutationEstimator(imputer, "cross entropy")

    # calculate SAGE values over the entire test set
    sage_values = estimator(X_test.values, y_test.values)

    # save SAGE values + standard deviations to a data frame
    result = pd.DataFrame(index=group_names, data={"values": sage_values.values, "std": sage_values.std})

Obtained result with 512 background samples: [screenshot of SAGE values]

With 256 background samples: [screenshot of SAGE values]

As visible in the screenshots, all SAGE values are negative. I noticed that if I decrease the sample passed to GroupedMarginalImputer to only 256 rows, one group becomes slightly positive. The uncertainties are relatively low.

To my understanding, such a result would imply that all features (or groups) contribute negatively to the loss. This is somewhat counter-intuitive to me, as the classifier performs strongly on the test set. I've seen the notebooks / examples in the SAGE paper where single corrupted features degrade performance, but never every feature.

Is there some implementation error on my side, e.g. a misunderstanding of the background samples, or is such a result plausible?

Thanks for your help.

PS: A similar issue was discussed in #2, but that one seems to have been resolved.

PPS: I also ran the experiment with my implementation of the zero-one loss (see #18), but the output is similar.

iancovert commented 1 year ago

You're right that there must be some bug causing this result. Negative SAGE values should only occur when a feature is hurting the model's performance, so it cannot happen for all your features given that the model performs well.

I had to refresh my memory on a couple of details of the implementation, but the important one concerns the label encoding: the cross-entropy loss expects labels encoded as [0, 1], and the estimator is supposed to detect an encoding like your [-1, 1] and convert it automatically.

Can you think of any reason why this fix wouldn't be working properly? Or would you be able to try manually changing your label encoding? Let me know what you think. It's also possible the error is unrelated to the label encoding, but this seems like a good place to start.

KarelZe commented 1 year ago

Dear @iancovert,

thanks for your response.

I'll post my solution here if I find one. Otherwise, I think we can close this issue for now.

Thanks in advance.

iancovert commented 1 year ago

This is helpful information; I forgot that you're using some classifiers that output only hard (0/1) probabilities. In that case, we won't be able to calculate the cross-entropy loss even given the predict_proba output: any incorrect prediction results in infinite cross-entropy loss, because the probability mass on the correct class is zero. The cross-entropy loss requires probabilistic predictions (this is why we try to use predict_proba whenever possible when preparing the model), but we'll need to avoid making that assumption here.
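
A minimal numeric illustration of that point (plain NumPy, not the package's actual loss code):

    import numpy as np

    # "hard" probabilities: the classifier puts all of its mass on the predicted class
    probs = np.array([[0.0, 1.0],   # predicts class 1
                      [1.0, 0.0]])  # predicts class 0
    y = np.array([0, 0])            # the true class is 0 for both examples

    # per-example cross-entropy: -log(probability assigned to the true class)
    cross_entropy = -np.log(probs[np.arange(len(y)), y])
    print(cross_entropy)  # [inf  0.] -- a single wrong hard prediction makes the loss infinite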

So one thing is for sure: we'll need to use your zero-one loss. The next question is why even those results look strange so far. Looking at your implementation, here's what I'm observing:

With that in mind, I'm thinking the issue could be what you mentioned about the model's accuracy vs. a random guess: if the accuracy we observe (0.65) is less than what we would get by always guessing the most likely class (which could be > 0.65 depending on the class imbalance), then it would make sense to observe negative SAGE values. The way to think about it: if your model had no input features, it would roughly predict the true class proportions (say 0.2 and 0.8), and under the zero-one loss it would achieve an accuracy of 0.8. The SAGE values should sum to 0.65 - 0.8 = -0.15, so it would make sense for many or all of them to be negative.
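
A rough sketch of that arithmetic, using the hypothetical numbers above (0.65 accuracy, 0.2/0.8 class proportions):

    import numpy as np

    model_accuracy = 0.65                     # accuracy with all features (hypothetical)
    class_proportions = np.array([0.2, 0.8])  # hypothetical class imbalance

    # with no features, the best constant guess is the majority class,
    # so the baseline accuracy is the largest class proportion
    baseline_accuracy = class_proportions.max()  # 0.8

    # under the zero-one loss, the SAGE values sum to roughly the model's
    # performance gain over this no-feature baseline
    print(model_accuracy - baseline_accuracy)  # -0.15 -> a negative total, so mostly negative values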

Let me know what you think.

KarelZe commented 1 year ago

Dear @iancovert,

thanks for your response. It made me go through the zero-one loss again, and I think we finally solved it :tada:.

What I hadn't tried previously was changing the labels to [0, 1] instead of [-1, 1] when paired with the ZeroOneLoss; I had only tested that with the cross-entropy loss. I'll have to verify / investigate why the targets are not converted automatically here and see where exactly it fails.

The SAGE values look rather reasonable for different models if I clip the labels to [0, 1] using y_test.clip(0): [screenshot]
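
For reference, the remapping amounts to the following (assuming y_test is a pandas Series with values in {-1, 1}):

    # map labels from {-1, 1} to {0, 1}: clip(0) turns -1 into 0 and leaves 1 unchanged
    y_binary = y_test.clip(0)

    # an equivalent, more explicit mapping
    y_binary = (y_test == 1).astype(int)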

To the other points you raised:

Thanks for your help. I'll take these points into account when preparing the PR for the zero-one loss and investigate them a bit further.

iancovert commented 1 year ago

Awesome, I'm relieved we figured this out. Let me know if you figure out where the label conversion went wrong; hopefully it will only require a small fix. And I'm looking forward to the PR!