linkedin / FastTreeSHAP

Fast SHAP value computation for interpreting tree-based models
BSD 2-Clause "Simplified" License

The additivity check failed with feature_perturbation="tree_path_dependent" #16

Closed Jing25 closed 1 year ago

Jing25 commented 1 year ago

When I calculate the SHAP values with feature_perturbation="tree_path_dependent", I get this error:

Exception: Additivity check failed in TreeExplainer! Please ensure the data matrix you passed to the explainer is the same shape that the model was trained on. If your data shape is correct then please report this on GitHub. Consider retrying with the feature_perturbation='interventional' option. This check failed because for one of the samples the sum of the SHAP values was 2.279799, while the model output was 4.452977. If this difference is acceptable you can set check_additivity=False to disable this check.

I checked the expected_value, and it's 0:

(screenshot: expected_value shown as 0)

I tested with the SHAP library (https://shap.readthedocs.io/en/latest/index.html), and it also reports an expected value of 0. I have no idea why the expected value became 0 in their library, but I suspect it is the cause of this bug.

FastTreeSHAP version: 0.1.3; SHAP version: 0.41.0
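For context, the additivity property that the check enforces can be verified by hand. A minimal numpy sketch with hypothetical stand-in numbers (not the real explainer output) shows how a wrongly-zero expected value makes the check fail:

```python
import numpy as np

# Hypothetical SHAP values for 3 samples x 4 features (stand-ins, not real output)
shap_values = np.array([[0.5, -0.2, 1.0, 0.1],
                        [0.3,  0.4, -0.1, 0.2],
                        [-0.6, 0.9, 0.0, 0.5]])
expected_value = 2.17                        # base value reported by the explainer
model_output = np.array([3.57, 2.97, 2.97])  # raw model predictions per sample

# Additivity: sum of SHAP values + expected_value must equal the model output
assert np.allclose(shap_values.sum(axis=1) + expected_value, model_output)

# With an expected value of 0, the same check fails -- as in the error above
assert not np.allclose(shap_values.sum(axis=1) + 0.0, model_output)
```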

jlyang1990 commented 1 year ago

Does the additivity check failure issue also exist when running SHAP on the same dataset?

Jing25 commented 1 year ago

No. Interestingly, when using SHAP with feature_perturbation="tree_path_dependent", the expected value before computing any SHAP values for a dataset differs from the expected value after the computation. In the current version (0.41.0), I get expected value == 0 before any computation, and after running on a dataset it becomes 2.155. In the previous versions (0.38.0 and 0.40.0), I get expected value == 2.17 before any computation and 2.155 after. I don't know why this is the case.

jlyang1990 commented 1 year ago

Sorry, I'm a bit confused. Are you saying that in the previous versions (0.38.0 and 0.40.0) you got expected value == 2.17 before any computation and 2.155 after the computation, while in the current version (0.41.0) you got expected value == 0 before any computation and 2.155 after the computation? If so, which version does FastTreeSHAP match? Thanks!

Jing25 commented 1 year ago

> Sorry I'm a bit confused. Did you say that in the previous version (0.38.0 and 0.40.0), you got the expected value == 2.17 before any computation and 2.155 after the computation, and in the current version (0.41.0), you got the expected value == 0 before any computation and 2.155 after the computation?

That is correct. Sorry for the confusion from mixing different versions. This inconsistency is only observed in SHAP, not in FastTreeSHAP.

> If so, which version does FastTreeSHAP match?

The expected value in FastTreeSHAP 0.1.3 matches the latest SHAP version (0.41.0): both report 0. The error then appears when I try to compute the SHAP values for a dataset.

The older versions work fine (for example, 0.1.1) because the expected value is 2.17, which matches the older SHAP versions.

jlyang1990 commented 1 year ago

I see. The only difference between FastTreeSHAP v0.1.3 and v0.1.2 is that the restriction on the numpy version (numpy<1.22) has been removed (https://github.com/linkedin/FastTreeSHAP/commit/41a33b628354db3409cefb9bb45184bddac1dfcf). The same change was made when SHAP upgraded to v0.41.0 (https://github.com/slundberg/shap/commit/23081f519a8cf9cf0c38efaab90f279f13f44671). We believe this additivity check failure was introduced by removing that restriction (see https://github.com/linkedin/FastTreeSHAP/issues/15 for more details).

Since the FastTreeSHAP library aims to reproduce the results of the SHAP library in a more efficient way, we would like to wait until the SHAP library implements an effective fix for this issue. In the meantime, I would recommend switching to FastTreeSHAP v0.1.2 to bypass it.
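For anyone hitting the same error, the suggested workaround amounts to pinning the package version (v0.1.2 itself still carries the numpy<1.22 restriction, so a compatible numpy is pulled in automatically):

```shell
# Downgrade to FastTreeSHAP v0.1.2, which retains the numpy<1.22 restriction
pip install "fasttreeshap==0.1.2"
```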