linkedin / FastTreeSHAP

Fast SHAP value computation for interpreting tree-based models
BSD 2-Clause "Simplified" License
512 stars 31 forks source link

TreeSHAP does not exact SHAP values on correlated features #2

Closed glemaitre closed 2 years ago

glemaitre commented 2 years ago

As highlighted in https://github.com/slundberg/shap/issues/2345, it seems that TreeSHAP does not compute the expected SHAP values.

The following notebook reproduces the problem with the FastTreeSHAP implementation: https://nbviewer.org/gist/glemaitre/9a30dd3a704675164b84d9bf7128882e

Note that it is not only a problem in the implementation but rather a problem in algorithm 1 of the original TreeSHAP paper (i.e. tree traversal)

ogrisel commented 2 years ago

@glemaitre I think you meant "on correlated features".

glemaitre commented 2 years ago

Thanks, I corrected the title.

ogrisel commented 2 years ago

Actually: "TreeSHAP does not exact SHAP values" => "TreeSHAP does not compute exact SHAP values"

jlyang1990 commented 2 years ago

Thanks @glemaitre for pointing it out. The primary goal of this FastTreeSHAP package is to develop a fast implementation of the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") that reproduce the results from the original implementation of the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") in the SHAP package. For those users who are currently implementing the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") in SHAP package, we aim to provide them with a more efficient way of implementing it.

The potential issue you pointed out is relevant to the mathematical details of the TreeSHAP algorithm in the original paper, therefore we think it is more appropriate to discuss this issue with the original authors of TreeSHAP. If the original authors update the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") in the SHAP package, we will update our implementations on our side to make sure they produce exactly the same results.

glemaitre commented 2 years ago

Just to give some feedback here. The issue with the tree_path_dependent approach has been discussed in the literature, e.g. Janzing et al.

The only way to not break the symmetry axiom is to perturb features in a marginal manner and not a conditional one. So I assume that there is nothing that can be done here.

jlyang1990 commented 2 years ago

Yes that's correct.