Closed glemaitre closed 2 years ago
@glemaitre I think you meant "on correlated features".
Thanks, I corrected the title.
Actually: "TreeSHAP does not exact SHAP values" => "TreeSHAP does not compute exact SHAP values"
Thanks @glemaitre for pointing it out. The primary goal of this FastTreeSHAP
package is to develop a fast implementation of the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") that reproduce the results from the original implementation of the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") in the SHAP package. For those users who are currently implementing the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") in SHAP package, we aim to provide them with a more efficient way of implementing it.
The potential issue you pointed out is relevant to the mathematical details of the TreeSHAP algorithm in the original paper, therefore we think it is more appropriate to discuss this issue with the original authors of TreeSHAP. If the original authors update the TreeSHAP algorithm (feature_perturbation="tree_path_dependent") in the SHAP package, we will update our implementations on our side to make sure they produce exactly the same results.
Just to give some feedback here. The issue with the tree_path_dependent
approach has been discussed in the literature, e.g. Janzing et al.
The only way to not break the symmetry axiom is to perturb features in a marginal manner and not a conditional one. So I assume that there is nothing that can be done here.
Yes that's correct.
As highlighted in https://github.com/slundberg/shap/issues/2345, it seems that TreeSHAP does not compute the expected SHAP values.
The following notebook reproduces the problem with the
FastTreeSHAP
implementation: https://nbviewer.org/gist/glemaitre/9a30dd3a704675164b84d9bf7128882eNote that it is not only a problem in the implementation but rather a problem in algorithm 1 of the original TreeSHAP paper (i.e. tree traversal)