grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

Add support for SHAPR #986

Open: NicolasWoloszko opened this issue 3 years ago

NicolasWoloszko commented 3 years ago

Adding this would make a lot of sense: SHAPR (the R package for interpretability using Shapley values) currently does not support causal forests.
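For reference, a minimal brute-force sketch of Shapley-value attribution follows. It assumes a user-supplied value function v(S); in conditional approaches such as SHAPR's, v(S) is typically the expected model output given that the features in S are fixed at the test point's values. The sketch applies the standard formula phi_j = sum over S ⊆ N \ {j} of |S|! (p - |S| - 1)! / p! · [v(S ∪ {j}) - v(S)]. The function name `shapley_values` is hypothetical and this is not SHAPR's implementation; it enumerates all 2^(p-1) coalitions per feature, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(p: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for p features, given a (hypothetical) value function v(S)."""
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):  # |S| ranges over 0..p-1
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Marginal contribution of feature j when added to coalition S.
                phi[j] += weight * (value(S | {j}) - value(S))
    return phi


contrib = [1.0, 2.0, -0.5]
# Additive value function: each feature's attribution is its own contribution.
print(shapley_values(3, lambda S: sum(contrib[k] for k in S)))
# Pure interaction between features 0 and 1: credit is split equally between them.
print(shapley_values(3, lambda S: 1.0 if {0, 1} <= S else 0.0))
```

The second example also illustrates the efficiency property: the attributions sum to v(all features) - v(empty set).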

See, for instance, a nice application in this IMF working paper.

erikcs commented 3 years ago

Thanks for the suggestion, @NicolasWoloszko. At the moment this is not something we have devoted much thought to, so a prototype would be welcome if you are interested. In particular, I am not sure how SHAP algorithms tailored to random forests, which aggregate tree predictions, would extend to GRF, which instead uses the trees to construct kernel weights. Another entry point could be explaining a single tree's predictions through find_best_tree (currently a semi-stale issue).
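To make the distinction concrete: a GRF does not average tree predictions. For a test point x it gives each training point i a weight alpha_i(x) = (1/B) · sum over trees b of 1{X_i in L_b(x)} / |L_b(x)|, where L_b(x) is the leaf of tree b containing x. A regression forest then predicts sum_i alpha_i(x) Y_i, while a causal forest plugs the same weights into a local moment equation. The sketch below is illustrative only: it assumes each tree is given as a hypothetical function mapping a point to a leaf id, and it ignores honesty (the separate samples used for splitting and for populating leaves).

```python
from typing import Callable, Hashable, List, Sequence

Point = Sequence[float]
# Hypothetical tree representation: a function mapping a point to its leaf id.
LeafFn = Callable[[Point], Hashable]


def forest_weights(trees: List[LeafFn], X_train: List[Point], x: Point) -> List[float]:
    """GRF-style kernel weights alpha_i(x), averaged over trees."""
    n = len(X_train)
    alpha = [0.0] * n
    for leaf_of in trees:
        target = leaf_of(x)
        # Training points that share the test point's leaf in this tree.
        members = [i for i in range(n) if leaf_of(X_train[i]) == target]
        for i in members:
            # Each tree spreads mass 1/B uniformly over its leaf (an empty leaf contributes nothing).
            alpha[i] += 1.0 / (len(members) * len(trees))
    return alpha


def weighted_prediction(alpha: List[float], y: List[float]) -> float:
    """Regression-forest style prediction: sum_i alpha_i(x) * Y_i."""
    return sum(a * yi for a, yi in zip(alpha, y))


trees = [
    lambda p: "L" if p[0] <= 0.5 else "R",  # stump splitting feature 0 at 0.5
    lambda p: "L" if p[0] <= 1.5 else "R",  # stump splitting feature 0 at 1.5
]
X_train = [[0.0], [1.0], [2.0], [3.0]]
alpha = forest_weights(trees, X_train, [0.2])
print(alpha)                                              # [0.75, 0.25, 0.0, 0.0]
print(weighted_prediction(alpha, [1.0, 3.0, 5.0, 7.0]))  # 1.5
```

Here the first stump isolates training point 0 and the second pools points 0 and 1, so point 0 receives most of the weight. A SHAP method for GRF would have to explain this weighted estimate rather than a plain average of per-tree outputs.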

samkodes commented 3 years ago

Just browsing, so this is only half a thought. I wonder whether the SHAPR "Empirical Conditional Distribution Approach/Conditional Inference Tree Approach" could use an adapted version of the kernel weighting already constructed by a GRF. The issue would be restricting the conditioning to the subset of variables S being conditioned on. For a given tree in the forest, would it be legitimate to do this by dropping the test point X down both branches "in parallel" at any split whose split variable is not in S, and counting all points in all resulting leaves as hits? This would preserve some of the structure learned by the GRF. The counts for each tree could then be combined into overall counts for the forest. Computationally this might blow up, but the TreeSHAP algorithm may have suitable tricks (the docs suggest TreeSHAP works similarly).
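The "both branches" idea can be sketched for a single axis-aligned tree. At a split on a variable in S, the test point follows its usual branch; at a split on any other variable, it goes down both branches, and every training point landing in any reached leaf counts as a hit. Weighting hits uniformly per tree and averaging over trees mirrors the GRF kernel weights. The `Node` structure and all function names are hypothetical, not part of grf or SHAPR, and the sketch does not settle whether the resulting weights are a valid estimate of the conditional distribution given X_S, which is the open question in the comment. When few variables are in S the recursion can visit every leaf, which is the blow-up mentioned.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Set


@dataclass
class Node:
    # Hypothetical minimal tree: internal nodes send x left if x[feature] <= threshold;
    # leaves carry a leaf_id and no children.
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: Optional[int] = None


def leaf_of(node: Node, x: Sequence[float]) -> int:
    """Ordinary descent: the single leaf containing x."""
    while node.leaf_id is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.leaf_id


def reachable_leaves(node: Node, x: Sequence[float], S: FrozenSet[int]) -> Set[int]:
    """Follow x on splits whose variable is in S; take both branches otherwise."""
    if node.leaf_id is not None:
        return {node.leaf_id}
    if node.feature in S:
        child = node.left if x[node.feature] <= node.threshold else node.right
        return reachable_leaves(child, x, S)
    return reachable_leaves(node.left, x, S) | reachable_leaves(node.right, x, S)


def conditional_weights(trees: List[Node], X_train: List[Sequence[float]],
                        x: Sequence[float], S: FrozenSet[int]) -> List[float]:
    """Pool every training point in any reachable leaf, then average over trees."""
    n = len(X_train)
    alpha = [0.0] * n
    for tree in trees:
        hits = reachable_leaves(tree, x, S)
        members = [i for i in range(n) if leaf_of(tree, X_train[i]) in hits]
        for i in members:
            alpha[i] += 1.0 / (len(members) * len(trees))
    return alpha


# One tree: split on feature 0, then (on the right) on feature 1.
tree = Node(feature=0, threshold=0.5,
            left=Node(leaf_id=0),
            right=Node(feature=1, threshold=0.5,
                       left=Node(leaf_id=1), right=Node(leaf_id=2)))
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
x = [1.0, 0.0]
for S in [frozenset({0, 1}), frozenset({0}), frozenset({1}), frozenset()]:
    print(sorted(S), conditional_weights([tree], X_train, x, S))
```

In this toy example, conditioning on both variables recovers the ordinary kernel weights (all mass on the training point sharing x's leaf), dropping variable 1 pools the two right-hand leaves, dropping variable 0 pools the left leaf with x's right-hand leaf, and conditioning on nothing spreads the weight evenly over all four training points.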