linkedin / FastTreeSHAP

Fast SHAP value computation for interpreting tree-based models

Parallelism not working when model_output="logloss" #17

Closed DonIvanCorleone closed 1 year ago

DonIvanCorleone commented 1 year ago

Hi there,

First off, many thanks for this cool project. We played around with it and got several nice little performance bumps. Pretty nice.

Unfortunately, we realized that when we set model_output="logloss", the parallelism no longer works and all of the algorithms [v0, v1, v2] run at nearly the same speed (+/- a couple of seconds). Is this intended behaviour?
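
For reference, this is roughly the kind of setup we timed (the model, data sizes, and the `data`/`n_jobs` arguments below are illustrative, not our actual pipeline):

```python
# Illustrative reproduction sketch: time each FastTreeSHAP algorithm with
# model_output="logloss". The exact option spelling ("logloss" vs. "log_loss")
# and the background-data / n_jobs arguments may differ by package version.
import time
import fasttreeshap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
model = LGBMClassifier(n_estimators=200).fit(X, y)

for algo in ["v0", "v1", "v2"]:
    explainer = fasttreeshap.TreeExplainer(
        model,
        data=X[:200],                          # background data for the interventional path
        model_output="logloss",
        feature_perturbation="interventional",
        algorithm=algo,
        n_jobs=-1,
    )
    start = time.time()
    _ = explainer.shap_values(X, y)            # logloss explanations also need the labels
    print(algo, round(time.time() - start, 2), "s")  # all three come out nearly identical
```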

Cheers

jlyang1990 commented 1 year ago

Hi,

Thanks for using this package, and glad to see the performance bumps. To your question: currently model_output="logloss" is only supported when feature_perturbation="interventional" (see https://github.com/linkedin/FastTreeSHAP/blob/master/fasttreeshap/explainers/_tree.py#L116 and https://github.com/slundberg/shap/blob/master/shap/explainers/_tree.py#L89). Since the FastTreeSHAP algorithms (parallelism, v1, v2) are currently only implemented for feature_perturbation="tree_path_dependent", setting model_output="logloss" unfortunately does not bring any computational speedup. Hope this answers your question.
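
For concreteness, a configuration that does benefit from the fast algorithms looks roughly like the sketch below (the model and data are just illustrative; parameter names follow the shap-style TreeExplainer interface and may vary slightly between versions):

```python
# Sketch of a configuration where FastTreeSHAP v1/v2 and parallelism apply:
# tree_path_dependent perturbation with the default model_output="raw".
import fasttreeshap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
model = LGBMClassifier(n_estimators=200).fit(X, y)

explainer = fasttreeshap.TreeExplainer(
    model,
    feature_perturbation="tree_path_dependent",  # fast algorithms are implemented for this mode
    algorithm="v2",                              # or "v1" / "auto"
    n_jobs=-1,                                   # parallelize over samples
)
shap_values = explainer.shap_values(X)

# With model_output="logloss", the explainer instead takes the interventional
# (background-data) path, where v1/v2 and n_jobs currently have no effect, so
# all algorithm settings run at roughly the same speed.
```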

DonIvanCorleone commented 1 year ago

Many thanks for your quick & detailed answer! This helps a lot. Maybe someday the FastTreeSHAP speedups will support feature_perturbation="interventional" as well? ;)

Closing the ticket. Cheers

alizia commented 1 year ago

@jlyang1990 thank you for this very useful package!

May I ask: are you planning to implement FastTreeSHAP for feature_perturbation="interventional"? If so, do you have a timeline in mind? Many thanks!