scikit-learn / blog

Hosting the scikit-learn blog.
https://blog.scikit-learn.org
Creative Commons Attribution 4.0 International

Blog post: comparing SHAP with feature importances #107

Open glemaitre opened 2 years ago

glemaitre commented 2 years ago

As discussed in a developers' meeting, we decided to turn the example available here into a blog post.

This issue is intended to keep track of the subject.

glemaitre commented 2 years ago

I have the following notebook that could help in adding some interesting points: https://github.com/glemaitre/trail_seminar/blob/main/notebooks/plot_shap.ipynb

I would avoid speaking about the issue with the SHAP API since we should propose a fix to raise some warnings. However, we should discuss the pitfall at the end of the notebook.

lucyleeow commented 2 years ago

Thanks @glemaitre ! I will have a closer look at your notebooks (I also watched your PyData talk, great talk!).

I notice that the 'kernel' algorithm is still not exposed in the new SHAP API (though I saw https://github.com/slundberg/shap/pull/2452). I am thinking of amending my tutorial content so that I only use the tree method and not the kernel method (so I can use the new API). I would also like to include Olivier's suggestion: https://github.com/scikit-learn/scikit-learn/pull/18139#issuecomment-1071003528. WDYT, any thoughts @glemaitre ?

glemaitre commented 2 years ago

I notice that 'kernel' algorithm is still not exposed in the new SHAP API

Yes, I have not completed that PR yet. The project seems really stalled nowadays (I sent some PRs a couple of months ago to fix the CI, but they were neither reviewed nor merged).

This might be an argument for thinking about implementing a subset of the SHAP methods at some point.

WDYT, any thoughts @glemaitre ?

Yes, that would be great. I have not had time to work on the tutorial, but I would be more than happy to review and even do some pair-programming sessions.

Something that I wanted to investigate is the violation of the symmetry axiom of Shapley values by the tree approach, and whether we can find a real-world dataset on which we could exhibit the issue.
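
For readers less familiar with the axiom: symmetry says that two players (features) whose marginal contribution is identical for every coalition must receive the same Shapley value. The sketch below is only an illustration, not the SHAP library and not the tree algorithm. It computes exact Shapley values by brute-force enumeration for a hypothetical toy game (`toy_value` and its numbers are made up for this example) in which "a" and "b" are interchangeable, so exact values must agree for both.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight given to coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical toy game: "a" and "b" are interchangeable, "c" adds a fixed bonus.
def toy_value(coalition):
    base = 10.0 if ("a" in coalition or "b" in coalition) else 0.0
    bonus = 2.0 if "c" in coalition else 0.0
    return base + bonus


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

An approximation that respects symmetry would also attribute equal values to "a" and "b" here; checking whether a given explainer does so on real data is the kind of experiment described above.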

lucyleeow commented 2 years ago

Yes, that would be great. I did not get time to work on the tutorial but I would be more than happy to review and even make some pair-programming sessions.

Great, I will make a first draft and we can see!

Something that I wanted to investigate is the breakage of the symmetry axiom of the Shapley values with the tree approach and if we can find an example on a real-world dataset where we could exhibit the issue.

Sounds interesting but beyond my level...and maybe not in the scope of this blog post? Edit: Is this a problem specifically with the tree approximation?