shap / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap.readthedocs.io
MIT License

Adjusting number of features shown #4

Closed coulls closed 6 months ago

coulls commented 6 years ago

When visualizing a single prediction, it appears as though the IML visualizer decides how many feature names to put in the chart. It would be great if we had more control over the features shown, for instance showing only the top-n features in both the positive and negative directions.

Is it possible to achieve this in the code as it stands, or does it require deep changes to IML?

slundberg commented 6 years ago

That would not be a huge change, but it will need to be made in the JS code in IML. If it's important to you, let me know and I'll leave this open as a to-do for when I get time. (A PR to IML is also welcome.)

coulls commented 6 years ago

Thanks! I do think it would be super helpful. Our model has nearly 3,000 features, so visualizing the whole range for a single prediction is kind of difficult to read. I have tried to limit the number of features, though obviously that impacts the predicted score that is displayed, etc.

Something that essentially 'zooms in' on the area around the predicted score would be nice.
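To make the effect on the displayed score concrete, here is a minimal sketch. It assumes SHAP's additive property (the displayed output is the base value plus the sum of all per-feature contributions); the numbers and the helper names `displayed_output` and `truncate_top_n` are hypothetical and not part of shap or IML.

```python
def displayed_output(base_value, shap_values):
    """Output implied by an explanation: base value plus every contribution."""
    return base_value + sum(shap_values)


def truncate_top_n(shap_values, n):
    """Keep the n largest-magnitude contributions and drop the rest."""
    ranked = sorted(shap_values, key=abs, reverse=True)
    return ranked[:n]


# Hypothetical values, chosen to be exactly representable as floats.
base_value = 0.5
shap_values = [0.5, -0.25, 0.125, -0.125, 0.0625, 0.0625]

full = displayed_output(base_value, shap_values)
truncated = displayed_output(base_value, truncate_top_n(shap_values, 2))

print(full)       # 0.875
print(truncated)  # 0.75
```

The gap between the two results is exactly the sum of the dropped contributions, which is why simply limiting the feature list shifts the score shown on the plot.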

slundberg commented 6 years ago

Could you attach a quick screen grab of what you are seeing? That would help me narrow down what will fix the layout issue. Currently the method tries to prune labels that don't fit on the plot, but that calculation could be improved.

In reply to Scott Coull's comment of Tue, Oct 24, 2017 at 4:59 PM.

coulls commented 6 years ago

Apologies for the delay in getting back to you. Attached are two examples of the single prediction visualization using my ~3,000 features. Both predictions are for the same sample with the same feature set, under two different LightGBM models.

I've anonymized the one displayed feature name, so I hope that doesn't impact anything on your end.

As you can see, there is quite a long tail of minor contributors in the visualization, so the viewer can retrieve very little information at this level of detail. It would be nice if there were a 'zoom' function that let us focus on the top-N features, or something similar. As I said before, I tried to do this artificially by recreating a truncated SHAP vector, but then the totals are obviously incorrect. I suppose I could just sum all of the remaining feature contributions into a single catch-all feature for each side of the force diagram, but that seems like something that should perhaps be handled under the hood (and it is certainly not something that will work in general).
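The catch-all idea can be sketched roughly as follows. This is only an illustration of the workaround described here, not how shap or IML handles it; `collapse_to_top_n`, the feature names, and the values are all hypothetical.

```python
def collapse_to_top_n(shap_values, feature_names, n):
    """Keep the n strongest positive and n strongest negative contributions,
    folding the remainder of each side into one catch-all entry so that the
    sum of the returned values matches the sum of the original values."""
    pairs = list(zip(feature_names, shap_values))
    # Strongest positive first; most negative first.
    positive = sorted((p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True)
    negative = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])

    kept = positive[:n] + negative[:n]
    rest_pos, rest_neg = positive[n:], negative[n:]
    if rest_pos:
        kept.append((f"other positive ({len(rest_pos)})", sum(v for _, v in rest_pos)))
    if rest_neg:
        kept.append((f"other negative ({len(rest_neg)})", sum(v for _, v in rest_neg)))

    # Zero-valued contributions are dropped; they do not change the total.
    names = [name for name, _ in kept]
    values = [value for _, value in kept]
    return names, values


# Hypothetical values, chosen to be exactly representable as floats.
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]
shap_values = [0.5, -0.25, 0.125, -0.125, 0.0625, 0.0625]

names, values = collapse_to_top_n(shap_values, feature_names, n=1)
print(names)   # ['f0', 'f1', 'other positive (3)', 'other negative (1)']
print(values)  # [0.5, -0.25, 0.25, -0.125]
```

Because the two catch-all entries carry the folded contributions, the collapsed vector keeps the same total as the original, so the score shown by whatever plotting call consumes it should stay the same. The catch-all entries have no single feature value to display, which is one reason this is awkward to do outside the library.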

Separately, there have been some issues with placement of the names, though I fully realize part of that is due to their length. In this example, one model's visualization output can only fit a single name and the other didn't place any of the feature names.

[image: screen shot 2017-10-30 at 10 30 02 pm] https://user-images.githubusercontent.com/3245674/32205199-5521f3a2-bdc4-11e7-8b69-f33db8af2c79.png

[image: screen shot 2017-10-30 at 10 29 51 pm] https://user-images.githubusercontent.com/3245674/32205209-60578ade-bdc4-11e7-8f6e-d7d7d76c2652.png

slundberg commented 6 years ago

Thanks for sharing! I'll think about how this should work. I have also used this with thousands of features, but for our model individual predictions were usually dominated by only a handful of features so I didn't run into this issue.

In reply to Scott Coull's comment of Mon, Oct 30, 2017 at 7:51 PM.

github-actions[bot] commented 9 months ago

This issue has been inactive for two years, so it's been automatically marked as 'stale'.

We value your input! If this issue is still relevant, please leave a comment below. This will remove the 'stale' label and keep it open.

If there's no activity in the next 90 days the issue will be closed.

github-actions[bot] commented 6 months ago

This issue has been automatically closed due to lack of recent activity.

Your input is important to us! Please feel free to open a new issue if the problem persists or becomes relevant again.