Open dberenbaum opened 1 year ago
Yes, maybe we can come up with some other plots that are reasonable for this workflow instead of removing the feature importance. Does anything come to mind, @dberenbaum @daavoo?
In the end, I think it would be nice to have more plots.
We could plot the distribution of samples across target labels (0, 1) and/or splits (train, test).
Those are usually represented as bar plots and would be associated with a different stage (prepare?).
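As a rough sketch of what that stage could emit, the snippet below counts samples per split and target label and returns flat rows that a bar plot could consume. The function name `label_distribution`, the row keys, and the toy labels are all hypothetical and not taken from the example repo.

```python
from collections import Counter


def label_distribution(splits):
    """Count samples per (split, label) pair.

    `splits` maps a split name (e.g. "train") to an iterable of target labels.
    Returns a list of flat dicts, one per split/label combination.
    """
    rows = []
    for split_name, labels in splits.items():
        counts = Counter(labels)
        # Sort labels so the output order is stable across runs.
        for label in sorted(counts):
            rows.append(
                {"split": split_name, "label": label, "count": counts[label]}
            )
    return rows


# Hypothetical toy labels, only to show the shape of the output.
example_splits = {"train": [0, 0, 1, 0, 1], "test": [0, 1, 1]}
distribution_rows = label_distribution(example_splits)
```

Keeping one row per split/label pair means the same file could be grouped by split or by label, depending on which comparison is more useful.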
I would prefer to convert feature importance to a bar plot since we have support for it, and then add another image plot.
One idea is a SHAP summary plot, which is a more robust feature importance method:
It doesn't hurt to also keep the traditional feature importance as a bar plot, since all of these methods have pros and cons and it can help to look at more than one method.
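For the bar plot version, one possible approach is to dump the importances to a small tabular file instead of rendering an image. This is a minimal sketch using only the standard library; the feature names, values, column names, and helper names are made up for illustration, and in practice the values would come from the trained model.

```python
import csv
import io


def importance_rows(names, importances):
    """Pair feature names with importances, sorted from most to least important."""
    pairs = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [{"name": name, "importance": value} for name, value in pairs]


def write_importance_csv(rows, handle):
    """Write the rows as CSV with a header line to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=["name", "importance"])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical values, only to illustrate the output format.
feature_names = ["feat_a", "feat_b", "feat_c"]
feature_values = [0.2, 0.5, 0.3]

rows = importance_rows(feature_names, feature_values)
buffer = io.StringIO()  # stands in for an open file in this sketch
write_importance_csv(rows, buffer)
```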
Okay, sounds good, we can try both. I like @daavoo's suggestion since it's way simpler. I would add another image too, though; I think it's good to have more images.
Let's take this when we are done with the global/flexible plots iteration and https://github.com/iterative/example-repos-dev/pull/117 is merged?
We can prioritize this in the docs planning that @jorgeorpinel is preparing, as a task that someone from the bigger "docs" group can take (including me, I would be happy to do this).
I started on the SHAP one in https://github.com/iterative/example-repos-dev/pull/136, so anyone can feel free to pick up from there. There's a SHAP package, so it's not difficult to add.
Having some sample distribution plot is a good idea, although I have a couple concerns about the suggested bar plot:
A histogram of predictions from training and test data might be another good bar plot.
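A possible way to get there is to bin the predicted probabilities per split and write out the counts. The sketch below assumes predictions are probabilities in [0, 1]; the function names, the row keys, and the choice of equal-width bins are assumptions for illustration.

```python
def prediction_histogram(predictions, n_bins=5):
    """Count probabilities in [0, 1] into `n_bins` equal-width bins."""
    counts = [0] * n_bins
    for probability in predictions:
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(probability * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def histogram_rows(predictions_by_split, n_bins=5):
    """Flatten per-split histograms into rows suitable for a bar plot."""
    rows = []
    for split_name, predictions in predictions_by_split.items():
        counts = prediction_histogram(predictions, n_bins)
        for bin_index, count in enumerate(counts):
            rows.append(
                {
                    "split": split_name,
                    "bin_start": bin_index / n_bins,
                    "count": count,
                }
            )
    return rows
```

Comparing the train and test rows side by side would show whether the model is much more confident on the data it was trained on.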
Bar charts were added in https://github.com/iterative/dvc-render/issues/8. Should we switch the feature importance plot from image to bar plot? I'm not sure it's worth it since then we will have no static image plots.