mayer79 / flashlight

Machine learning explanations
https://mayer79.github.io/flashlight/
GNU General Public License v2.0

conditional subgroup feature importance #51

Closed. asheetal closed this issue 1 year ago.

asheetal commented 1 year ago

Hi @mayer79

I am curious how I could implement the approach from this article using your package: https://arxiv.org/abs/2006.04628

Here is my scenario

I hope you can advise. It does not need to follow the article above; I would like to hear your approach.

mayer79 commented 1 year ago

There is no theoretical definition of "feature importance", so there is no single "best" way. But some methods may produce more informative results than others in your situation.

If you know the groups, you could:

  1. Use the permutation importance function in Christoph's "iml" package. There, you can pass a list of feature groups that should be permuted together. This is a feature that would be nice to have in "flashlight".
  2. Calculate SHAP importance of feature groups (sum the SHAP values of each feature group row-wise, then take the mean absolute value per resulting column). This has the disadvantage of being an in-sample measure of importance, whereas 1. would produce an out-of-sample result if done on an independent data set.
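The two options above can be sketched in Python with NumPy. This is not the API of "iml" or "flashlight" (both are R packages); the function names `grouped_permutation_importance` and `grouped_shap_importance`, the squared-error loss, and the inputs (a generic `predict` callable, a precomputed SHAP matrix, groups given as column-index lists) are illustrative assumptions.

```python
import numpy as np

def grouped_permutation_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Average increase in mean squared error when each group is permuted jointly.

    predict: hypothetical callable mapping an (n, p) array to n predictions
    groups:  dict mapping group name -> list of column indices
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((y - predict(X)) ** 2)
    result = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # One shared row permutation for all columns of the group, so the
            # dependence between features inside the group is preserved
            idx = rng.permutation(X.shape[0])
            X_perm[:, cols] = X[idx][:, cols]
            drops.append(np.mean((y - predict(X_perm)) ** 2) - base_loss)
        result[name] = float(np.mean(drops))
    return result

def grouped_shap_importance(shap_values, groups):
    """Mean absolute value of the row-wise summed SHAP values of each group.

    shap_values: (n, p) array with one SHAP value per row and feature
    """
    return {
        name: float(np.mean(np.abs(shap_values[:, cols].sum(axis=1))))
        for name, cols in groups.items()
    }
```

Summing within a group before taking absolute values lets opposite-signed contributions of correlated features cancel, which is what distinguishes group importance from simply adding up the individual feature importances.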

If you do not know the groups, then as an alternative to Christoph's approach, I think you could use a variable clustering algorithm to build the groups and then apply either of the methods above.
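One possible way to build the groups, assuming simple correlation-based variable clustering is adequate: link features whose absolute Pearson correlation exceeds a threshold and take connected components, which is equivalent to cutting a single-linkage dendrogram on 1 - |r| at height 1 - threshold. The function name `cluster_features` and the default threshold of 0.7 are hypothetical choices; dedicated variable clustering methods may produce more meaningful groups.

```python
import numpy as np

def cluster_features(X, threshold=0.7):
    """Return groups (lists of column indices) of features chained by |r| > threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    p = corr.shape[0]
    labels = [-1] * p
    current = 0
    for start in range(p):
        if labels[start] != -1:
            continue
        # Depth-first search over features linked by a strong correlation
        labels[start] = current
        stack = [start]
        while stack:
            j = stack.pop()
            for k in range(p):
                if labels[k] == -1 and corr[j, k] > threshold:
                    labels[k] = current
                    stack.append(k)
        current += 1
    groups = [[] for _ in range(current)]
    for j, lab in enumerate(labels):
        groups[lab].append(j)
    return groups
```

The returned column-index lists can then serve as the feature groups for either of the methods above. Single linkage tends to chain features into large groups, so the threshold may need tuning.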

asheetal commented 1 year ago

Thanks @mayer79, I think I now know how to argue this problem. It appears there is no clean solution and that it requires a secondary analysis.

Thanks a lot for enlightening me.