conditional subgroup feature importance

asheetal commented 1 year ago

Hi @mayer79

I am curious how I can implement the approach from this article using your package: https://arxiv.org/abs/2006.04628

Here is my scenario

- I have 1000 predictors, many of which are correlated with each other. However, I do not know which ones, and there is no way for me to group them manually.
- I need to select the top 5 predictors for a field study.
- Everybody tells me that plain light_importance is the wrong approach for my problem, but I have no guidance on the correct way to pick the top 5 out of 1000 predictors.

I hope you can advise. It does not need to be the approach from the above article. I want to hear how you would approach it.

mayer79 commented 1 year ago

There is no theoretical definition of "feature importance", so there is no "best" way. But some methods may produce more informative results than others in your situation.

If you knew the groups, then you could:

1. Use the permutation importance function in Christoph's "iml" package. There, you can pass a list of feature groups that should be permuted together. This is a feature that would be nice to have in "flashlight".
2. Calculate SHAP importance of feature groups (sum the SHAP values of each feature group row-wise, then take the mean absolute value per column). This has the disadvantage of being an in-sample measure of importance, while 1. would produce an out-of-sample result if done on an independent data set.
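
To make the two options concrete, here is a minimal, language-agnostic sketch written in plain Python. It is not the API of "iml" or "flashlight"; the function names `grouped_permutation_importance` and `grouped_shap_importance`, the use of mean squared error as the loss, and the `groups` dictionary of column indices are all assumptions made for illustration. The SHAP matrix is assumed to have been computed elsewhere.

```python
import random
from statistics import mean


def grouped_permutation_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Increase in mean squared error when a whole feature group is permuted.

    predict: callable mapping a list of rows to a list of predictions
    X: list of rows (lists of floats), y: list of targets
    groups: dict mapping a group name to a list of column indices
    (all names here are hypothetical, chosen for this sketch)
    """
    rng = random.Random(seed)

    def mse(rows):
        return mean((p - t) ** 2 for p, t in zip(predict(rows), y))

    baseline = mse(X)
    importance = {}
    for name, cols in groups.items():
        losses = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            # Each row receives all columns of the group from the same donor
            # row, so the dependence inside the group is kept intact.
            permuted = [row[:] for row in X]
            for i, j in enumerate(order):
                for c in cols:
                    permuted[i][c] = X[j][c]
            losses.append(mse(permuted))
        importance[name] = mean(losses) - baseline
    return importance


def grouped_shap_importance(shap_values, groups):
    """Mean absolute value of the row-wise sum of SHAP values per group."""
    return {
        name: mean(abs(sum(row[c] for c in cols)) for row in shap_values)
        for name, cols in groups.items()
    }
```

Calling `grouped_permutation_importance` on a held-out data set gives the out-of-sample flavour mentioned in 1., whereas `grouped_shap_importance` only aggregates whatever rows the SHAP values were computed on.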

If you do not know the groups, then, as an alternative to Christoph's approach, I think you could use a variable clustering algorithm to build the groups and then apply either of the above methods.
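
One simple way to build such groups is sketched below in plain Python. No specific clustering algorithm is named in this thread, so the choice here is an assumption: columns are linked whenever their absolute Pearson correlation reaches a threshold, and linked columns end up in the same group (single linkage). The function name `correlation_groups` and the default threshold of 0.8 are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # treat constant columns as uncorrelated
    return cov / sqrt(va * vb)


def correlation_groups(X, threshold=0.8):
    """Group column indices connected by |correlation| >= threshold."""
    n_cols = len(X[0])
    columns = [[row[c] for row in X] for c in range(n_cols)]
    parent = list(range(n_cols))  # union-find forest over columns

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for c in range(n_cols):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())
```

The resulting lists of column indices can be used as the feature groups for either grouped importance method. The threshold is a tuning choice: a lower value yields fewer, larger groups, and single linkage can chain loosely related predictors together, so the groups are worth inspecting before ranking them.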

asheetal commented 1 year ago

Thanks @mayer79, I think I now know how to argue this problem. It appears there is no clean solution, and the problem requires a secondary analysis.

Thanks a lot for enlightening me.
