predict-idlab / powershap

A power-full Shapley feature selection method.

All relevant feature or minimal set feature selection? #44

Closed rictoo closed 1 year ago

rictoo commented 1 year ago

Hey all, I know Boruta is an "all relevant feature selection" method, whereas mRMR, for example, aims to find a minimum optimum set of features. This is described here.

I'm just wondering if Powershap is an all relevant feature selection method or a minimum optimum set feature selection method? Thanks!

JarneVerhaeghe commented 1 year ago

Hi @rictoo!

We discussed this a bit in our powershap paper. Powershap compares every feature to a random feature to determine whether it is informative, so by design it is an "all relevant feature selection" method, like Boruta. However, because it is a wrapper method and therefore trains a model, the final feature subset is susceptible to feature interactions, which can result in a smaller subset of features that is good enough for the model. If you are interested in all relevant features, we advise you to use the force_convergence mode of powershap. In conclusion: a single powershap execution yields a combination of the minimum optimum set and all relevant features, while running with force_convergence will try to find all relevant features.
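To make the "compare every feature to a random feature" idea concrete, here is a minimal standard-library sketch. It is not powershap's implementation: it assumes absolute Pearson correlation as a stand-in for the Shapley values of a trained model, and an empirical "how often does the random feature win" fraction in place of powershap's statistical test. The names `abs_corr` and `compare_to_random` are hypothetical and exist only for this illustration.

```python
import random
import statistics


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def compare_to_random(features, target, iterations=100, alpha=0.01, seed=0):
    """Keep features that a freshly drawn random feature (almost) never beats.

    ASSUMPTION: correlation replaces model-based Shapley importance here.
    """
    rng = random.Random(seed)
    n = len(target)
    beaten = {name: 0 for name in features}
    for _ in range(iterations):
        # Bootstrap the rows so every iteration sees slightly different data.
        rows = [rng.randrange(n) for _ in range(n)]
        y = [target[i] for i in rows]
        # Draw a new pure-noise feature and score it like any other feature.
        random_score = abs_corr([rng.gauss(0, 1) for _ in range(n)], y)
        for name, column in features.items():
            if abs_corr([column[i] for i in rows], y) <= random_score:
                beaten[name] += 1
    # Fraction of iterations in which the random feature scored at least as high.
    fractions = {name: count / iterations for name, count in beaten.items()}
    selected = [name for name, frac in fractions.items() if frac < alpha]
    return fractions, selected


# Toy data: one feature drives the target, the other is unrelated noise.
data_rng = random.Random(42)
informative = [data_rng.gauss(0, 1) for _ in range(200)]
irrelevant = [data_rng.gauss(0, 1) for _ in range(200)]
target = [2 * x + data_rng.gauss(0, 0.5) for x in informative]

fractions, selected = compare_to_random(
    {"informative": informative, "irrelevant": irrelevant}, target
)
print(fractions, selected)
```

Because each feature is scored on its own here, this toy version cannot show the feature-interaction effect mentioned above; that effect only appears when importances come from a single model trained on all features together.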

I hope this answers your question?