step_select_vip & dummy variables

Hello, sorry for the delayed response - I'm only just getting back to open-source work. If I understand correctly, you want step_select_vip to be applied to the categorical variable as a whole (before it is encoded), not to the individual categories after they have been transformed into separate binary variables?

If this is correct, then a filter method based on variable importance is problematic: if a model requires categorical variables to be encoded, the importance scores are computed on the encoded features, so they will always refer to the individual dummy variables.

I can only describe how I handle this when using other libraries, for example scikit-learn or mlr3. For these, I would use a wrapper method, with the selection based on permutation importance. The one-hot encoding is wrapped into a pipeline together with the learner, and the whole pipeline is passed to the permutation method, so that the permutation scores refer to the original variables prior to one-hot encoding.
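
As a rough illustration of the scikit-learn version of this idea, here is a minimal sketch. The data is synthetic, the column names (`colour`, `size`, `noise`) are made up for the example, and the random forest is an arbitrary choice of learner. The point is only that `permutation_importance` shuffles the columns of the raw data frame, because the encoder sits inside the pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data (hypothetical): one categorical and two numeric predictors.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "colour": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
# The target depends on colour and size, but not on noise.
y = ((X["colour"] == "red") | (X["size"] > 0.5)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The one-hot encoding lives inside the pipeline, next to the learner.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["colour"])],
    remainder="passthrough",
)
model = Pipeline([
    ("encode", encode),
    ("learner", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)

# Permutation acts on the columns of the un-encoded data frame, so there is
# one score per original variable rather than one per dummy column.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)
importances = pd.Series(result.importances_mean, index=X.columns)
print(importances.sort_values(ascending=False))
```

The result has one score for `colour` as a whole, which could then be thresholded or ranked to select variables before any encoding takes place.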

stevenpawley / recipeselectors

step_select_vip & dummy variables #10