cgoo4 closed 2 years ago
Hello, sorry for the delayed response - I am only just getting back to open-source work. So, I think you want `step_select_vip` to be applied only to the categorical variable (before it is encoded), not to the individual categories that have been transformed into separate binary variables?
If this is correct, then a filter method based on variable importance is problematic: if a model requires categorical variables to be encoded, the feature importance scores will always be reported for the individual dummy variables rather than for the original categorical variable.
I can only provide some info on how I handle this when using other libraries, for example scikit-learn or mlr3. For these, I would use a wrapper method when the selection is based on permutation importance. The one-hot encoding would be wrapped into a pipeline together with the learner, and this pipeline would go inside the permutation method, so that the permutation scores represent the individual variables prior to one-hot encoding.
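To make the scikit-learn version of that idea concrete, here is a minimal sketch. The toy data, the column names (`colour`, `size`, `noise`), the random forest learner, and the `top_p = 2` cutoff are all assumptions made up for illustration; they are not from the original question. The point is only that `permutation_importance` receives the whole pipeline, so it permutes the raw columns and returns one score per original variable.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical and two numeric predictors.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "colour": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
# The outcome depends on colour and size, but not on noise.
y = ((X["colour"] == "red") | (X["size"] > 1.0)).astype(int)

# One-hot encoding lives inside the pipeline, next to the learner.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["colour"])],
    remainder="passthrough",
)
model = Pipeline([
    ("encode", encoder),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X, y)

# Permutation is applied to the columns of X, i.e. before encoding,
# so there is one score per original variable, not per dummy column.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
importance = pd.Series(result.importances_mean, index=X.columns)
importance = importance.sort_values(ascending=False)

# Keep the top_p original variables (top_p = 2 is an arbitrary choice here).
top_p = 2
selected = list(importance.index[:top_p])
print(importance)
print(selected)
```

For brevity the scores are computed on the training data; in practice I would compute them on a held-out set or inside resampling.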
`step_dummy` followed by `step_select_vip` for `all_predictors` results in the `top_p` predictors plus the dummy variables. Is it possible to include the dummy variables in the `top_p`?