tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org

Feature Request: step_ function for matrix indexing #1029

Open luisDVA opened 2 years ago

luisDVA commented 2 years ago

Feature

This came up last week during the tidymodels workshop, and Max suggested that I open an issue.

Sometimes a dataset contains a mix of qualitative character variables and dummy-encoded variables. If we need to homogenize the data, a step_ function for this may be useful: something that uses tidyselect for variable selection and takes the name of the feature being described.

For example, going from this:

species  arboreal  terrestrial
sp a     0         1
sp b     1         0
sp c     1         0

to this:

species  locomotion
sp a     terrestrial
sp b     arboreal
sp c     arboreal

There are many ways to implement this. I have a silly write-up here, but a base approach would be better.
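To make the request concrete, here is a minimal base R sketch of the transformation in the example above. The helper name `undummy()` and its arguments (`cols`, `name`) are hypothetical and not part of recipes; it also assumes every row has exactly one indicator set to 1, and takes a plain character vector of column names rather than a tidyselect expression.

```r
# Hypothetical helper (not a recipes function): collapse a set of
# dummy-encoded columns into one character column called `name`.
undummy <- function(data, cols, name) {
  m <- as.matrix(data[cols])

  # Assumption: exactly one indicator equals 1 in every row.
  stopifnot(all(rowSums(m == 1) == 1))

  # max.col() gives, per row, the index of the column holding the maximum,
  # which is the column containing the 1. Use it to index the column names.
  data[[name]] <- cols[max.col(m, ties.method = "first")]

  # Drop the original dummy columns.
  data[setdiff(names(data), cols)]
}

animals <- data.frame(
  species     = c("sp a", "sp b", "sp c"),
  arboreal    = c(0, 1, 1),
  terrestrial = c(1, 0, 0)
)

result <- undummy(animals, c("arboreal", "terrestrial"), "locomotion")
print(result)
```

For the three example rows, `result$locomotion` comes out as "terrestrial", "arboreal", "arboreal". A real step_ function would also need to decide what to do with rows that are all zeros or that have several ones; this sketch simply stops with an error in those cases.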

EmilHvitfeldt commented 2 years ago

I like it. It is basically a reverse step_dummy().