tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org

Request `step_impute_random()` #746

Open EmilHvitfeldt opened 3 years ago

EmilHvitfeldt commented 3 years ago

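A recipe step that imputes missing values by drawing at random from the non-missing values of the same variable. As I see it, this sits on the other side of the bias/variance tradeoff from `step_impute_mean()`: mean imputation fills every gap with one value and shrinks the variable's spread, while random draws keep the observed distribution at the cost of added noise.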

Bonus: this would work on all types of variables, not just numeric.
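
To make the idea concrete, here is a minimal base R sketch. It is not the recipes API, and `impute_random()` is a hypothetical helper written only for illustration. It assumes sampling with replacement from the observed values, and it compares the result with plain mean imputation.

```r
# Hypothetical helper, not part of recipes: fill NAs by sampling
# (with replacement) from the observed values of the same vector.
impute_random <- function(x) {
  missing <- is.na(x)
  if (!any(missing) || all(missing)) {
    return(x)  # nothing to fill, or nothing to draw from
  }
  observed <- x[!missing]
  # Sample indices rather than values so a length-one `observed`
  # is not misread by sample() as the range 1:observed.
  idx <- sample.int(length(observed), sum(missing), replace = TRUE)
  x[missing] <- observed[idx]
  x
}

set.seed(123)
num <- c(1.5, NA, 3.2, 8.0, NA, 4.1)
fct <- factor(c("a", NA, "b", "b", NA, "c"))

num_filled <- impute_random(num)  # works on numeric vectors
fct_filled <- impute_random(fct)  # and on factors, keeping the levels

# Mean imputation for comparison: every NA gets the same value,
# which shrinks the variance of the filled-in vector.
num_mean <- ifelse(is.na(num), mean(num, na.rm = TRUE), num)
```

Because the filled values are copies of observed ones, the same function handles numeric, factor, and character vectors without type-specific logic.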

EmilHvitfeldt commented 3 years ago

I realize that some caution is needed for this step to work when it is applied to the testing data set: it has to retain the distribution of values from the training data set, and for continuous data that distribution can have high cardinality.
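
One possible way to handle the train/test split is sketched below in base R. This is an assumption, not how recipes implements anything: `fit_random_imputer()`, `apply_random_imputer()`, and the `max_pool` cap are hypothetical names. The "fit" stage keeps a pool of observed training values, and the "apply" stage fills gaps in new data by sampling from that pool only.

```r
# Hypothetical sketch, not the recipes API. "fit" keeps a pool of observed
# training values; "apply" fills NAs in any data by sampling from that pool.
fit_random_imputer <- function(train_x, max_pool = 1000L) {
  pool <- train_x[!is.na(train_x)]
  # Cap the stored pool (an assumed trade-off) so high-cardinality
  # continuous columns do not require keeping every training value.
  if (length(pool) > max_pool) {
    pool <- pool[sample.int(length(pool), max_pool)]
  }
  list(pool = pool)
}

apply_random_imputer <- function(imputer, new_x) {
  missing <- is.na(new_x)
  if (any(missing) && length(imputer$pool) > 0) {
    idx <- sample.int(length(imputer$pool), sum(missing), replace = TRUE)
    new_x[missing] <- imputer$pool[idx]
  }
  new_x
}

set.seed(42)
train <- c(rnorm(5000, mean = 10, sd = 2), NA, NA)
test <- c(NA, 9.7, NA, 12.1)

imputer <- fit_random_imputer(train)
test_filled <- apply_random_imputer(imputer, test)
```

Capping the pool is only one option. Storing quantiles, or a frequency table for low-cardinality variables, would be other ways to summarize the training distribution.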