Closed mattwarkentin closed 3 years ago
In tidymodels/recipes#484 there has been some discussion of handling outliers, perhaps with Tukey's rule or something else. Later in that issue, we raised the question of whether this would be more appropriate in an entirely separate recipes extension package for outlier feature engineering, much as themis handles class imbalance and subsampling.
It does seem like there are quite a lot of approaches and it might make sense to have them all together in one package.
Ahh okay, yes I didn't come across that issue in my searches. It probably does make sense to have outlier preprocessing contained in an adjacent package.
Would you like to make one? If so, let us know if you need any help.
I am interested. Under the tidymodels umbrella?
We would be fine with something that lives in your repo as well as something that sits in the tidymodels org (maintained by you either way). In the latter case, it would be good to keep it in scope with tidymodels (e.g. not unrelated model functions).
There isn't much difference based on where it lives. infer and Emil's packages were developed by people outside our group, so you can always ping them to get advice (FYI, Emil is joining our team in a few weeks).
Feature
I don't think a step like this currently exists, so I wanted to nominate `step_truncate` (could also be called `step_clip` or `step_clamp` maybe??):
The step would truncate numeric variables based on percentiles of the variable distribution. For example, if you wanted to truncate `x` at the 1st and 99th percentile, this would assign the 1st and 99th percentile value to observations that are below and above these values, respectively.
Perhaps a step already exists but I didn't see it. If there is interest, I can contribute a PR.
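To make the proposed behavior concrete, here is a minimal sketch of the clipping logic in plain Python (standard library only), not written against the recipes API. The helper names `percentile`, `fit_truncate`, and `apply_truncate` are hypothetical. Learning the bounds once from training data and then reusing them, the way recipes separates `prep()` from `bake()`, is an assumption about how such a step might work rather than something settled in this thread. The linear-interpolation percentile is likewise just one of several possible definitions.

```python
import math


def percentile(values, p):
    """Linearly interpolated percentile (p in [0, 100]) of a non-empty list.

    Hypothetical helper; other percentile definitions would give
    slightly different cut points.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def fit_truncate(train, lower=1, upper=99):
    """Learn the lower and upper clipping bounds from training data.

    Assumption: the bounds are estimated once and stored, analogous to
    what a recipe step would do when it is prepped.
    """
    return percentile(train, lower), percentile(train, upper)


def apply_truncate(values, bounds):
    """Clip each value to the learned bounds, leaving interior values alone."""
    low, high = bounds
    return [min(max(v, low), high) for v in values]


# Example: the integers 1..100 plus one extreme value.
x = list(range(1, 101)) + [1000]
bounds = fit_truncate(x, lower=1, upper=99)
clipped = apply_truncate(x, bounds)
```

Values below the 1st percentile are raised to it, values above the 99th percentile are lowered to it, and everything in between is unchanged. Keeping the fit and apply stages separate means new data would be clipped with the bounds estimated on the training set instead of being re-estimated.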