jacobkap / fastDummies

The goal of fastDummies is to quickly create dummy variables (columns) and dummy rows.
https://jacobkap.github.io/fastDummies/

Frequency-based Variable Dropping #8

Closed: S-UP closed this issue 6 years ago

S-UP commented 6 years ago

Hi,

Typically, we would want to exclude the dummy that stands for the most frequently observed category, so that this category serves as the reference level.

E.g. if we have 'small', 'medium' and 'large', and 'medium' is the shirt size that 80 percent of the population wears, then one typically drops the 'medium' dummy so that the regression baseline reflects the typical case rather than an outlier.

It would be handy to have a feature that allows dropping the most frequent category rather than just the first. This should be fairly simple to achieve, but it would be neat to have it integrated directly in the package.
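To make the request concrete, here is a minimal sketch of the idea in plain Python. It is not fastDummies code: the function name `dummies_drop_most_frequent` and the `prefix` parameter are hypothetical, made up only for illustration. It counts the categories, treats the most frequent one as the reference level, and builds 0/1 columns for the rest.

```python
from collections import Counter


def dummies_drop_most_frequent(values, prefix="size"):
    """Build 0/1 dummy columns for `values`, omitting the most frequent category.

    Hypothetical helper for illustration only; not part of any package.
    Returns the dropped (reference) category and one dict of dummies per row.
    """
    counts = Counter(values)
    # most_common(1) returns [(category, count)]; on ties, the category
    # encountered first in `values` is chosen.
    reference = counts.most_common(1)[0][0]
    # Keep every category except the reference, in a stable (sorted) order.
    kept = sorted(c for c in counts if c != reference)
    rows = [{f"{prefix}_{c}": int(v == c) for c in kept} for v in values]
    return reference, rows


# 80 percent of the observations are 'medium', as in the shirt-size example.
sizes = ["medium"] * 8 + ["small", "large"]
reference, rows = dummies_drop_most_frequent(sizes)
```

With this input, 'medium' becomes the reference level, so every 'medium' row has zeros in both remaining columns, and the coefficients on the 'small' and 'large' dummies would be read relative to the typical case.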

Thanks for considering!

jacobkap commented 6 years ago

Thanks for the suggestion. The feature has been added, and the updated package will be submitted to CRAN tonight.