topepo / FES

Code and Resources for "Feature Engineering and Selection: A Practical Approach for Predictive Models" by Kuhn and Johnson
https://bookdown.org/max/FES
GNU General Public License v2.0

Preventing overfitting with supervised encodings: adding noise #33

Open alexpghayes opened 6 years ago

alexpghayes commented 6 years ago

Probably jumping the gun here since the overfitting chapter isn't written yet: I see a lot of Kaggle entries adding small amounts of Gaussian noise to supervised encodings during feature engineering (likelihood encodings, for example) to prevent overfitting, which feels a bit weird to me.
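For concreteness, here is a minimal sketch of the pattern I mean, on toy data (the column names, `noise_sd` knob, and data are all made up for illustration): a plain likelihood (target-mean) encoding, then the Kaggle-style variant that jitters each encoded value with Gaussian noise so a model can't memorize the exact per-level means.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "cat": rng.choice(list("abc"), size=100),
    "y": rng.integers(0, 2, size=100),
})

# Plain likelihood (target-mean) encoding: mean of y per category level.
level_means = df.groupby("cat")["y"].mean()
encoded = df["cat"].map(level_means)

# The noise variant: perturb each encoded value with small Gaussian noise.
# noise_sd is a free tuning knob -- which is exactly my question below.
noise_sd = 0.01
encoded_noisy = encoded + rng.normal(0.0, noise_sd, size=len(df))
```

Note that every row of a given level still gets (roughly) the same encoding, just blurred; the noise scale is an extra hyperparameter with no obvious principled default.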

I'd imagine that averaged out-of-fold predictions might be more appropriate, but that's more computationally expensive. I'd love to see a discussion comparing the two approaches and when one might be preferable to the other. I'd also be curious how you would select the amount of noise to add, which seems like it would require a validation set anyway.
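By out-of-fold encoding I mean something like the following sketch (again on made-up toy data, with an arbitrary choice of 5 folds): each row's encoded value is the target mean of its level computed only from the other folds, so no row ever sees its own label.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "cat": rng.choice(list("abc"), size=100),
    "y": rng.integers(0, 2, size=100),
})

# Out-of-fold likelihood encoding with 5 random folds.
oof = pd.Series(np.nan, index=df.index)
for held_out in np.array_split(rng.permutation(len(df)), 5):
    train = df.drop(index=held_out)
    fold_means = train.groupby("cat")["y"].mean()
    oof.iloc[held_out] = df["cat"].iloc[held_out].map(fold_means).to_numpy()

# A level missing from a training fold maps to NaN; fall back to the
# global target mean for those rows.
oof = oof.fillna(df["y"].mean())
```

This removes the self-label leakage directly rather than masking it with noise, at the cost of fitting the encoding K times, and it still leaves the choice of K (and whether to average over repeated fold assignments) on the table.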