Probably jumping the gun here since the overfitting chapter isn't written yet: I see a lot of Kaggle entries adding small amounts of Gaussian noise to prevent overfitting during feature engineering (in likelihood encodings, for example), which feels a bit weird to me.
I'd imagine that averaged-out-of-fold predictions might be more appropriate, but that's more computationally expensive. I'd love to see a discussion comparing these two approaches and when one might be better than the other. I'd also be curious how you might select the amount of noise to add, which seems like it would require a validation set anyway.
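To make the comparison concrete, here's a rough sketch of the two approaches as I understand them (the function names, noise scale, and fold count are all illustrative assumptions, not anything from the book):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

def noisy_target_encode(df, col, target, noise_std=0.01):
    # Likelihood (target) encoding with Gaussian noise added, intended to
    # keep the model from memorizing exact per-category target means.
    # noise_std is a guess -- picking it well is exactly the open question.
    means = df.groupby(col)[target].transform("mean")
    return means + rng.normal(0.0, noise_std, size=len(df))

def oof_target_encode(df, col, target, n_splits=5):
    # Out-of-fold target encoding: each row is encoded using category
    # means computed on folds that exclude that row, so the row's own
    # target never leaks into its encoding. Costs n_splits group-bys.
    enc = pd.Series(np.nan, index=df.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for fit_idx, enc_idx in kf.split(df):
        fold_means = df.iloc[fit_idx].groupby(col)[target].mean()
        enc.iloc[enc_idx] = df.iloc[enc_idx][col].map(fold_means).to_numpy()
    # categories unseen in the fitting folds fall back to the global mean
    return enc.fillna(df[target].mean())

# toy example
df = pd.DataFrame({"cat": list("aabbbcc"), "y": [1, 0, 1, 1, 0, 0, 1]})
df["enc_noisy"] = noisy_target_encode(df, "cat", "y")
df["enc_oof"] = oof_target_encode(df, "cat", "y")
```

The out-of-fold version needs no noise hyperparameter at all, which is part of why it seems more principled to me, but it does n_splits times the work.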