macss-modeling / General-Questions

A repo to post questions about code, data, etc.

Overfitting: fitting to the noise of training data or increasing variance of the model? #3

Closed by ddlee19 3 years ago

ddlee19 commented 3 years ago

Hello,

I have a question about overfitting. When the training MSE decreases while the test MSE increases, the usual explanation is that the variance of the model increases as its flexibility increases. However, I am curious about the role that fitting to the noise of the training data plays in overfitting. Is "fitting to the noise of the training data" the reason that the variance of the model increases, or does the variance increase simply because the model is becoming more flexible? I am confused about what "fitting to the noise of the training data" means exactly and what role it plays in the bias-variance tradeoff.

Thanks, Daniel

bjcliang-uchi commented 3 years ago

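Hi Daniel, when a model becomes very "flexible," it can better capture non-linear effects (think of a quadratic model vs. a linear model), but it also tends to be very sensitive to minor (and likely random) fluctuations in the training data. As it becomes too flexible, it tries too hard to explain every single change in the training data and is thus likely to learn too much of (i.e., overfit) the noise. Because that noise differs from one training sample to the next, the fitted model also changes a lot from sample to sample, which is exactly what high variance means. So the two descriptions are not competing explanations: flexibility is what allows the model to fit the noise, and fitting the noise is what shows up as increased variance. A highly flexible model tends to help us reduce the bias at the cost of increasing the variance.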