
Overfitting #5


TL;DR

Some notes on overfitting.

Article link

https://elitedatascience.com/overfitting-in-machine-learning#overfitting-vs-underfitting

Key takeaways

What?

If our model does much better on the training set than on the test set, then we’re likely overfitting. Overfitting occurs when the model or algorithm shows low bias but high variance (whereas underfitting occurs when the model or algorithm shows low variance but high bias).
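A minimal sketch of that train-vs-test check, assuming scikit-learn; the synthetic dataset and the unconstrained decision tree are illustrative choices, not from the article:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree is prone to memorizing the training data
model = DecisionTreeClassifier(max_depth=None, random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
# A large gap (e.g. ~1.00 train vs noticeably lower test) suggests overfitting.
```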

Why?

  1. Too powerful a model: models with large exponents (e.g. a 100-degree polynomial) map the data into a very high-dimensional space (see the sketch after this list)
  2. Not enough data: getting more data can sometimes fix overfitting problems
  3. Too many features (the article illustrates this with a figure)
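A small sketch of the "too powerful model" point, assuming scikit-learn; the degrees and the noisy sine data are illustrative assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)

# Compare a modest polynomial with a very high-degree one on the same data
for degree in (3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    print(f"degree {degree}: train MSE = {train_mse:.4f}")
# The high-degree polynomial drives training error toward zero,
# but it is fitting noise and will generalize poorly.
```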

How to prevent it?

1. Cross-validation

Use your initial training data to generate multiple mini train-test splits, and use these splits to tune your model. Keep the test data unseen; use only the training data for tuning.
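A minimal sketch of k-fold cross-validation on the training data only, assuming a scikit-learn estimator (logistic regression here is just an example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
# 5 mini train/validation splits built from the training set; the test set stays untouched.
scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy per fold:", scores)
print("mean CV accuracy:", scores.mean())
```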

2. Training with more data

It won’t work every time, but training with more data can help algorithms detect the signal better. The more data we have, the better our model generalizes (of course, the data must be clean 😁)
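One rough way to check whether more data would help is a learning curve; this sketch uses scikit-learn's learning_curve, and the estimator and dataset are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:5d} samples: train={tr:.3f}, validation={va:.3f}")
# If the gap between train and validation scores shrinks as the sample size grows,
# collecting more data is likely to reduce overfitting.
```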

3. Remove features

Some algorithms have built-in feature selection. For those that don’t, you can manually improve their generalizability by removing irrelevant input features.
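A brief sketch of dropping uninformative features, here via scikit-learn's SelectFromModel on top of a random forest; this particular selector is an assumption, the article only says to remove irrelevant input features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# 25 features, only 5 of which are actually informative
X, y = make_classification(n_samples=500, n_features=25, n_informative=5, random_state=0)

# Keep only features whose importance (from the forest) is above the default threshold
selector = SelectFromModel(RandomForestClassifier(random_state=0))
X_reduced = selector.fit_transform(X, y)
print("original features:", X.shape[1], "-> kept:", X_reduced.shape[1])
```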

4. Early stopping

Stop training once the error (or accuracy) of the model has not improved, or has improved too little, for a certain number of iterations.
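A minimal sketch of early stopping with a Keras callback; the use of Keras, the toy data, and the patience value are assumptions for illustration:

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when validation loss has not improved for 5 consecutive epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0)
```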

5. Regularization

Techniques for artificially forcing your model to be simpler. Some well-known ones: L1 and L2 regularization.
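A short sketch of L1 and L2 regularization with scikit-learn's Lasso and Ridge; the alpha values and the synthetic regression problem are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=50, n_informative=5, noise=10, random_state=0)

for name, model in [("no regularization", LinearRegression()),
                    ("L2 (Ridge)", Ridge(alpha=1.0)),
                    ("L1 (Lasso)", Lasso(alpha=1.0))]:
    model.fit(X, y)
    n_nonzero = sum(abs(c) > 1e-6 for c in model.coef_)
    print(f"{name}: non-zero coefficients = {n_nonzero}")
# L1 (Lasso) tends to zero out irrelevant coefficients, effectively simplifying the model.
```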

6. Ensembling

Combining predictions from multiple separate models. There are a few different methods for ensembling, but the two most common are the following (a brief sketch follows the list):

  • Bagging attempts to reduce the chance of overfitting complex models. (Trains a large number of "strong" learners in parallel and gets the final result by voting among those learners)
  • Boosting attempts to improve the predictive flexibility of simple models. (Trains a large number of "weak" learners in sequence and combines all the weak learners into a single strong learner)
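A minimal sketch contrasting the two styles, assuming scikit-learn; BaggingClassifier over deep trees stands in for the "strong learners in parallel" idea and GradientBoostingClassifier over stumps for the "weak learners in sequence" idea:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: many complex ("strong") trees trained in parallel, predictions voted/averaged
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=None),
                            n_estimators=100, random_state=0)
# Boosting: many shallow ("weak") trees trained in sequence, each correcting the last
boosting = GradientBoostingClassifier(max_depth=1, n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```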

7. Dropout

Used in neural networks. Every unit of the network (except those belonging to the output layer) is given a probability p of being temporarily ignored during computation.
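A brief sketch of dropout layers in a Keras network; the framework choice, layer sizes, and the rate p = 0.5 are assumptions for illustration:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),   # each hidden unit is ignored with probability 0.5 during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # no dropout on the output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```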