oldoc63 / learningDS

Learning DS with Codecademy and Books

Normality and homoscedasticity #472

Open oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

Once we've calculated the fitted values and residuals for a model, we can check the normality and homoscedasticity assumptions of linear regression.
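
The thread does not show how the fitted values and residuals were computed, so the following is only a minimal sketch of one way to get them for a single-predictor model, using the textbook least-squares formulas in plain Python. The data (`hours_studied`, `score`) and the helper `fit_simple_ols` are hypothetical, chosen for illustration; a regression library would normally do this step.

```python
# Hypothetical data, not taken from the original lesson.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = [52.0, 58.0, 61.0, 70.0, 74.0, 81.0]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_ols(hours_studied, score)

# Fitted values are the model's predictions for each observation.
fitted_values = [intercept + slope * xi for xi in hours_studied]

# Residuals are the observed values minus the fitted values.
residuals = [yi - fi for yi, fi in zip(score, fitted_values)]

print(fitted_values)
print(residuals)
```

The names `fitted_values` and `residuals` match the ones used in the plotting snippets in this thread. With an intercept in the model, the residuals sum to zero up to rounding error.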

Normality assumption

The normality assumption states that the residuals should be normally distributed. To check this assumption, we can inspect a histogram of the residuals and make sure that the distribution looks approximately normal (no skew or multiple "humps").

import matplotlib.pyplot as plt
plt.hist(residuals)  # histogram of the model's residuals
plt.show()

When the residuals appear normally distributed, that leads us to conclude that the normality assumption is satisfied.

If the plot instead looked something like the distribution below (which is skewed right), we would be concerned that the normality assumption is not met:

[Image: histogram of right-skewed residuals]
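
As a rough numerical complement to eyeballing the histogram, the skew can also be quantified. The sketch below computes the moment-based sample skewness in plain Python. The function `sample_skewness` and both residual lists are hypothetical, made up for illustration. A value near zero is consistent with a symmetric distribution but does not prove normality, so this is not a replacement for the plot.

```python
def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive when skewed right."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical residuals: one symmetric set, one with a long right tail.
symmetric_residuals = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
right_skewed_residuals = [-1.5, -1.0, -1.0, -0.5, -0.5, 0.0, 4.5]

symmetric_skew = sample_skewness(symmetric_residuals)
right_skew = sample_skewness(right_skewed_residuals)

print(symmetric_skew)
print(right_skew)
```

The symmetric set gives a skewness of zero, while the set with one large positive residual gives a clearly positive value, which is the numeric counterpart of a histogram with a long right tail.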

oldoc63 commented 1 year ago

Homoscedasticity assumption

Homoscedasticity is a fancy way of saying that the residuals have equal variation across all values of the predictor variable. A common way to check this is by plotting the residuals against the fitted values.

import matplotlib.pyplot as plt
plt.scatter(fitted_values, residuals)  # fitted values on x, residuals on y
plt.show()

If the homoscedasticity assumption is met, then this plot will look like a random splatter of points, centered around y=0.

Image

oldoc63 commented 1 year ago

If the plot shows any pattern or asymmetry (for example, a funnel shape where the spread of the residuals grows as the fitted values increase), that indicates the assumption is not met and linear regression may not be appropriate.
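
One crude way to put a number on the funnel shape is to compare the spread of the residuals for low and high fitted values. The sketch below is an informal illustration, not a formal test: the helper `spread_ratio` and both residual lists are hypothetical, and the check only notices a spread that differs between the two halves, so the scatter plot remains the primary diagnostic.

```python
import statistics


def spread_ratio(fitted_values, residuals):
    """Residual spread in the upper half of fitted values divided by the lower half."""
    # Order the residuals by their fitted value, then split down the middle.
    pairs = sorted(zip(fitted_values, residuals))
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[half:]]
    return statistics.pstdev(upper) / statistics.pstdev(lower)


# Hypothetical fitted values and two made-up sets of residuals.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
even_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
funnel_residuals = [0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0]

even_ratio = spread_ratio(fitted, even_residuals)
funnel_ratio = spread_ratio(fitted, funnel_residuals)

print(even_ratio)
print(funnel_ratio)
```

A ratio close to 1 matches the "random splatter" picture, while a ratio far from 1, as with the funnel-shaped residuals here, suggests the variance changes with the fitted values.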

oldoc63 commented 1 year ago

  1. The code to calculate the residuals and fitted values for the model of score predicted by hours studied is provided for you. Plot a histogram of the residuals to check the normality assumption.

oldoc63 commented 1 year ago

  1. Now, check the homoscedasticity assumption by plotting the residuals against the fitted values (fitted_values on the x-axis and residuals on the y-axis).