Open oldoc63 opened 1 year ago
We can calculate the fitted values using .predict() by passing in the original data. The result is a pandas Series containing predicted values for each person in the original dataset:
fitted_values = results.predict(body_measurements)
print(fitted_values.head())
0 66.673077
1 59.100962
2 71.721154
3 70.711538
4 65.158654
dtype: float64
The residuals are the differences between the true values of the outcome variable and these fitted values. They can be calculated by subtracting the fitted values from the actual values. We can perform this element-wise subtraction in Python by simply subtracting one pandas Series from the other, as shown below:
residuals = body_measurements.weight - fitted_values
print(residuals.head())
0 -2.673077
1 -1.100962
2 3.278846
3 -3.711538
4 2.841346
dtype: float64
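As a cross-check on the two steps above, here is a minimal end-to-end sketch. The small DataFrame is a hypothetical stand-in for body_measurements (its values are invented), and the fit uses statsmodels' formula interface, which is an assumption about how results was created. It compares the manual calculation with the fittedvalues and resid attributes that statsmodels keeps on a fitted OLS results object.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for body_measurements (values invented for illustration).
body_measurements = pd.DataFrame({
    "height": [160.0, 165.0, 170.0, 175.0, 180.0, 185.0],
    "weight": [55.0, 61.0, 63.0, 70.0, 74.0, 79.0],
})

# Fit weight ~ height by ordinary least squares.
results = smf.ols("weight ~ height", data=body_measurements).fit()

# Manual route: predict on the original data, then subtract from the actual values.
fitted_values = results.predict(body_measurements)
residuals = body_measurements.weight - fitted_values

# The results object stores the same quantities for the training data.
print(np.allclose(fitted_values, results.fittedvalues))
print(np.allclose(residuals, results.resid))
```

Both comparisons should print True, since .predict() on the training data reproduces the stored fitted values. The manual route is still worth knowing, because .predict() also accepts new data that the model has not seen.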
script.py already contains the code to fit a model on the students dataset that predicts test score using hours_studied as a predictor. Calculate the fitted values for this model and save them as fitted_values. Then calculate the residuals and save them as residuals.
There are a number of assumptions of simple linear regression, which are important to check if you are fitting a linear model. The first assumption is that the relationship between the outcome variable and predictor is linear (can be described by a line). We can check this before fitting the regression by simply looking at a plot of the two variables.
The next two assumptions (normality and homoscedasticity) are easier to check after fitting the regression. But first, we need to calculate two things: fitted values and residuals.
Again consider our regression model to predict weight based on height (model formula 'weight ~ height'). The fitted values are the predicted weights for each person in the dataset that was used to fit the model, while the residuals are the differences between the true weight and the predicted weight for each person.
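To make the two definitions concrete, the following library-free sketch works through the arithmetic by hand. The heights and weights are made up for illustration, and the slope and intercept come from the standard least-squares formulas for a single predictor; nothing here is taken from the actual body measurements data.

```python
# Made-up heights (cm) and weights (kg), for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [55.0, 60.0, 63.0, 70.0, 72.0]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Least-squares estimates for the line: weight = intercept + slope * height.
s_xy = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
s_xx = sum((h - mean_h) ** 2 for h in heights)
slope = s_xy / s_xx
intercept = mean_w - slope * mean_h

# Fitted value: the weight the line predicts for each person's height.
fitted_values = [intercept + slope * h for h in heights]

# Residual: actual weight minus fitted weight for each person.
residuals = [w - f for w, f in zip(weights, fitted_values)]

print([round(f, 2) for f in fitted_values])
print([round(r, 2) for r in residuals])
```

With these invented numbers the slope works out to 0.88 and the intercept to -85.6. A positive residual means the person weighs more than the line predicts, a negative one means less, and for a model with an intercept the residuals sum to zero up to floating-point error.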