oldoc63 / learningDS

Learning DS with Codecademy and Books

Using a regression model for prediction #469

Open oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

Suppose that we have a dataset of heights and weights for 100 adults. We fit a linear regression and print the coefficients:

import statsmodels.api as sm
model = sm.OLS.from_formula('weight ~ height', data=body_measurements)
results = model.fit()
print(results.params)
Intercept -21.67
height 0.50
dtype: float64

This regression allows us to predict the weight of an adult if we know their height. To make a prediction, we plug the intercept and slope into the equation for a line. The equation is:

$$ weight = 0.50 * height - 21.67 $$

To make a prediction, we can plug in any height. For example, the expected weight for a 160 cm tall person is 58.33 kg:

$$ weight = 0.50 * 160 - 21.67 = 58.33 $$

In Python, we can calculate this by plugging in values or by accessing the intercept and slope from results.params using their indices (0 and 1, respectively):

print(0.50 * 160 - 21.67)
#Output: 58.33

#Or:

print(results.params.iloc[1]*160 + results.params.iloc[0])
#Output: 58.33
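As a minimal sketch of the calculation above, the line equation can be wrapped in a small function. The name `predict_weight` is hypothetical, and the default coefficients are the rounded values printed earlier (intercept -21.67, slope 0.50), so the results match the fitted model only up to that rounding.

```python
def predict_weight(height_cm, intercept=-21.67, slope=0.50):
    """Return the expected weight (kg) for a height (cm) from the fitted line."""
    return slope * height_cm + intercept

# Each extra 10 cm of height adds slope * 10 = 5 kg to the prediction.
for height in (150, 160, 170):
    print(f"{height} cm -> {predict_weight(height):.2f} kg")
# 150 cm -> 53.33 kg
# 160 cm -> 58.33 kg
# 170 cm -> 63.33 kg
```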
oldoc63 commented 1 year ago

We can also do this calculation using the .predict() method on the fitted model. To predict the weight of a 160 cm tall person, we need to first create a new dataset with height equal to 160 as shown below:

newdata = {"height":[160]}
print(results.predict(newdata))

Output:

0  58.33
dtype: float64

Note that we get the same result (58.33) as with the other methods; however, it is returned as a pandas Series rather than a single number.

oldoc63 commented 1 year ago
  1. Fit a model that predicts test score using hours_studied. Print the coefficients of this model using .params.
oldoc63 commented 1 year ago
  1. Using your model, what is the predicted score for a student who spent 3 hours studying? Save the result as pred_3h and print it out. Calculate your answer by plugging into the formula for a line (instead of using .predict()).
oldoc63 commented 1 year ago
  1. What is the predicted score for a student who spent 5 hours studying? Use the predict() method to calculate your answer and save it as pred_5hr, then print it out.