There are a number of Python libraries that can be used to fit a linear regression, but in this course, we will use the OLS.from_formula() function from statsmodels.api because it uses simple syntax and provides comprehensive model summaries.
Suppose we have a dataset named body_measurements with columns height and weight. If we want to fit a model that can predict weight based on height, we can create the model as follows:
import statsmodels.api as sm
model = sm.OLS.from_formula('weight ~ height', data=body_measurements)
We used the formula 'weight ~ height' because we want to predict weight (it is the outcome variable) using height as a predictor. Then, we can fit the model using .fit():
results = model.fit()
Finally, we can inspect a summary of the results using print(results.summary()). The full summary table is useful because it contains other important diagnostic information, but for now we'll only look at the coefficients using results.params:
print(results.params)
Intercept -21.67
height 0.50
dtype: float64
This tells us that the best-fit intercept is -21.67 and the best-fit slope is 0.50. In other words, the fitted line is weight = -21.67 + 0.50 * height.
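To see how the intercept and slope turn into a prediction, here is a small sketch that plugs the two coefficients into the equation of a line. The function name predict_weight and the example height of 160 are hypothetical choices for illustration; only the two coefficient values come from the output above.

```python
# Coefficients copied from the results.params output above.
intercept = -21.67
slope = 0.50

def predict_weight(height):
    """Return the fitted line's predicted weight for a given height.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    return intercept + slope * height

# 160 is an arbitrary example height.
print(round(predict_weight(160), 2))  # prints 58.33
```

Each one-unit increase in height raises the predicted weight by the slope, 0.50, and the intercept is the predicted weight when height is 0.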
Using the students dataset that has been loaded in script.py, create a linear regression model that predicts student score using hours_studied as a predictor, and save the result as a variable named model.