oldoc63 opened 1 year ago
- score: student score on a quiz
- completed: the number of other content items on Codecademy that the learner has completed prior to this quiz
- lesson: indicates which lesson the learner took directly before the quiz (Lesson A or Lesson B)

Take a look at this dataset by printing the first five rows.
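Here is a minimal pandas sketch of this step. It assumes the data is in a DataFrame named codecademy, which is the name the later hints use. The rows are made-up toy values, not the project's simulated data. The real data would be loaded from the project's CSV file, whose name is not given here.

```python
import pandas as pd

# Hypothetical toy rows with the same three columns as the project's dataset
codecademy = pd.DataFrame({
    "score": [52.0, 58.0, 61.0, 66.0, 70.0, 73.0, 80.0],
    "completed": [2, 5, 8, 11, 14, 17, 20],
    "lesson": ["Lesson A", "Lesson B", "Lesson A", "Lesson B",
               "Lesson A", "Lesson B", "Lesson A"],
})

# .head() returns the first five rows by default
print(codecademy.head())
```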
Create a scatter plot of score (y-axis) against completed (x-axis) to see the relationship between quiz score and the number of completed content items. Make sure to show, then clear the plot. Is there a relationship between these two variables, and does it appear to be linear?
Use plt.scatter() to create a scatter plot. The first argument is the x-variable (codecademy.completed) and the second argument is the y-variable (codecademy.score). After calling plt.scatter(), use two lines of code to show, then clear the plot (plt.show() followed by plt.clf()).
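The hint translates into a few lines of matplotlib. This is a sketch on made-up values standing in for the codecademy DataFrame. The Agg backend is an assumption added only so the example runs without a display. In a notebook or desktop session you would normally leave it out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy values standing in for the project's codecademy DataFrame
codecademy = pd.DataFrame({
    "completed": [2, 5, 8, 11, 14, 17, 20],
    "score": [52.0, 58.0, 61.0, 66.0, 70.0, 73.0, 80.0],
})

# x-variable first, y-variable second
plt.scatter(codecademy.completed, codecademy.score)
plt.xlabel("completed")
plt.ylabel("score")
plt.show()  # display the plot (no window opens under the Agg backend)
plt.clf()   # clear the figure so the next plot starts from a blank canvas
```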
Fit a linear regression model that predicts score using completed as the predictor. Print out the regression coefficients.
The intercept is the expected value of the outcome variable when the predictor variable is equal to 0. The slope is the expected difference of the outcome variable for a one-unit increase in the predictor variable.
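The following sketch makes both interpretations concrete using made-up values. NumPy's np.polyfit stands in for whatever regression function the project uses. The later hints about .predict() suggest a statsmodels-style fitted model, and both approaches fit the same ordinary least-squares line. The helper predict() is hypothetical.

```python
import numpy as np

# Hypothetical toy values, not the project's simulated dataset
completed = np.array([2, 5, 8, 11, 14, 17, 20], dtype=float)
score = np.array([52, 58, 61, 66, 70, 73, 80], dtype=float)

# A degree-1 least-squares fit returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(completed, score, deg=1)
print("intercept:", intercept)
print("slope:", slope)


def predict(x):
    """Expected score for a learner who has completed x prior content items."""
    return intercept + slope * x


# Intercept: the expected score when the predictor equals 0
print(predict(0))
# Slope: the expected difference in score for a one-unit increase in completed
print(predict(11) - predict(10))
```

The last two printed values reproduce the intercept and the slope. This is exactly what the two definitions above say they should do.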
There are a few different ways to draw the regression line, but one option is to use plt.plot() to create the line. Use the completed column from the original data as the x-coordinates (first argument) and the predicted values of score (based on the model) as the y-coordinates (second argument).
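Here is a sketch of the plt.plot() option, drawn over the scatter plot of the raw data. The values are made up, and np.polyfit stands in for the project's fitted model. The predicted scores are computed from the slope and intercept at each observed completed value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy values, not the project's simulated dataset
completed = np.array([2, 5, 8, 11, 14, 17, 20], dtype=float)
score = np.array([52, 58, 61, 66, 70, 73, 80], dtype=float)

# np.polyfit stands in for the project's fitted model: returns [slope, intercept]
slope, intercept = np.polyfit(completed, score, deg=1)
predicted = intercept + slope * completed  # predicted score at each observed completed value

plt.scatter(completed, score)             # the raw data
(line,) = plt.plot(completed, predicted)  # x: original completed values, y: predicted scores
plt.show()
plt.clf()
```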
One option is to use the .predict() method on your fitted model and pass in a new dataset with completed = 20:
newdata = {'completed': [20]}
Another option is to use the equation of the line along with the intercept and slope you calculated when you fit the model. The formula looks something like: slope * 20 + intercept.
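The .predict() route needs the project's fitted model object, so this sketch shows only the equation-of-a-line route. It uses made-up values, and a NumPy fit stands in for the model. np.polyval evaluates the same line and serves as a cross-check.

```python
import numpy as np

# Hypothetical toy values, not the project's simulated dataset
completed = np.array([2, 5, 8, 11, 14, 17, 20], dtype=float)
score = np.array([52, 58, 61, 66, 70, 73, 80], dtype=float)

slope, intercept = np.polyfit(completed, score, deg=1)

# Plug completed = 20 into the equation of the line
pred_by_hand = slope * 20 + intercept

# np.polyval evaluates the same degree-1 polynomial, so it should agree
pred_polyval = np.polyval([slope, intercept], 20)
print(pred_by_hand, pred_polyval)
```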
Calculate the fitted values for the model and save them as fitted_values.
Use the .predict() method on your fitted model and pass in the data that was used to fit the model.
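The .predict() call in the hint needs the project's fitted model object. As a stand-in, this sketch computes the same quantities directly. It fits a slope and intercept with np.polyfit on made-up values and evaluates the line at every row of the data used for the fit.

```python
import numpy as np

# Hypothetical toy values, not the project's simulated dataset
completed = np.array([2, 5, 8, 11, 14, 17, 20], dtype=float)
score = np.array([52, 58, 61, 66, 70, 73, 80], dtype=float)

slope, intercept = np.polyfit(completed, score, deg=1)

# Fitted values: the model's prediction for each row of the data used to fit it
fitted_values = intercept + slope * completed
print(fitted_values)
```

When the model includes an intercept, the fitted values have the same mean as the observed scores. This makes a quick sanity check.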
Calculate the residuals for the model and save them as residuals.
Subtract the fitted_values that you calculated in the previous step from the true student quiz scores (codecademy.score).
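This sketch uses made-up values and a NumPy least-squares fit as a stand-in for the project's model. The subtraction itself is one line. The two extra prints check standard properties of least-squares residuals when an intercept is included.

```python
import numpy as np

# Hypothetical toy values, not the project's simulated dataset
completed = np.array([2, 5, 8, 11, 14, 17, 20], dtype=float)
score = np.array([52, 58, 61, 66, 70, 73, 80], dtype=float)

slope, intercept = np.polyfit(completed, score, deg=1)
fitted_values = intercept + slope * completed

# Residuals: observed score minus the model's fitted value
residuals = score - fitted_values
print(residuals)

# With an intercept, OLS residuals sum to (numerically) zero...
print(residuals.sum())
# ...and are orthogonal to the predictor
print(np.dot(residuals, completed))
```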
Use plt.scatter() to create the scatter plot and pass in fitted_values as the first argument (x-variable) and residuals as the second argument (y-variable).
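Here is a sketch of this residual plot on made-up values, with a NumPy fit standing in for the project's model. The dashed reference line at zero is an optional addition that is not part of the hint. The Agg backend is assumed only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy values, not the project's simulated dataset
completed = np.array([2, 5, 8, 11, 14, 17, 20], dtype=float)
score = np.array([52, 58, 61, 66, 70, 73, 80], dtype=float)

slope, intercept = np.polyfit(completed, score, deg=1)
fitted_values = intercept + slope * completed
residuals = score - fitted_values

# Fitted values on the x-axis, residuals on the y-axis
points = plt.scatter(fitted_values, residuals)
plt.axhline(0, linestyle="--")  # optional reference line at zero
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
plt.clf()
```

The zero line makes it easier to see whether the residuals fall evenly above and below zero across the range of fitted values.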
Fit a linear regression model that predicts score using lesson as the predictor. Print out the regression coefficients.
To calculate and print the mean quiz score for learners who took lesson A, you can use the following code:
print(np.mean(codecademy.score[codecademy.lesson == 'Lesson A']))
You should find that the intercept from the regression output is equal to the mean score for learners who took Lesson A. The slope is equal to the difference in mean scores between the two lessons (the Lesson B mean minus the Lesson A mean).
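This relationship can be checked numerically. The sketch assumes treatment (dummy) coding, which formula-based regression libraries such as statsmodels apply to a two-level text column by default. Lesson A, first alphabetically, serves as the reference level. The scores are made-up values, and np.linalg.lstsq stands in for the project's model fit.

```python
import numpy as np

# Hypothetical toy data, not the project's simulated dataset
lesson = np.array(["Lesson A", "Lesson A", "Lesson A", "Lesson B", "Lesson B", "Lesson B"])
score = np.array([62.0, 70.0, 66.0, 55.0, 61.0, 58.0])

# Dummy coding with Lesson A as the reference level: 1 for Lesson B, 0 for Lesson A
is_b = (lesson == "Lesson B").astype(float)
X = np.column_stack([np.ones_like(is_b), is_b])  # intercept column + indicator column

# Ordinary least squares: coef = [intercept, slope]
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
intercept, slope = coef

mean_a = np.mean(score[lesson == "Lesson A"])
mean_b = np.mean(score[lesson == "Lesson B"])
print(intercept, mean_a)         # intercept vs. Lesson A mean
print(slope, mean_b - mean_a)    # slope vs. difference in means
```

The identity holds because the indicator is 0 for every Lesson A row. The model therefore predicts the intercept for those learners, and least squares sets that prediction to their group mean. The Lesson B prediction is intercept + slope, which equals the Lesson B mean.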
For this project, you'll get to work as a data analyst alongside the curriculum team at Codecademy to help us improve the learner experience. While this data is simulated, it is similar to real data that we might want to investigate as Codecademy team members.