mistryrohan closed this 10 months ago
Some concerns/comments:
Added STA130_F21_songrecommendations.csv to the data folder.
Lots of detailed comments from me orienting myself to things here and responding to your comments, as usual; but, just work through them one by one as you've done (well) in the past :)
One more comment to add to those above: model building with p-values is the usual statistical approach; so, this needs to be presented/contrasted with the RMSE train/test idea as well.
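To make the p-value vs. RMSE contrast concrete, here is a minimal sketch on synthetic data. The columns `x1`, `x2`, `x3`, `y` and the 50/50 split are hypothetical, not from the homework, and it assumes the `statsmodels` formula API used elsewhere in the homework. The coefficient p-values measure evidence against a zero coefficient in the training data; the train/test RMSE comparison measures out-of-sample predictive accuracy.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(130)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n), "x3": rng.normal(size=n)})
# Hypothetical data: y depends on x1 and x2 only; x3 is pure noise
df["y"] = 2 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(scale=1.0, size=n)

# Random 50/50 train/test split (positions double as labels for a RangeIndex)
train_idx = rng.choice(n, size=n // 2, replace=False)
train = df.iloc[train_idx]
test = df.drop(index=df.index[train_idx])


def rmse(model, data):
    """Root mean squared prediction error of a fitted formula model on `data`."""
    resid = data["y"] - model.predict(data)
    return float(np.sqrt(np.mean(resid ** 2)))


results = {}
for formula in ["y ~ x1", "y ~ x1 + x2", "y ~ x1 + x2 + x3"]:
    fit = smf.ols(formula, data=train).fit()
    results[formula] = {
        "pvalues": fit.pvalues.round(4).to_dict(),  # evidence view
        "train_rmse": rmse(fit, train),             # in-sample fit
        "test_rmse": rmse(fit, test),               # out-of-sample prediction
    }

for formula, r in results.items():
    print(formula, r)
```

Training RMSE can only go down as predictors are added to a nested OLS model, so it is the test RMSE (or the p-value of the added term) that says whether a noise predictor like `x3` actually helps.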
`hw8_tester`
`SettingWithCopyWarning`, which will manifest later on if not addressed here
`np.select`?

I will move over to work on Matthew's PR and the course project so that I can share with you what that's going to be. That, I think, will inform your thinking about whether we can include and use some of that in the homework and/or tutorial; and, anyway, how we could orient our materials here to help the students prepare and be ready for what's asked of them for the final project.
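On the `SettingWithCopyWarning` and `np.select` points: a minimal sketch with hypothetical columns and hypothetical tier cutoffs (100 and 1000), not the homework's. Assigning into a filtered slice of a DataFrame is the usual trigger for the warning, and calling `.copy()` on the slice first avoids it; `np.select` then builds a categorical tier from an ordered list of conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names and values are placeholders
df = pd.DataFrame({
    "cond": ["new", "used", "used", "new"],
    "seller_rate": [5, 120, 2500, 40],
    "total_pr": [51.5, 37.0, 45.5, 44.0],
})

# Assigning into a filtered subset can raise SettingWithCopyWarning;
# an explicit .copy() makes `used` an independent DataFrame
used = df[df["cond"] == "used"].copy()
used["log_price"] = np.log(used["total_pr"])

# np.select takes, for each row, the choice of the FIRST condition that is True;
# rows matching no condition get the default
conditions = [df["seller_rate"] < 100, df["seller_rate"] < 1000]
choices = ["minor", "mid"]
df["seller_rating_tier"] = np.select(conditions, choices, default="major")
print(df)
```

Because `np.select` stops at the first `True` condition, the cutoffs have to be listed from smallest to largest.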
I do quite like how this homework is shaping up... I imagine you'll next move into interactions...
`formula='total_pr ~ C(seller_rating_tier)'`-style approach of Q4/Q5; although, you'll see in my hint to Q5 how baseline and contrast choices could be specified in a simple, hacky kind of way.

Thought it would be a good mental exercise to get a draft of the tutorial assignment question done before going to bed. Here it is (I'm also still working on the tutorial slides):
As a first-year student exploring the many opportunities university has to offer, you decide to join the basketball team (a friendly reminder to get involved in extracurriculars and events!). The coaches get to know you and find out that you are studying statistics. Since the team is currently training for a provincial competition, the coaches have been collecting large amounts of data and want to analyze the key factors influencing the team's performance. The coaches have a breadth of numerical data on shots, rebounds, assists, player experience, and player sleep. They also have categorical data on pre-game routines, off-court practice, health history, and player nutrition. They believe a more complicated model will allow them to fix all sorts of small issues in their team to help them perform at their best.
You explain that you have learned about multiple linear regression and techniques for creating a reliable model. The coaches only know a little about simple linear regression and are interested in learning your process for creating and selecting an appropriate model. Your task is to give an overview of this process, including the practical implications of your potential findings and what the coaches can do to support their players. You should write down some hypothetical equations, explain any transformations needed in the data, and explain the differences between simple and multiple linear regression. Do not be afraid to use technical statistical terms, but be sure to explain their meaning in simple, understandable ways that would help a non-statistical audience make sense of what you're talking about.
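If a worked illustration for the tutorial solution or slides would help, here is a minimal sketch on made-up basketball data (`points`, `sleep`, `assists`, `routine` and all effect sizes are hypothetical). It fits a simple linear regression and then a multiple linear regression with a categorical pre-game routine, using the formula-level `Treatment(reference=...)` coding to make `"none"` the baseline. This is one standard way to choose a baseline in a `statsmodels` formula, not necessarily the approach in the Q5 hint.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 120
games = pd.DataFrame({
    "sleep": rng.normal(7.5, 1.0, n),   # hours of sleep (hypothetical)
    "assists": rng.poisson(12, n),      # team assists (hypothetical)
    "routine": rng.choice(["none", "stretch", "visualize"], n),
})
routine_effect = games["routine"].map({"none": 0.0, "stretch": 3.0, "visualize": 1.0})
games["points"] = (40 + 2.0 * games["sleep"] + 1.2 * games["assists"]
                   + routine_effect + rng.normal(0, 4, n))

# Simple linear regression: one predictor
slr = smf.ols("points ~ sleep", data=games).fit()

# Multiple linear regression; Treatment(reference='none') makes "none" the baseline,
# so each routine coefficient is a difference from games with no pre-game routine
mlr = smf.ols(
    "points ~ sleep + assists + C(routine, Treatment(reference='none'))", data=games
).fit()

print(slr.params)
print(mlr.params)
print(f"R^2: SLR {slr.rsquared:.3f}, MLR {mlr.rsquared:.3f}")
```

Adding predictors can never lower in-sample R^2 for nested OLS models, which is one reason the coaches' "more complicated model" needs to be judged by evidence (p-values) or out-of-sample error rather than R^2 alone.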
`# test_Q0`
`# test_Q9`: make sure these are present; otherwise, MarkUs doesn't pick them up and show them!
`cond` takes values `"new"` (or `1`) if the game is new and `"used"` (or `0`) if the game is used. So I've fixed that to match your provided solution.

I'm pausing here to comment that this whole sequence is outstanding. This is exactly the way I want these homework assignments to go... this really helps guide the students through the use and concepts of things here... just really fantastic
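A minimal pandas sketch of the `cond` encoding described above, on hypothetical rows: a 0/1 indicator that is 1 for `"new"` and 0 for `"used"`.

```python
import pandas as pd

# Hypothetical rows; only the cond column mirrors the homework's description
games = pd.DataFrame({"cond": ["new", "used", "new", "used", "used"],
                      "total_pr": [51.5, 37.0, 45.5, 44.0, 40.2]})

# Indicator variable: 1 if the game is new, 0 if it is used
games["cond_new"] = (games["cond"] == "new").astype(int)
print(games)
```

If `cond` is instead passed to a `statsmodels` formula as `C(cond)`, the default treatment coding sorts levels alphabetically, so `"new"` becomes the baseline and the coefficient is labelled `C(cond)[T.used]`; it has the same magnitude as the `cond_new` coefficient with the opposite sign.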
`# test_Q14` or whatever you create won't show up in MarkUs
`# test_Q15` needed for MarkUs visibility/processing
`# test_Q16` and please make this an actual test that checks that the update has been made based on `...=="minor").sum()` or something like that
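A sketch of what the `# test_Q16` and `# test_Q17` autotests could look like as plain `assert` statements. The column names follow the `seller_rating_tier` / `"minor"` comments here, but the data and `EXPECTED_MINOR` are hypothetical placeholders; real expected values would come from the reference solution. For Q17, the intercept and slope are checked against an independent least-squares fit from `np.polyfit`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student-side state after the Q16 update
df = pd.DataFrame({
    "seller_rating_tier": ["minor", "major", "minor", "mid", "minor"],
    "seller_rate": [5.0, 2500.0, 40.0, 120.0, 12.0],
    "total_pr": [51.5, 37.0, 45.5, 44.0, 47.0],
})

# test_Q16 (sketch): fails loudly if the update was not made
EXPECTED_MINOR = 3  # hypothetical; would come from the reference solution
assert (df["seller_rating_tier"] == "minor").sum() == EXPECTED_MINOR, "Q16 update not applied"

# test_Q17 (sketch): compare the fitted intercept and slope to reference values
fit = smf.ols("total_pr ~ seller_rate", data=df).fit()
ref_slope, ref_intercept = np.polyfit(df["seller_rate"], df["total_pr"], deg=1)  # highest degree first
assert np.isclose(fit.params["Intercept"], ref_intercept), "Q17 intercept mismatch"
assert np.isclose(fit.params["seller_rate"], ref_slope), "Q17 slope mismatch"
```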
`# test_Q17` and make this an autotest by checking the intercept and slope coefficients
`plotly.graph_objects`: "..." is enough to get students to know you're trying to ask them to make a y=x line...
`.summary()`?
`# test_Q19`
...I'm liking where this all seems to be going, but/and, I have a couple of comments about what I'm hoping/expecting to see:
using p-value significance/evidence to evaluate models
dovetailing back to Q18 to see the y v y-hat, R^2, residuals, stuff still holds and can still be used in the same way as a simple linear regression model (which I think will be excellent to demonstrate and reinforce in this manner)
Q20: great -- so now this needs to be rebuilt to mirror Q18. Help walk the students through what you're trying to get them to notice and understand in this multivariate context (about y vs. y-hat, R^2, residuals, etc., whatever you think is good for them to understand/consider).
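A minimal sketch of the Q18-to-Q20 dovetail, on hypothetical synthetic data (`x1`, `x2`, `y` are placeholders) and with plotting left out. Even with two predictors, the fitted values y-hat are a single column that can be plotted against y, R^2 is still the squared correlation between y and y-hat (for OLS with an intercept), and it still equals 1 - SSE/SST.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(20)
n = 150
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.uniform(0, 10, n)})
df["y"] = 5 + 2 * df["x1"] + 0.5 * df["x2"] + rng.normal(0, 1.5, n)

fit = smf.ols("y ~ x1 + x2", data=df).fit()
y_hat = fit.fittedvalues   # one fitted value per row, whatever the number of predictors
residuals = fit.resid      # y - y_hat

# R^2 is still the squared correlation between y and y-hat...
r2_from_corr = np.corrcoef(df["y"], y_hat)[0, 1] ** 2
# ...and still 1 - SSE/SST
sse = np.sum(residuals ** 2)
sst = np.sum((df["y"] - df["y"].mean()) ** 2)
r2_from_ss = 1 - sse / sst

print(f"summary R^2:      {fit.rsquared:.4f}")
print(f"corr(y, y_hat)^2: {r2_from_corr:.4f}")
print(f"1 - SSE/SST:      {r2_from_ss:.4f}")
```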
Do we/Can we add some model assumption checks?
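One possible shape for model assumption checks, as a minimal sketch on hypothetical synthetic data: the Breusch-Pagan test (constant variance) and the Jarque-Bera test (normality of residuals), both of which ship with `statsmodels`. Which checks to include, and whether to use formal tests or only plots, is a judgment call for the homework.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(37)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.7 * df["x1"] - 1.1 * df["x2"] + rng.normal(0, 1, n)

fit = smf.ols("y ~ x1 + x2", data=df).fit()

# Constant variance: Breusch-Pagan regresses the squared residuals on the predictors
# (fit.model.exog already contains the intercept column)
bp_lm, bp_pvalue, bp_f, bp_f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)

# Normality of errors: Jarque-Bera looks at the skewness and kurtosis of the residuals
jb_stat, jb_pvalue, resid_skew, resid_kurtosis = jarque_bera(fit.resid)

print(f"Breusch-Pagan p-value: {bp_pvalue:.3f}")
print(f"Jarque-Bera p-value:   {jb_pvalue:.3f}")
```

The correlation between residuals and fitted values is exactly zero for any OLS fit with an intercept, so it cannot diagnose non-linearity; a residuals-vs-fitted plot is the usual visual check for curvature, and a Q-Q plot of the residuals the usual visual check for normality.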
Continuing...
`mode` is not significant (without being used as an interaction); so, these two questions should be re-oriented around/towards emphasizing that...
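A synthetic illustration of the `mode` point. The data and the two-level `mode` column here are hypothetical, not the homework's variable. When a categorical variable changes only the slope of another predictor, its main-effect coefficient in an additive model can look unimportant, while the interaction term in `y ~ x * C(mode)` carries the effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(40)
n = 300
df = pd.DataFrame({
    "x": rng.normal(0, 1, n),
    "mode": rng.choice(["A", "B"], n),  # hypothetical two-level categorical
})
# mode changes only the slope of x (1 for A, 3 for B), not the intercept
slope = np.where(df["mode"] == "B", 3.0, 1.0)
df["y"] = 2 + slope * df["x"] + rng.normal(0, 1, n)

additive = smf.ols("y ~ x + C(mode)", data=df).fit()
interaction = smf.ols("y ~ x * C(mode)", data=df).fit()  # x + C(mode) + x:C(mode)

print(additive.pvalues.round(4))
print(interaction.pvalues.round(4))
```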
Can we add a small segment that discusses how an observation is a row, which can obviously be multivariate and have many measurements?
... this is something that could be introduced straightaway with data frames (but I don't think I thought to do this); but, I don't think it's necessarily that relevant at that point in time; whereas, it becomes relevant in linear regression; and/but, I think it's okay if we wait until multiple linear regression as opposed to simple linear regression to introduce this idea...
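A minimal pandas sketch of the "observation is a row" idea, on a hypothetical basketball data frame. One row holds all the measurements for one observation; in multiple linear regression, that row is exactly where the response y_i and every predictor value x_i1, ..., x_ip for observation i come from.

```python
import pandas as pd

# Hypothetical data: each row is one observation (one game), each column one measurement
games = pd.DataFrame({
    "points": [68, 75, 59],
    "rebounds": [34, 41, 29],
    "assists": [14, 19, 11],
    "sleep_hours": [7.5, 8.0, 6.0],
})

first_game = games.iloc[0]  # a single (multivariate) observation
print(first_game)
print(games.shape)          # (number of observations, number of measurements)
```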
Adding in the old (unfinished) version of homework 8 for the sake of opening a new PR. Will be adding all the files when updates are made to them.