Closed lucieyang1 closed 1 year ago
I currently agree that "old 15-20" should likely be cannibalized into the tutorial.
I uploaded my edits to HW4. I haven't yet made the changes to the student-facing file, but will do so once the tester changes are finalized.
For Q15, I currently have it asking about constructing an 80% confidence interval, but I was thinking of changing it to refer to the 95% confidence interval since that was what was calculated earlier in Q10, although I don't think it's that important. It seems like the original sample is actually pretty representative of the population, and that both the 80% and 95% confidence intervals from the original sample do capture the true population median.
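For reference, the 80% versus 95% comparison can be sketched roughly as below. This is only an illustration under stated assumptions, not the HW4 solution: the `population` array is made-up exponential data standing in for the course dataset, `percentile_interval` is a hypothetical helper name, and the seed, sample size, and number of bootstrap replicates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(130)  # arbitrary seed, chosen only for reproducibility

# Hypothetical stand-in population (NOT the course data).
population = rng.exponential(scale=10.0, size=10_000)
true_median = np.median(population)

# One "original sample", drawn without replacement from the population.
sample = rng.choice(population, size=100, replace=False)

# Bootstrap: resample the original sample WITH replacement and record each median.
n_boot = 2000
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(n_boot)
])

def percentile_interval(values, level):
    """Percentile interval: cut (1 - level) / 2 from each tail."""
    alpha = 1 - level
    lower, upper = np.quantile(values, [alpha / 2, 1 - alpha / 2])
    return lower, upper

ci_80 = percentile_interval(boot_medians, 0.80)
ci_95 = percentile_interval(boot_medians, 0.95)

print("true median:", true_median)
print("80% interval:", ci_80)
print("95% interval:", ci_95)
```

Because both intervals come from the same bootstrap medians, the 95% interval always contains the 80% one. Whether either interval captures the true median depends on the particular sample drawn.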
I uploaded my edits to the tutorial now. I moved around the order a bit and now the content should take around 80 minutes, leaving 20 minutes for working on the tutorial assignment.
For one of the code demos, I put the import statements and a helper function on a slide to be skipped so that the relevant code and plot fit on the next slide. However, I'm not sure whether it will still run when presenting the slides. Is there a better way to do this?
`np.quantile` does what it claims.
Can you see if you can get a student facing version of this file working?
`left_count` of the `roaddata_sample`. I changed it back so that it will test it; let me know if there was a reason for the change or if I should undo my change.
`left_count = roaddata.sample(n=100)['road_side'].value_counts()['left']` oddly gives a different number each time, while doing `roaddata_sample = roaddata.sample(n=100)` and `roaddata_sample['road_side'].value_counts()['left']` separately seems to give 30, even with the same seed.
`for one_sample in all_samples` is needed before the `for j in range(number_of_bootstrap_samples_per_sample)` loop, though I'm not sure why it doesn't work without it...
Great catches -- wonderful care and attention to detail -- really good awareness.
`assert left_count != 30` was wrong and it should be `assert left_count == 30`, like you fixed it to be. Sorry! (I must have wanted to see the test fail and then forgot to change it back...)
`left_count = roaddata.sample(n=100)['road_side'].value_counts()['left']` versus `roaddata_sample = roaddata.sample(n=100); roaddata_sample['road_side'].value_counts()['left']` differs only because `np.random.seed(130)` only needs to be called once to make the second version always be the same, whereas the version I used needs to set the random seed every time, as in `np.random.seed(130); left_count = roaddata.sample(n=100)['road_side'].value_counts()['left']`.
If `np.random.choice(population['PAID'], sample_size_n, replace=False)` is added (with no assignment) as the first line in the first for loop, then the random numbers will be consumed in the same way and the results will now match... do you see what I mean?
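A minimal sketch of the seeding behaviour described above, in case it helps. The `roaddata` frame here is a made-up stand-in (300 "left" rows, 700 "right" rows), not the real dataset, and `left_count_of_sample` is a hypothetical helper. It relies on `DataFrame.sample` drawing from NumPy's global random state when no `random_state` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for roaddata (NOT the real course data).
roaddata = pd.DataFrame({"road_side": ["left"] * 300 + ["right"] * 700})

def left_count_of_sample():
    # Each call draws a fresh sample from the global NumPy random state.
    return roaddata.sample(n=100)["road_side"].value_counts()["left"]

# Re-seeding immediately before each draw reproduces the same sample.
np.random.seed(130)
first = left_count_of_sample()
np.random.seed(130)
second = left_count_of_sample()
print(first == second)

# Seeding once and drawing twice: the second draw continues the random
# stream, so it is a different sample from the first.
np.random.seed(130)
a = roaddata.sample(n=100)
b = roaddata.sample(n=100)
print(a.index.equals(b.index))

# A discarded draw (no assignment) still consumes random numbers, so the
# sample taken after it is no longer the first draw after seeding.
np.random.seed(130)
np.random.choice(roaddata.index, 100, replace=False)
after_discarded_draw = roaddata.sample(n=100)
print(after_discarded_draw.index.equals(a.index))
```

So two pieces of code only match when they consume the random stream in the same order after the seed is set, which is why adding or removing a single draw changes every result after it.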
~I'm still working on the tutorial, but the homework is done.~
For the tutorial, I'm not too sure what material to cover. Last year's tutorial seems to be mostly focused on the distinctions with hypothesis testing and type I/II errors, but Week 4 will now come before those concepts are covered. I was thinking of using some of the examples from last year's lectures on how well samples approximate populations and getting tighter confidence intervals. Some guidance here would be appreciated!
Edit: I uploaded a draft of the tutorial, but I have around 20-30 minutes I'd like to fill with some practice or discussion. I'm not sure what to cover here.