Closed abostroem closed 8 years ago
Looks good to me - @ethanwhite, can you have a quick look, and if you approve, we'll merge.
Apologies for the delay. Lots of travel, no time to keep up with email. I should be able to take a look at this by the end of the week.
Thanks for this @abostroem! I think there are two bigger questions and some details that are worth discussing. I'll add the bigger questions now and some more detailed feedback later.
Minor suggestions on the Testing notebook (which looks awesome overall!):
...oops, time to go pick up my daughter from pre-school. More later.
@ethanwhite I'll make those detailed changes later this week. To address your 2 larger questions:
I chose Numpy over Pandas for a few reasons:
I've found a lot of overhead switching between notebooks and I tend to favor lessons that build on each other, so my preference is for one large notebook for each section. To me the biggest downside is that it is harder to maintain. I am happy to discuss and defer to the group at large. When I was preparing I copied and pasted the lessons together (and rearranged them a bit, if I remember correctly). I submitted them in the form I taught them.
In the testing notebook:
zip, which I find can be fairly confusing for students the first time they see it. With those things addressed I think the testing notebook is ready to go, and I would be +1 for merging it via a separate pull request.
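For anyone following along who hasn't met zip before, here is a minimal sketch of what it does in a testing context. The function and the input/expected lists are hypothetical, made up for illustration, and are not taken from the notebook:

```python
# Hypothetical test inputs and expected outputs, paired by position.
inputs = [0, 1, 2]
expected = [0, 1, 4]


def square(x):
    """Hypothetical function under test."""
    return x * x


# zip pairs the first items, then the second items, and so on,
# stopping at the end of the shorter sequence.
pairs = list(zip(inputs, expected))

# A common testing pattern: loop over (input, expected answer) pairs.
for value, answer in zip(inputs, expected):
    assert square(value) == answer
```

Showing the paired list once before using zip in a loop may make the loop less surprising to students.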
I'm fine with moving to Numpy, but I'd prefer to use Numpy structured arrays rather than using multi-value assignment to split the columns up into separate variables (the latter doesn't really work with large numbers of columns).
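To make the comparison concrete, here is a rough sketch of the two approaches. The CSV contents and column names are assumptions for illustration only, not the actual lesson data:

```python
import io

import numpy as np

# Hypothetical CSV contents; column names and values are made up for illustration.
csv_text = """year,temperature,rainfall,mosquitos
2001,80,157,150
2002,85,252,217
2003,86,154,153
"""

# Approach 1: multi-value assignment. Each column becomes its own variable,
# so the left-hand side grows with every column in the file.
year, temperature, rainfall, mosquitos = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1, unpack=True
)

# Approach 2: a structured array. names=True reads the field names from the
# header row, and columns are then accessed by name from a single array.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

mean_rainfall = data["rainfall"].mean()

# Boolean masks select whole rows, and the fields stay aligned.
wet_years = data[data["rainfall"] > 200]["year"]
```

With the structured array, adding a column to the file needs no change to the loading code, which is the main reason I'd lean that way.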
I personally prefer bite-sized chunks for notebooks, because it is easier to use these resources externally (as I'm currently doing in my university courses) if we don't aggregate material too much.
Ok, I haven't gotten to the Testing materials (and the 03-qa.ipynb) but I split the one intro into multiple sections which kind of parallel what you originally had (except using Numpy). Here are the details of what I've changed:
Ported everything from Pandas to Numpy
Intro: Add more on types and loops to the intro section
Plotting: This is a new section and deals explicitly with visualizing data
Modularization
I broke my sections into: 01-intro-python, 02-plotting, 03-modularization
Given that we have been discussing creating parallel numpy and pandas material, and that with my new files this is no longer a direct update of the previous files, how should we proceed with this?
Given the overall response to your email on discuss, and the impending proposal to split up bc into separate repos and allow folks to maintain parallel versions of lessons in separate repos, I'd recommend that we go with two repos, one for numpy and one for pandas. This will also let the numpy repo move towards a set of data and analyses that makes more sense to other groups if you wanted to do so (in the past folks who wanted to use numpy instead of pandas also didn't like the inclusion of regression; again, different scientific cultures/needs).
If we go this route then if/when bc gets split up we'd need to make two repos based on the current intermediate Python material and your changes would go into the numpy one.
I'd recommend splitting the testing material out into a separate PR. This could easily go in now (even without the additional changes I've suggested) and then get improved upon from there.
+1 to having two versions of this around. I presented the NumPy version last week, and it was generally well received (I had a slightly better experience than with the novice materials, in any case, as to me the mosquito data is easier to explain than the inflammation data).
For the LBL-WISE bootcamp I modified and reformatted the intermediate python material.