swcarpentry / python-novice-inflammation

Programming with Python
http://swcarpentry.github.io/python-novice-inflammation/

What changes should we make for our 5.4 release? #127

Closed gvwilson closed 8 years ago

gvwilson commented 9 years ago

Please use this issue for discussion of changes we should make by mid-August 2015.

embray commented 9 years ago

As mentioned on the blog, I would like to see Python 3 on the table.

jgalgarra commented 9 years ago

I support the motion; teaching Python 2 offers no advantage these days.

twitwi commented 9 years ago

python++

jrherr commented 9 years ago

I'm still trying to upgrade myself to Python 3 (not fully there yet...), but I also support this motion.

chendaniely commented 9 years ago

Titus uses the argparse module in his reproducible computational analysis lesson [1]. Should we use it in our Python lesson, or is that adding too much to an already bloated lesson?

You can see the code where he uses argparse at about the 45:05 mark (the link below jumps to 42:05).

[1] https://youtu.be/SRItP6PSu4U?t=42m5s
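For readers who haven't used argparse, here is a minimal sketch of what it adds to a script. The option names (`filenames`, `--action`) and the summary-script framing are hypothetical, not taken from Titus's lesson:

```python
import argparse


def parse_args(argv=None):
    """Parse options for a hypothetical inflammation-summary script."""
    parser = argparse.ArgumentParser(description='Summarize inflammation CSV files.')
    # Positional argument: one or more input files.
    parser.add_argument('filenames', nargs='+', help='CSV files to process')
    # Optional flag restricted to a fixed set of choices, with a default.
    parser.add_argument('--action', choices=['min', 'mean', 'max'], default='mean',
                        help='statistic to compute for each patient')
    # argv=None means "read sys.argv[1:]"; passing a list is handy for testing.
    return parser.parse_args(argv)


args = parse_args(['--action', 'max', 'inflammation-01.csv', 'inflammation-02.csv'])
print(args.action)     # max
print(args.filenames)  # ['inflammation-01.csv', 'inflammation-02.csv']
```

Compared with reading `sys.argv` by hand, the parser also generates `--help` output and error messages for free, which is much of its appeal; the cost is one more module for learners to absorb.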

gvwilson commented 9 years ago

We can only add things if we identify something to take out...

abostroem commented 9 years ago

@chendaniely This was already discussed in #45

katyhuff commented 9 years ago

I'd like to suggest that 5.4 could use a gentler introduction to Python. To be honest, I never end up using this version of the Python lessons because I think it has a fundamental flaw. Namely: this lesson starts with numpy rather than with how to start up Python or what it means to assign a variable. There's so much that has to be understood before I think students are ready to "import numpy".

When the audience is novices, I always end up using an updated version of the old THW material or @jkitzes' materials. Both of those introduce concepts like variables before introducing the concept of libraries, which I think is extremely important.

I recognize a lot of thought went into these 5.3 lessons, so I'm sure there are reasons for all of the decisions. I also know that the skills I mention are called out as prerequisites in the intro, but I think there should at least be a 00 lesson that demystifies variables, lists, and arrays before importing numpy and loading an array into memory... just to make sure everyone in the room is on the same starting block.
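A rough sketch of what such a "00" lesson might cover before `import numpy`, reusing the weight-conversion idea from the lesson's variables section; the list and table values are made up for illustration:

```python
# Variables: a name bound to a value.
weight_kg = 55
weight_lb = 2.2 * weight_kg          # a new value computed from an existing name

# Lists: an ordered collection, indexed from 0.
odds = [1, 3, 5, 7]
first_odd = odds[0]                  # 1
odds.append(9)                       # lists can grow

# A list of lists already looks like a small table...
table = [[0, 1, 2],
         [3, 4, 5]]
middle = table[1][1]                 # 4: row 1, column 1

# ...and only now a library, whose arrays make that table easy to work with.
import numpy
array = numpy.array(table)
print(array.shape)                   # (2, 3)
print(array[1, 1] == middle)         # True
```

The import is deliberately placed last to mirror the teaching order; in a real script it would go at the top of the file.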

teamneem commented 9 years ago

+1 to a gentler introduction. I use the old version, too. I feel like this one loses students very early on because they don't have a context to build upon. I usually end up doing the old version and then an abridged version of the inflammation material.

gvwilson commented 9 years ago

Do you think the Data Carpentry introduction to Python (https://github.com/datacarpentry/python-ecology) is a better fit for complete novices? Should we put "Python for people who've never programmed before" effort into that lesson, and modify this one to be "Python for people who've seen for loops and conditionals, but not in Python"?

jrherr commented 9 years ago

+1 to having a "Python for people who've never programmed before" in the Data Carpentry repo, and having this Software Carpentry lesson be a "Python for people who've seen loops and conditionals, but not in Python".

From my teaching experience with Software Carpentry, there is usually a mix of people who have never programmed before and people who are familiar with programming syntax but have never used Python. I'm still struggling to decide how many types of lesson material to keep around (beginners, proficient with code but new to Python, more complex structures...).

iglpdc commented 9 years ago

I use the current version of the lesson, but add a 15-minute intro of my own. I've found two things that confuse people. The first is the word zoo: Python, python, IPython, 2 and 3, and even Anaconda (why do I have to install Anaconda to run Python?). To clarify this, I start by talking about Python (the language), which comes in two dialects (2 and 3). I mention that there are programs that interpret Python, and I run the same statement (say, print("Hello world")) using python, ipython, and the notebook. I try to stress the parallel with the Unix shell and how to enter and exit each program. (In particular, many people find it hard to understand that the notebook's web server takes over your terminal.)

I stress two things here: 1) the Python language is exactly the same in all of them, since these programs just interpret it, and 2) source code is for people, not machines.

The second thing that confuses people is the structure of the language: the difference between a name and an object (even if they don't know these terms), why methods are called with dot syntax, and so on. So I give a short intro to variables (the weights section in the lesson) before importing numpy, and explain a bit about what happens (an object is created and a name points to that object). A couple of times I've explained what makes up Python by reducing it to keywords (and operators), names, and objects. I really like this part, but I've had mixed feedback on it: some people (maybe those more used to abstract math) find it helpful, while others complain that it's too theoretical.
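A minimal sketch of the name/object distinction and the dot syntax described above, using a plain list so that no library is needed yet (the values are arbitrary):

```python
values = [3, 1, 2]
alias = values             # no copy: both names now refer to the same list object

alias.sort()               # dot syntax: call the sort() method that belongs to the list
print(values)              # [1, 2, 3]: the change is visible through either name
print(values is alias)     # True

print(len(values))         # 3: len() is a plain function that takes the object as an argument
```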

In any case, it takes 15 minutes at most, and every time it has been worthwhile because it saves time on the inevitable questions and problems that sooner or later come up. For example, I can then say that import numpy just creates a new name and loads a large piece of code into memory.

I don't think the Data Carpentry start is better, because as a novice I wouldn't find it interesting to go through data structures, functions, and all that without a clear use case. For example, after finishing our first analysis of the inflammation data, it's clear that putting all the code into a function is much better than editing the script to "reuse" it with another file, which is what people actually do.
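A minimal sketch of that refactoring, assuming inflammation-style CSV files (one row per patient, one column per day). The function name `analyze` and the two small temporary files are hypothetical stand-ins for the lesson's real data:

```python
import os
import tempfile

import numpy


def analyze(filename):
    """Load one inflammation-style CSV file and return the mean for each day (column)."""
    data = numpy.loadtxt(fname=filename, delimiter=',')
    return data.mean(axis=0)


# Create two tiny example files so the sketch runs on its own.
tmpdir = tempfile.mkdtemp()
paths = []
for i, rows in enumerate([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], start=1):
    path = os.path.join(tmpdir, f'inflammation-{i:02d}.csv')
    numpy.savetxt(path, rows, delimiter=',')
    paths.append(path)

# The same analysis on every file, with no edits to the code.
for path in paths:
    print(os.path.basename(path), analyze(path))
```

Once the analysis lives in a function, "run it on another file" becomes a loop over filenames rather than a copy-and-edit of the script.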

So my proposal would be:

The last point should be done carefully, probably at the cost of some rigor, to avoid mentioning Python's object-oriented structure too much (or at all). I think a little lying is OK at this point.

gvwilson commented 9 years ago

Is this the sort of split people have in mind?

| Never Programmed Before | Some Previous Programming |
|---|---|
| importing libraries | (quick review) |
| calling functions | |
| loading CSV data | |
| assigning to variables | |
| tabular operations | |
| plotting | |
| ← writing functions → | ← writing functions → |
| ← documentation → | ← documentation → |
| | loops over lists |
| | conditionals |
| | line-oriented file I/O |
| | command-line arguments |
| | building Unix filters |
| | building libraries |
| | defensive programming |
| | debugging |
| | unit testing |
| | test-driven development |

jrherr commented 9 years ago

I think this looks excellent!

This isn't even a suggestion, but should "plotting" also be covered in the 'Some Previous Programming' curriculum, like "writing functions" and "documentation"? Maybe it's just my (and others') confusion with 'matplotlib', but I think plotting is an important topic that often draws questions from Software Carpentry participants.

I really like this table.

katyhuff commented 9 years ago

I do like the table in general. However, the main point I was personally trying to get across is that importing libraries should be neither at the top nor on the left of the table. Importing libraries is magic until you understand at least functions and variables (after all, what is a library if not a variable name through which you access functions and objects?).
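A small sketch of the idea that a library is "a variable name through which you access functions and objects". It uses the standard library's `math` module so it runs anywhere; `import numpy` behaves the same way:

```python
import math

print(type(math))        # <class 'module'>: the import created a name bound to a module object
print(math.sqrt(16.0))   # 4.0: the dot reaches a function stored inside that object

# Because `math` is just a name, it can be bound to another name like any variable.
m = math
print(m is math)         # True
```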

The same can be said for tabular operations and plotting. Yes, learners may have used a spreadsheet, but that's very different from managing a single variable that represents your array (for one thing, you need to know about variables before it makes sense to talk about tables...).
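A small sketch of "a single variable that represents your array", with made-up numbers laid out like the inflammation data (rows are patients, columns are days):

```python
import numpy

# One variable holds the whole table: 3 patients (rows) x 4 days (columns).
data = numpy.array([[0, 1, 2, 3],
                    [1, 2, 3, 4],
                    [2, 3, 4, 5]])

print(data.shape)          # (3, 4)
print(data[0, 2])          # 2: row 0, column 2 (indices start at 0)
print(data.mean(axis=0))   # [1. 2. 3. 4.]: mean per day, down each column
print(data.max(axis=1))    # [3 4 5]: maximum per patient, across each row
```

Unlike a spreadsheet, nothing is visible until you ask for it, which is why knowing what a variable is matters before this step.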

Also, what are these unix things doing in the table if this is about the python lesson?

Anyway, I would suggest some edits to the table that capture those ideas as well as the comments from @iglpdc about the need to brief folks on the word zoo and on variables (dot syntax, names, and objects) before importing numpy. The following table has, on the left, the things that should happen first, in 15 minutes, before the current lesson (the column on the right) commences. (Reminder: my concern was that, as it currently stands, the lesson begins with the column on the right, importing numpy, loading CSV files, and working with arrays, which I think presents programming as full of magic rather than clear and made up of building blocks.)

| Never Programmed Before | Some Previous Programming |
|---|---|
| running python/ipython/notebooks | (quick review) |
| calling functions | |
| assigning variables | |
| loops over lists | |
| ← writing functions → | ← writing functions → |
| ← documentation → | ← documentation → |
| | importing libraries |
| | loading CSV data |
| | tabular operations |
| | plotting |
| | conditionals |
| | line-oriented file I/O |
| | command-line arguments |
| | building Unix filters |
| | building libraries |
| | defensive programming |
| | debugging |
| | unit testing |
| | test-driven development |

abostroem commented 9 years ago

I continue to maintain that our target audience should be people who have programmed before (or dabbled in programming), though not necessarily in Python. I like @katyhuff's edits. A quick review ("let me tell you the syntax") is important, but we can quickly move on to the tools that a scientist who programs in some capacity will want for everyday work.

abostroem commented 9 years ago

I think the list on the RHS is too long for a typical workshop. I'm not sure whether it is in any particular order. I consider these optional: building Unix filters, building libraries, unit testing, and test-driven development.

katyhuff commented 9 years ago

Perhaps this could work if the last six items on this list were their own "defensive programming" lesson. I, for example, like to teach exceptions, debugging, and testing as their own topic on the afternoon of the second day (rather than SQL). With a good group, by the end of the day you can get them all pushing a piece of Python with unit tests and having it run continuous integration on Travis. (http://bids.github.io/2015-06-04-berkeley/testing/)
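To give a flavour of that afternoon's material, here is a minimal sketch combining a defensive assertion with a few unit tests, using the standard library's `unittest` (pytest is a common alternative). The function `normalize` is hypothetical:

```python
import unittest


def normalize(values):
    """Rescale non-negative numbers so the largest becomes 1.0 (hypothetical example)."""
    # Defensive programming: fail early, with a clear message, on bad input.
    assert len(values) > 0, 'values must not be empty'
    assert all(v >= 0 for v in values), 'values must be non-negative'
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [v / largest for v in values]


class TestNormalize(unittest.TestCase):
    def test_largest_becomes_one(self):
        self.assertEqual(normalize([1, 2, 4]), [0.25, 0.5, 1.0])

    def test_all_zeros(self):
        self.assertEqual(normalize([0, 0]), [0.0, 0.0])

    def test_rejects_negative(self):
        with self.assertRaises(AssertionError):
            normalize([-1, 2])
```

Saved in a file such as `test_normalize.py`, these run with `python -m unittest test_normalize`, and a continuous-integration service can be configured to run the same command on every push. Note that `assert` statements are skipped when Python runs with `-O`, so they suit internal sanity checks rather than validating user input.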

abostroem commented 9 years ago

Thanks @katyhuff, you made me think about how I've filled the end of day 2 in the most recent workshops I've taught. You're right: I haven't spent it on the traditional SWC curriculum. I've done two code reviews, extra data visualization, and domain-specific lessons.

I would love to hear others' opinions about how optional you consider these lessons: building Unix filters, building libraries, unit testing, and test-driven development.

Are we reversing our decision on teaching testing? Will our target audience write tests beyond assert statements? Are code reviews an acceptable replacement?

Do people want to know how to build libraries before really digging into plotting (do you do both?)?

If you don't teach SQL, how do you fill the end of your second day?

katyhuff commented 9 years ago

I know that for SciPy, Matt is giving an introductory overview of the ecosystem of scientific Python libraries... but that's very specific to the SciPy conference.

chendaniely commented 9 years ago

Would it be out of scope to link to the PEP 8 rules [1] so that students know a style guide for Python exists? I personally find reading style guides a good intro/refresher to a language.

[1] https://www.python.org/dev/peps/pep-0008/
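A few of the PEP 8 conventions learners would meet first, shown in a short snippet; the names are illustrative, not taken from the lesson:

```python
MAX_DAYS = 40  # constants: UPPER_CASE_WITH_UNDERSCORES


def mean_inflammation(values):  # functions and variables: lowercase_with_underscores
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = sum(values)  # spaces around operators, 4-space indentation
    return total / len(values)


class PatientRecord:  # classes: CapWords
    """Placeholder class, shown only for its naming style."""


daily_values = [0, 2, 4]  # a space after each comma
print(mean_inflammation(daily_values))  # 2.0
```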

iglpdc commented 9 years ago

would it be out of scope to link to the PEP8 rules [1], and have students know a style guide for python exists? I personally find reading style guides as a good intro/refresher to a language.

+1

rgaiacs commented 9 years ago

Sorry for my late comment.

this lesson starts with numpy rather than how to start up python or what it means to assign a variable. There's so much that has to be understood before I think students are ready to "import numpy".

This is one of the things I like about this lesson: it doesn't explain every Python data type before showing a nice application. But I agree that doing so loses people who have never programmed before.

@katyhuff What about starting the lesson with

import numpy
import matplotlib.pyplot
matplotlib.pyplot.imshow(numpy.loadtxt(fname='inflammation-01.csv', delimiter=','))
matplotlib.pyplot.show()

and only then explaining basic concepts like variables?

Never Programmed Before vs Some Previous Programming

For "Never Programmed Before"

  • Demystify the zoo jargon
  • Motivation with a plot (will need to import libraries)
  • Intro to variables
  • Functions vs. methods (or: why is there a dot here?)
  • Data structures
  • More on loading CSV data
  • Tabular operations
  • More on plotting
  • Writing functions
  • Documentation

For "Some Previous Programming"

  • "Quick review" (or Python Cookbook)
  • Loops
  • Conditionals
  • File I/O
  • Command-line arguments
  • Building libraries
  • Defensive programming
  • Debugging (or pdb)
  • Unit testing
  • Test-driven development

I would love to hear others' opinions about how optional you consider these lessons: building Unix filters, building libraries, unit testing, and test-driven development

This makes sense for a non-novice group of students from the same lab. For novice learners from different labs, I think that digging into functions and data visualization makes more sense.

katyhuff commented 9 years ago

I hear you, @r-gaia-cs. While I am perhaps more of a reductionist in my teaching, I do agree that it is sometimes nice to show people where you're going before you set off toward the destination. I think that's a pedagogical choice, but it can succeed with careful execution. I'm open to it as long as it follows the general (spiral learning? https://en.wikipedia.org/wiki/Spiral_approach) path:

1) Simple, powerful illustration of the lesson goals
2) Back to basic concepts
3) From the basics, build up the pieces necessary for the powerful illustration
4) Finalize the lesson with a review of what you were able to do and why

karinlag commented 9 years ago

First, lots of smart stuff in here. In my context (bioinformatics) the best "Simple, powerful illustration" is to teach how to write a small script which they can give different inputs to from the command line. My line of work is pretty analysis focused, and showing them how they can do the exact same analysis on different input files is always an aha-moment and an instant motivator.
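A minimal sketch of that kind of script, using `sys.argv` directly; the script name `row_means.py` and the row-mean "analysis" are hypothetical placeholders for a real pipeline step:

```python
import sys


def summarize(lines):
    """Return the mean of each comma-separated row, skipping blank lines."""
    means = []
    for line in lines:
        if not line.strip():
            continue
        values = [float(field) for field in line.split(',')]
        means.append(sum(values) / len(values))
    return means


def main(argv):
    """Apply the same analysis to every file named on the command line."""
    for filename in argv[1:]:  # argv[0] is the script name itself
        with open(filename) as handle:
            print(filename, summarize(handle))


if __name__ == '__main__':
    main(sys.argv)
```

Running `python row_means.py sample1.csv sample2.csv` then performs the identical analysis on each input, which is the "same analysis, different files" moment described above.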

theboocock commented 9 years ago

My lesson suggestions, having recently instructed at and observed a workshop in Dunedin, New Zealand:

The R lessons hastily introduce functional programming in the form of the apply family of functions. I believe the lessons should stick strictly to for loops, as they can accomplish the same tasks. While I agree that the functional paradigm is powerful, it takes some experience to truly understand it and incorporate it seamlessly into your day-to-day data munging. Functional programming ideas definitely have value and could perhaps be offered as an intermediate course.
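The comment concerns R's apply functions, but the same trade-off exists in Python. As a rough analogue (not taken from either lesson), here is an explicit loop next to `map()`, producing the same result:

```python
values = [1.0, 4.0, 9.0]

# Explicit loop: every step of the iteration is visible.
roots_loop = []
for value in values:
    roots_loop.append(value ** 0.5)

# Functional style: the same result, with the iteration hidden inside map().
roots_map = list(map(lambda value: value ** 0.5, values))

print(roots_loop)               # [1.0, 2.0, 3.0]
print(roots_loop == roots_map)  # True
```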

abought commented 9 years ago

Incidentally, I've started (in a very small way) to port some of our examples over to Python 3 for this lesson: #129 .

It's possible to write code that runs in both versions, though it does add a few weird-looking statements that might be confusing to novices. Teaching one version doesn't preclude compatibility with both; is there a clear instructor preference here?
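To illustrate the kind of weird-looking statement meant here, a sketch of code that behaves the same under Python 2.7 and Python 3 using `__future__` imports (a real mechanism; the `mean` function is a hypothetical example):

```python
# Behaves the same under Python 2.7 and Python 3.
# These imports must come first in the file; on Python 3 they change nothing.
from __future__ import print_function, division


def mean(values):
    """Return the arithmetic mean, using true division in both versions."""
    return sum(values) / len(values)


print(mean([1, 2]))  # 1.5 (plain Python 2 would print 1 without the division import)
```

The first line has nothing to do with the task at hand, which is exactly what makes it awkward to explain to novices.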

gvwilson commented 9 years ago

We should write native, idiomatic Python 3 - anything else will be a pain to explain.

abought commented 9 years ago

We should write native, idiomatic Python 3 - anything else will be a pain to explain.

So as part of the changeover, there would be a clear policy change to drop support for Python 2 in our lessons? I'm actually OK with this (I think Python 3 is saner by default), as long as there are no surprises.

abostroem commented 9 years ago

I'd like to propose that we start a Python 3 branch and develop the lessons there and then once everything is ready we can merge it. Changing the lessons piecewise to Python 3 will be a disaster for those teaching during the transition.

gvwilson commented 9 years ago

+100.

katyhuff commented 9 years ago

Fun fact! Thomas (@takluyver) and I have decided to do this exact thing for SciPy (a Python 3 version of the novice inflammation lesson). Maybe we can submit those changes as a pull request at the end of next week?

In addition, as part of that tutorial, we'll be adding an intro notebook to ease people in before they get to the import-csv-data-with-numpy jump-off point.

willingc commented 9 years ago

@katyhuff @gvwilson and others, I will see many of you at SciPy next week. I would be happy to share the notebooks that San Diego Python and I have used with novice developers and Python 3. All of you are doing wonderful work :+1:

amandamiotto commented 9 years ago

Hey Guys

I was asked to include my feedback here. I know that for our classes, we had a mix of 'had programming experience' and 'no experience at all' people, so please see below for Python feedback.

Regarding the Python classes: the first lesson goes into far too much depth on graphing. I understand that it is geared to biologists and tries to show how relevant it is, but I would stop where 'import matplotlib.pyplot' starts and push plotting to the end or drop it. It is jumping straight into the deep end. Also, there are a few places in the first lesson where the library is imported under an alias, but that isn't explained or mentioned in the lesson (Raniere has since mentioned that it was brought up by Daniel: https://github.com/swcarpentry/python-novice-inflammation/issues/133). It's good that the first lesson explains variables, but I would do this before importing a library (maybe swap those two concepts in the first lesson, as I understand that you would want to import a file before doing too much else).
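A short sketch of what importing under an alias actually does, shown with numpy; `import matplotlib.pyplot as plt` works the same way:

```python
import numpy
import numpy as np  # "as np" binds a second, shorter name to the same module object

print(np is numpy)         # True: both names refer to one module
print(np.mean([1, 2, 3]))  # 2.0: np.mean and numpy.mean are the same function
```

One sentence in the lesson along these lines ("`as np` just gives the library a shorter name") would probably cover the gap.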

Thanks for the opportunity to run these classes; most of the material is excellent, and we got a LOT of good feedback on the lessons.

jgalgarra commented 8 years ago

I would like to help with the py3 migration. I do not know if there is some kind of planning or if contributors pick at random. Thanks for your advice.

rgaiacs commented 8 years ago

I would like to help with the py3 migration.

@jgalgarra Thanks.

I do not know if there is some kind of planning or if contributors pick at random.

https://github.com/swcarpentry/python-novice-inflammation/pull/142/files#r34859375 is a good candidate for a first contribution. But you can pick at random. Any contribution is welcome.

gvwilson commented 8 years ago

Please also see this post from Byron Smith on his experiences trimming down the Python lesson.

rmflight commented 8 years ago

Why not put this into the context of doing the statistical calculations in goostats mentioned in the novice shell lesson? Variables and loops could be used to summarize the various columns of an inflammation-like data set, and then a shell script could be used to process a bunch of files.
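A minimal sketch of summarizing columns with nothing but variables and loops, using made-up inflammation-like data held in a string; a real script would read the file with `open()` and could then be driven from a shell loop:

```python
import csv
import io

# Made-up inflammation-like data: one row per patient, one column per day.
text = '0,1,3\n0,2,1\n0,3,2\n'

rows = [[float(value) for value in row] for row in csv.reader(io.StringIO(text))]

means = []
for day in range(len(rows[0])):
    column = [row[day] for row in rows]  # gather one day's values across patients
    mean = sum(column) / len(column)
    means.append(mean)
    print(f'day {day}: min={min(column)} mean={mean} max={max(column)}')
```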

I think there is a huge opportunity being missed by not having a single data set that can be used across all of the lessons, so that one could spend one day introducing the shell, Python/R, and Git, and then a second day putting them all together to write code that actually does something semi-useful, including processing files, while keeping it under version control. I'm still trying to figure out whether that is too ambitious for a two-day workshop with complete novices.

gvwilson commented 8 years ago

Data Carpentry's lessons use a single data set throughout the two days. There are a lot of advantages, but it also gives the instructors a lot less flexibility to mix and match lessons for particular audiences. We decided a couple of years ago that the latter outweighed the former, but perhaps it's time to re-evaluate.

TomKellyGenetics commented 8 years ago

Sorry I'm a bit late to the table here; I've been away sick for the past couple of weeks. In my experience teaching this material (in Australia and New Zealand), we usually deal with near-complete beginners. Our typical attendees are young biology researchers with almost no programming experience. I would not recommend assuming any prior programming knowledge (in any language). Assuming prior experience, as this proposed split does, may be why the material is often a bit too advanced for pure beginners. Although ideally I would like the time to introduce most of the core programming concepts, like functions, loops, and conditionals, I would not assume them with this audience. I suppose one bonus of the split would be better support for these needs in a complementary course for complete beginners. The main thing is to pay attention to the needs of your attendees, as we seem to get vast ranges of prior ability.

uiuc-cse commented 8 years ago

The gold standard for what I'd like to see is Rosalind. Basically, they have a set of bioinformatics lessons that are organized in a tree-like fashion so for any arbitrary topic you know exactly what dependencies you have to satisfy. That would require pivoting to much shorter segments than are now in use.
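As a rough illustration of the tree-like organization described above, here is a sketch of lesson prerequisites as a dependency graph, using the standard library's `graphlib` (Python 3.9+). The topic names are borrowed loosely from the tables earlier in this thread and the dependencies are hypothetical:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite map: each topic -> the topics it depends on.
prerequisites = {
    'variables': set(),
    'lists': {'variables'},
    'loops': {'lists'},
    'functions': {'variables'},
    'importing libraries': {'functions'},
    'loading CSV data': {'importing libraries'},
    'plotting': {'loading CSV data'},
}


def needed_for(topic):
    """Return every topic that must be covered before `topic`, following dependencies transitively."""
    seen = set()
    stack = list(prerequisites[topic])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(prerequisites[current])
    return seen


# One valid teaching order in which every topic comes after its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
print(sorted(needed_for('plotting')))
```

Adding a second axis (domain) would mean attaching domain-specific variants to each node rather than changing the graph itself, though that is only one possible design.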

The tricky thing is to introduce another axis (domain dependency).

embray commented 8 years ago

Also late to the table--(other than my early :+1: for Python 3):

First of all I agree with @katyhuff on the need for a softer introduction to getting started with Python--including starting the Python interpreter and how to use the Python interpreter (both interactively and to run scripts), as well as introduction to Jupyter (née IPython) and why some of us like it, and introduction to the Python language basics.

I do also agree with others who suggest that this can be an optional module, or one that's easy to split up into smaller chunks, some of which can be omitted when teaching a more intermediate audience (i.e. no programming experience vs. some programming experience). I think a lot of this introductory section can be dropped when talking to people with some experience, though it's still good if they've never used Python to discuss basic usage of the Python interpreter.


Now, the other point I wanted to raise is this: I would like to see a somewhat disconnected collection of stand-alone lessons on each of the sub-topics we want to teach (e.g., as listed here and here). By keeping them separate and self-contained, it will be easier to mix and match them for the audience we are teaching, as well as to fit time constraints, etc.

I do, however, really like the practical, hands-on, flowing format of the Inflammation lesson. I don't want anything I'm going to say to be taken as a criticism of that lesson plan or the people who have put a great deal of work into it. However, most of the SWC workshops I've taught recently have been predominantly targeted at people outside the life sciences. And while I think scientists are perfectly capable of wrapping their brains around and appreciating other sciences, I do think it makes it harder for them to make connections to, and see the relevance for, their own work. So I find myself again and again coming up with my own Python lesson, but it's difficult to integrate my lesson into the rest of the SWC lesson plan, since the Inflammation lesson is the Python lesson. There's not much I can do to pick and choose from the existing Python lesson plan; I have to replace it entirely.

I think having a more disconnected set of sub-topic lessons will make it easier to build new, domain-specific lessons around them. So a lesson like the Inflammation lesson could be used to introduce all the topics we want to teach, but we would then branch off from the main, domain-specific lesson into the various sub-topics.

Obviously, in order to have more domain-specific lessons people who are experts in those domains will have to step up to work on them, and that's the hard part. But I think it will be easier to get those contributions if we:

1) Provide a set of lesson-plan building blocks around which a primary, domain-specific lesson can be centered.
2) Make it easy to swap out which domain-specific lesson is used for a given workshop (which would include the possibility of a lesson plan that simply teaches Python, perhaps with no domain-specific orientation).

[Image: lesson-plan dot]

embray commented 8 years ago

(Aside: Not sure how my scheme would work, however, with Greg's other question about publishing lesson content... Maybe like a choose your own adventure story?)

alistairwalsh commented 8 years ago

Hi All,

I'm about to run a bootcamp in Canberra (Australia) very soon, and the group organising it has said they would prefer that Python 3 be taught. I've had a quick look at the lessons @katyhuff and @takluyver presented at SciPy 2015 (http://jiffyclub.github.io/2015-07-06-scipy/python/) and they look great.

Could they be a starting point for whatever changes need to be made? They solve my immediate problem of delivering a bootcamp in Python 3 and from there we can work on some of the excellent ideas that have been suggested in this thread that require more structural changes.

I especially like

Domain specific

alistairwalsh commented 8 years ago

Also, I like the idea of quickly getting something plotted and then backtracking to explain how we got there. I think a lot of people respond well to producing visual images and can understand how changes in the code correspond to changes in the image. Creating an image to explain results is also fairly universal in science.

gvwilson commented 8 years ago

We're hoping to merge the Py3 changes this weekend, so yes, you'll be able to use Py3 in Canberra. We'll then try to arrange an online discussion early in September about revising the lessons. Cheers, Greg

dotsdl commented 8 years ago

Any news on when this discussion might happen? I'm interested in attending myself, but haven't heard anything more about it.

gvwilson commented 8 years ago

We're just about to publish the 5.3 lessons (should happen tomorrow or Monday) and then we'll convene this discussion.

dotsdl commented 8 years ago

@gvwilson awesome. I think there are a lot of great ideas in the current Python lesson that need to be retained in the next iteration, so I want to participate. I might try to write about it, if I get the time, to help collect my thoughts.

alistairwalsh commented 8 years ago

Just used the 5.3 lessons to teach a workshop in Canberra. It worked but there are some problems.

The challenges seem to have gotten out of order, or were left behind when a section was moved. On Monday I plan to make basic fixes (i.e. no changes to the content at this stage) to bring everything back in line. Grateful for any help.

Also, be aware that 'range()' doesn't work the same way in Python 3. For example, 'range(10)' does not return [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]; it returns range(0, 10).

I didn't realise this until it was used in that form on the day. If used in an expression like matplotlib.pylab.plot(range(10)), it produces the expected graph, but on its own it doesn't display the list of numbers as it did in 2.7.
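A quick sketch for instructors of how `range` behaves in Python 3, and the one-line fix when learners expect to see the numbers:

```python
r = range(10)
print(r)          # range(0, 10): Python 3 returns a lazy range object, not a list
print(list(r))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: what Python 2's range(10) showed directly
print(len(r))     # 10: range objects still support len()...
print(r[3])       # 3: ...and indexing

total = 0
for i in range(10):   # loops over a range work exactly as they did in Python 2
    total += i
print(total)          # 45
```

Wrapping the call in `list()` whenever the lesson wants to display the sequence avoids the surprise; everywhere else, including loops and plotting, the range object can be used unchanged.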