Proposal: move python functions after the loops and conditionals lesson

Hi everybody,

@gvwilson pointed out to me during the instructor training that the examples in the python functions lesson are not very compelling, so I have been trying to come up with some more attractive functions (closer to an "authentic task"). The issue, however, is that we currently teach loops and conditionals after functions: https://github.com/swcarpentry/bc/tree/master/novice/python Without loops and conditionals, I have been unable to write a function that does anything interesting and is easy to integrate with the rest of the lesson. One way around this would be to move the functions lesson later. This is not a trivial change, because the patient data example, which is spread across lessons 01 to 04, would need to be changed heavily (and not in a good way).

So I would like to know what you think of the following alternative: define the analyze() function in a file that can be imported and used as a black box during the loop and conditional lessons, and explain the function definition later. The logic behind this approach is that we have already introduced modules and how to use functions (in lesson 01 numpy is loaded and we call mean() and loadtxt()).
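A minimal sketch of what "import it as a black box" could look like. The module name inflammation_tools, the tiny data file, and the body of analyze() are all assumptions made for illustration: the lesson's real analyze() would contain the plotting code, while this stand-in only summarizes the file so that it runs without numpy or matplotlib.

```python
import sys
import tempfile
from pathlib import Path

# Instructor side: ship a module next to the data files.
# "inflammation_tools" is a hypothetical name, and this body is a stand-in
# (it summarizes instead of plotting) so the sketch needs no extra packages.
MODULE_SOURCE = '''
def analyze(filename):
    """Return the (minimum, maximum) value found in a CSV file of numbers."""
    with open(filename) as handle:
        values = [float(cell) for line in handle for cell in line.strip().split(",")]
    return min(values), max(values)
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "inflammation_tools.py").write_text(MODULE_SOURCE)
(workdir / "inflammation-01.csv").write_text("0,1,2\n3,4,5\n")  # made-up data
sys.path.insert(0, str(workdir))

# Learner side: import and call, the same way numpy is used in lesson 01.
import inflammation_tools

result = inflammation_tools.analyze(str(workdir / "inflammation-01.csv"))
print(result)
```

Learners would only ever type the last three statements; everything above them is setup that the instructor would provide.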

Here is what the new lesson order would look like, with a full list of the changes it would require:

Create a file containing the function analyze() (as written at the start of current lesson 03, "Analysing multiple data sets"), to be imported in the (new) lesson 2.
Lesson 1 (currently 01) ["Analyzing Patient Data"]
- Update last subsection "Next Steps" text to link to next lesson.
Lesson 2 (currently 03) ["Analysing multiple data sets"]
- Remove box with function definition. Add text explaining that the plotting code used in lesson 1 "Wrapping up" section has been used to create a function that can be imported similarly to numpy calls in lesson 1. Illustrate by generating the graphs.
- Subsection "For Loops" - Challenges 1,2,3: change wording from "define a function" to "write code". Remove comment about the docstring.
- Subsection "Lists" - Challenge 1: change wording from "define a function" to "write code".
- Subsection "Processing Multiple Files" - Challenge 1: change wording from "define a function" to "write code/a loop".
- Subsection "Next Steps": update text to link to next lesson.
Lesson 3 (currently 04) ["Making Choices"]
- Update introductory text (remove reference to functions).
- Subsection "Conditionals" - Challenge 2: change wording from "define a function" to "write code".
- Update "Next Steps" text to link to next lesson: defining functions.
Lesson 4 (currently 02) ["Defining functions"]
- Update introductory text.
- Now more compelling examples, using loops and conditionals, can be used in subsections "Defining a function", "Debugging a function", "Composing Functions", and possibly "The Call Stack". (But there's no need to modify them immediately.)
- Subsection "Testing and Documenting" - Challenge 1: optionally, suggest comparing this function with the imported function. Change text "in the previous lesson" to "lesson 1".
- Update "Next Steps" text.
Lesson 5 (currently 05) ["Defensive Programming"] - No changes required
Lesson 6 (currently 06) ["Command-Line Programs"] - No changes required

Most of the changes are superficial, so this does not involve much work, but I don't know whether teaching this with the function hidden in a file would be a good idea. What do you think?

On Thu, Sep 25, 2014 at 08:59:25AM -0700, leogargu wrote:

The issue, however, is that currently we teach loops and conditionals after functions.

Previous discussion of lesson order in #256, but that's about loops vs. scripts.

@leogargu I thought that the reason we teach functions first is that they are what we want students to learn (especially as one of the good practices), so that we can use them in our examples for loops and conditionals.

What about

Lesson 1 (currently 01) ["Analyzing Patient Data"]

Keep as it is.
Lesson 2 (currently 02) ["Wrap Analyzing of Patient Data"]

Keep it short, just introducing basic function syntax so it can be used in other lessons.
Lesson 3 (currently 03) ["Analyzing Multiple Data Sets"]

Keep as it is.
Lesson 4 (currently 04) ["Making Choices"]

Keep as it is.
Lesson 5 (new) ["More About Functions"]

Rewrite the last example with "extra" function features like default values and keyword arguments.
Lesson 6 (currently 05) ["Defensive Programming"]

Keep as it is.
Lesson 7 (currently 06) ["Command-Line Programs"]

Keep as it is.
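For the "More About Functions" lesson, the "extra" features could be illustrated along these lines. This is only a sketch: rescale() and its parameters are hypothetical, not taken from the current lesson material. It relies on a conditional and a loop (as a list comprehension), which is why it would fit after "Making Choices".

```python
def rescale(values, low=0.0, high=1.0):
    """Linearly rescale values so that they span the range [low, high]."""
    smallest = min(values)
    largest = max(values)
    if largest == smallest:
        # Every value is the same, so avoid dividing by zero.
        return [low for _ in values]
    span = largest - smallest
    return [low + (high - low) * (v - smallest) / span for v in values]


# Default values: low and high fall back to 0.0 and 1.0.
print(rescale([1, 2, 3]))

# Keyword arguments: override only the parameters we care about.
print(rescale([1, 2, 3], high=10.0))
print(rescale([1, 2, 3], low=-1.0, high=1.0))
```

The point for learners would be that the simple call keeps working unchanged while the function gains options.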

Thanks for getting this discussion started, @leogargu. Overall I like the idea of introducing loops and conditional statements before functions because it is more natural, i.e. if you haven't written code that required loops or conditionals, you likely haven't written enough code to realize that functions would make your life easier.

However, I am -1 on importing the function analyze as a black box for use in the earlier lessons. I see this as putting the cart before the horse. For many novices at our bootcamp, these examples are the most code they have ever written. While we want them to use the best practices from the beginning before they develop any bad habits, I also think it is necessary to show them the problem for which we are giving them the solution.

One of the reasons I like the novice lessons so much is that they attempt to mirror a real data analysis. In the first lesson we interactively explore one of our datasets and write some code that we consider useful:

import numpy as np
from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')

plt.figure(figsize=(10.0, 3.0))

plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(0))

plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(0))

plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(0))

plt.tight_layout()
plt.show()

Great. But then we realize that we have 11 more files to analyze. How should we proceed? Well, the first thing to come to mind is simply to copy-paste the code and replace the filename each time. The instructor could explain that this is not ideal because it is tedious, error-prone, and will make it more difficult to update the code in the future since we have 12 versions of it. Instead, we would start a lesson on for loops and end with the following:

import glob
filenames = glob.glob('*.csv')
for f in filenames:
    print(f)
    data = np.loadtxt(fname=f, delimiter=',')

    plt.figure(figsize=(10.0, 3.0))

    plt.subplot(1, 3, 1)
    plt.ylabel('average')
    plt.plot(data.mean(0))

    plt.subplot(1, 3, 2)
    plt.ylabel('max')
    plt.plot(data.max(0))

    plt.subplot(1, 3, 3)
    plt.ylabel('min')
    plt.plot(data.min(0))

    plt.tight_layout()
    plt.show()

This works, but it always runs on all 12 samples. Also, it is a lot of code to read in order to figure out what it is doing. This is the motivation for writing it as a function. This will allow us to run it on one file at a time when we are testing new features and then easily put it in a loop to run over many files. Also, it will be much easier to follow our code when it is written:

for f in filenames:
    print(f)
    analyze(f)
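To make the "test on one file, then loop" workflow concrete, here is a rough sketch. It rests on several assumptions: the two data files are tiny made-up examples written to a temporary directory, and this analyze() returns the three per-day series instead of plotting them, so that it runs without numpy or matplotlib. In the lesson, the function body would be the plotting code shown earlier.

```python
import csv
import glob
import os
import tempfile


def analyze(filename):
    """Return the per-day (per-column) average, max and min for one data file."""
    with open(filename, newline="") as handle:
        rows = [[float(cell) for cell in row] for row in csv.reader(handle)]
    columns = list(zip(*rows))  # transpose: one tuple of readings per day
    return {
        "average": [sum(col) / len(col) for col in columns],
        "max": [max(col) for col in columns],
        "min": [min(col) for col in columns],
    }


# Made-up stand-ins for the inflammation data files.
datadir = tempfile.mkdtemp()
samples = {
    "inflammation-01.csv": "0,1,2\n2,3,4\n",
    "inflammation-02.csv": "1,1,1\n3,3,3\n",
}
for name, text in samples.items():
    with open(os.path.join(datadir, name), "w") as handle:
        handle.write(text)

filenames = sorted(glob.glob(os.path.join(datadir, "*.csv")))

# While developing, run the function on a single file...
single = analyze(filenames[0])
print(single["average"])

# ...and once it behaves, put the same call in a loop over every file.
results = {}
for f in filenames:
    results[os.path.basename(f)] = analyze(f)
print(sorted(results))
```

The loop body stays one line long no matter how much the analysis itself grows.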

I also like @r-gaia-cs's idea about introducing simple functions and then saving the discussion of default arguments until after learning conditional statements. So I would advocate something like:

interactively exploring data -> loops -> simple functions -> conditional statements -> advanced functions

swcarpentry / DEPRECATED-bc

Proposal: move python functions after the loops and conditionals lesson #743