pyOpenSci / lessons

A repo containing lessons used in pyOpenSci training.
https://www.pyopensci.org/lessons
BSD 3-Clause "New" or "Revised" License

clean code workshop: finish activity 3 #25

Open lwasser opened 1 day ago

lwasser commented 1 day ago

Activity three in the first workshop is about adding checks to your code using try/except blocks. Originally, I planned to have the learners troubleshoot the issue in the code and fix it.

However, given the complexities associated with functions and the added complexity of these blocks, I think it would be better to turn this into a notebook-based activity. This removes the debugging component from the activity and focuses it on the actual implementation of try/except.

So the activity will be changed to:

  1. Provide a small DataFrame with messy data, for example a date field that contains a word rather than an int. Learners can then try to fix the function so that it processes that DataFrame successfully by adding the check!

This will isolate the learning effectively and be much less overwhelming. I can then create several activities of slightly increasing complexity where they fix a function so it handles the messy data issue better.
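A minimal sketch of the kind of function learners might end up with, assuming a hypothetical `published_year` field where some records hold a word instead of an integer. The function name, the field, and the sample records are illustrative and not taken from the lesson. Plain dictionaries stand in for DataFrame rows to keep the example dependency-free.

```python
def clean_year(value):
    """Return the year as an int, or None if it cannot be converted."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # Catch only the errors int() raises for bad input:
        # ValueError for text like "unknown", TypeError for None.
        return None


# Hypothetical messy records standing in for DataFrame rows
records = [
    {"title": "pkg-a", "published_year": "2021"},
    {"title": "pkg-b", "published_year": "unknown"},
    {"title": "pkg-c", "published_year": 2023},
    {"title": "pkg-d", "published_year": None},
]

years = [clean_year(record["published_year"]) for record in records]
print(years)  # [2021, None, 2023, None]
```

In a notebook, the same function could be applied to a pandas column with `Series.apply`.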

If we get to this activity in the workshop, I'm confident this will be a much better approach!

Data "features" that learners will encounter:

Parse the package name from the title

More features to come.

sneakers-the-rat commented 1 day ago

Haven't taken a look yet, but re: dates, can we encourage them to use ISO 8601-like dates?

yyyy-mm-dd
yyyy-mm-ddTHH-MM-SS

And so on?

Then one of the bug cases can be confusing the month and day in mm-dd-yyyy vs dd-mm-yyyy dates as a motivator for why one should just get used to always doing isoformat
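A small sketch of that bug case, using only the standard library. The date string is made up for illustration. The same text parses without error under both conventions but yields two different dates, whereas an ISO 8601 string has only one reading.

```python
from datetime import date, datetime

ambiguous = "03-04-2024"  # March 4th or April 3rd?

# Both format strings succeed, so no exception warns about the mix-up
as_month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()
as_day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()

print(as_month_first.isoformat())  # 2024-03-04
print(as_day_first.isoformat())    # 2024-04-03

# An ISO 8601 date is always year-month-day
unambiguous = date.fromisoformat("2024-03-04")
print(unambiguous.month, unambiguous.day)  # 3 4
```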

sneakers-the-rat commented 1 day ago

As a framing for exception handling, maybe giving some scaffolding for why and when you would want to do it - and importantly why you might not want to?

Like, a common newbie pattern is "why don't I just wrap everything in a try/except block, because then no errors!" So a motivating counterexample might be a set of functions that pass errors silently, showing how quickly it gets tricky to know where the problem is.
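One possible shape for that counterexample, with hypothetical function names and data. Each function swallows its own error, so the final `None` gives no hint that the real problem was a single bad string two steps earlier.

```python
def parse_value_silently(text):
    try:
        return float(text)
    except Exception:
        return None  # the parsing error disappears here


def mean_silently(values):
    try:
        return sum(values) / len(values)
    except Exception:
        return None  # and the resulting TypeError disappears here


raw = ["1.5", "2.5", "n/a"]
parsed = [parse_value_silently(text) for text in raw]
result = mean_silently(parsed)
print(result)  # None, with no clue about what went wrong or where


# Without the blanket try/except, the error points at the bad value
def parse_value(text):
    return float(text)


try:
    [parse_value(text) for text in raw]
except ValueError as err:
    message = str(err)
    print(f"Failed early: {message}")
```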

A way I have taught this in the past is like "what about the input to a function needs to be true in order for it to do what you want it to do?" And I see you already have some exercises for this, e.g. the "what if the files aren't there" example. Another easy one is input type, like the extremely common case of accepting a path as a Path or a str and casting it as soon as it's received.
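A sketch of the path case, assuming a hypothetical `count_lines` helper. The input is cast to `Path` on the first line, so the rest of the function only deals with one type, and the "file isn't there" condition is checked up front.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count lines in a text file given as a str or a Path."""
    path = Path(path)  # cast as soon as the argument is received
    if not path.is_file():
        raise FileNotFoundError(f"No file found at: {path}")
    with path.open() as f:
        return sum(1 for _ in f)


# Demonstrate that both input types behave the same way
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("a\nb\nc\n")
    n_from_path = count_lines(sample)
    n_from_str = count_lines(str(sample))

print(n_from_path, n_from_str)  # 3 3
```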

I feel like related to that is "when do you want to explicitly raise errors," and an easy example for that is an if/elif block that terminates with an else that raises for unknown input.
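For instance, a sketch with a hypothetical temperature converter. The final `else` turns an unexpected unit into an immediate, descriptive error instead of letting the function return `None` by accident.

```python
def to_celsius(value, unit):
    """Convert a temperature to Celsius from 'C', 'F', or 'K'."""
    if unit == "C":
        return value
    elif unit == "F":
        return (value - 32) * 5 / 9
    elif unit == "K":
        return value - 273.15
    else:
        # Unknown input: fail loudly rather than fall through silently
        raise ValueError(f"Unknown unit {unit!r}; expected 'C', 'F', or 'K'")


print(to_celsius(212, "F"))  # 100.0

try:
    to_celsius(10, "R")
except ValueError as err:
    unit_error = str(err)
    print(unit_error)
```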

As far as practical applications go, I think a very common need for newbie scientific programmers is "run an analysis over all these datasets," so a useful example would be moving iteration out of a "main" function, catching and gathering errors from each pass through the loop, and then presenting all errors at the end rather than crashing the run.
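A rough sketch of that pattern, with hypothetical `analyze` and `run_all` functions and made-up datasets. The per-dataset function raises normally, and only the outer loop catches, records the error against the dataset name, and moves on.

```python
def analyze(dataset):
    """Return the mean of one dataset's values; raise on bad input."""
    values = dataset["values"]
    if not values:
        raise ValueError("dataset has no values")
    return sum(values) / len(values)


def run_all(datasets):
    """Run analyze() on every dataset, gathering errors instead of crashing."""
    results, errors = {}, {}
    for dataset in datasets:
        name = dataset["name"]
        try:
            results[name] = analyze(dataset)
        except Exception as err:
            # Broad on purpose: this is the top-level loop, and the
            # error is stored and reported rather than hidden.
            errors[name] = err
    return results, errors


datasets = [
    {"name": "site_a", "values": [1, 2, 3]},
    {"name": "site_b", "values": []},
    {"name": "site_c", "values": [4, "five"]},
    {"name": "site_d", "values": [10, 20]},
]

results, errors = run_all(datasets)
print(results)  # {'site_a': 2.0, 'site_d': 15.0}

# Present every failure at the end of the run
for name, err in errors.items():
    print(f"{name}: {type(err).__name__}: {err}")
```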

lwasser commented 22 hours ago

@sneakers-the-rat I love these ideas. 🚀 It is SO fun to have other people to brainstorm with on this!!

There are a few lessons to consider when incorporating these ideas. My goal in the organization is something like this

  1. We have applied activities. Activity 3 is where the skills are applied!
  2. We have additional supporting content that they can read/review that we link from in Activity 3.

Material that supports Activity 3 is:

The idea is that it's a flipped classroom thing where people can read outside of the workshop if they wish and refer to the content later.

So YES! Suggestion: Let's work together on improving the function checks lesson, which should provide that "why" element. I'll work on the activity today first, just so it's done. We can talk about date-times as a side note in the lessons! YES! But this lesson will be about 1-1.5 hours as taught, so it's short and should focus primarily on the why and how of adding checks, and maybe the common traps that you mention above!

After pulling this all together, I suspect we could run an entire workshop on this specific topic!

Lemme know what you think!! And please have a look at that function checks lesson!! ✨

lwasser commented 21 hours ago

> Then one of the bug cases can be confusing the month and day in mm-dd-yyyy vs dd-mm-yyyy dates as a motivator for why one should just get used to always doing isoformat

@sneakers-the-rat I'm working on this now - I'll ping you on the PR so you can have a look. They are taking a date from Crossref data (JOSS publication data) and formatting it. It might look something like this, but I'm suggesting that they use pd.to_datetime to convert! I'll ping ya today when I have a better edited draft!