luchem / KEMM30

GNU General Public License v3.0

Notebook3 observations (including comments on implemented fixes after end of 2023 course) #6

Open brinkdp opened 11 months ago

brinkdp commented 11 months ago

Here are three main observations from when the students were working with notebook3 in the 2023 teaching sessions. I have implemented fixes for the last two points in the updated notebook already, and am mainly describing the issues here for the sake of posterity.

I expected that the notebook would be solvable with 2x4 hours of teacher-led time plus self-study outside of class. In fact, a majority of the groups needed 3x4 hours of teacher-led time. Since this was the first time we used notebook3, I had booked an extra four-hour session to accommodate this, and I am glad I did. For future planning, I recommend 3x4 hours for the current state of this notebook.

Another observation is that, to my knowledge, no group attempted the bonus task about curve fitting. Since it is clearly marked as a bonus task, I think it can still be kept in the notebook. But since I did not talk to any student who had tried to solve it, I got no feedback on its level of difficulty or on what errors could be expected.

Notebook3 is based on downloading a public dataset consisting of 19 .csv files and writing, bit by bit, code that can automatically read, process, plot, and calculate key parameters from the data. My pedagogic idea was to make the students think about their working directory (wd) at all times, and I therefore suggested that they use os.chdir() to move between folders when needed. This turned out to be unnecessarily complicated for the intents and purposes of this course. In the updated notebook, I have implemented an os.path.join() method so that all manipulation of directories and their content is done with relative paths instead of manually controlling the current wd. To be fair, this is probably a more robust solution to the problem anyway.
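As a rough illustration of the os.path.join() approach, here is a minimal sketch that lists the files in a data folder without ever calling os.chdir(). The temporary project folder, the file name sample_01.csv, and the column names are hypothetical stand-ins, not the actual dataset used in the notebook.

```python
import os
import tempfile

# Hypothetical layout for illustration: a project folder with a data subfolder.
project_dir = tempfile.mkdtemp()
data_dir = os.path.join(project_dir, "ecoli_growth_data")
os.mkdir(data_dir)
with open(os.path.join(data_dir, "sample_01.csv"), "w") as f:
    f.write("time,od600\n0,0.05\n")

wd_before = os.getcwd()

# Build the path to each file instead of calling os.chdir() into the folder.
csv_paths = [os.path.join(data_dir, name) for name in sorted(os.listdir(data_dir))]

wd_after = os.getcwd()
print(csv_paths)
print(wd_before == wd_after)  # checks that the working directory was never changed
```

Because the wd never moves, an error halfway through a loop cannot leave the notebook stranded in the data subdirectory.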

An error that occurred quite a bit in class comes from the fact that pd.read_csv, as the name implies, expects CSV-formatted content. The public dataset we use for the notebook contains one .txt file with metadata, meaning that many people will encounter an error when they loop over the folder contents, as listed by os.listdir(), and reach the point where the loop tries to load the .txt file with pd.read_csv(). Most students deleted the .txt file to circumvent the issue. But I also learned that if the code throws an error after os.chdir() has been called to change the wd to the data subdirectory, a hidden directory called .ipynb_checkpoints is created there. If this has happened, the next time the loop is run, this directory is included in the list of directory contents returned by os.listdir() and leads to an error when it is passed to pd.read_csv.
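To see why the loop stumbles, it can help to print what os.listdir() actually returns. The sketch below rebuilds the situation in a temporary folder; the names plate_a.csv and metadata.txt are hypothetical placeholders, and the folder is only a stand-in for the real data subdirectory.

```python
import os
import tempfile

# Hypothetical stand-in for the data folder: one data file, one metadata
# file, and the hidden checkpoint directory described above.
data_dir = tempfile.mkdtemp()
for name in ("plate_a.csv", "metadata.txt"):
    with open(os.path.join(data_dir, name), "w") as f:
        f.write("placeholder\n")
os.mkdir(os.path.join(data_dir, ".ipynb_checkpoints"))

# os.listdir() returns every entry, hidden ones and directories included.
entries = sorted(os.listdir(data_dir))
for name in entries:
    full_path = os.path.join(data_dir, name)
    kind = "directory" if os.path.isdir(full_path) else "file"
    print(f"{name}: {kind}")

# Entries that a loop sending everything to a CSV reader would trip over.
problem_entries = [name for name in entries if not name.endswith(".csv")]
print(problem_entries)
```

Note that os.listdir() makes no distinction between files and directories, which is why the hidden directory ends up in the loop at all.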

The fix for this is simple, but since I never encountered the error myself I did not think to include it when I developed the notebook. The fix:

# Assuming that the wd is the directory the notebook is run from
# and that it contains a subdirectory called ecoli_growth_data:

import os
import pandas as pd

my_wd = os.getcwd()
my_files = os.listdir('ecoli_growth_data')

for file in my_files:
    # skip anything that is not a .csv file (e.g. the metadata .txt or .ipynb_checkpoints)
    if file.endswith('.csv'):
        path_to_file = os.path.join(my_wd, 'ecoli_growth_data', file)
        df = pd.read_csv(path_to_file)
        # additional code for calculations, plots, etc. goes here

Another solution is to use a try-except statement instead of the if file.endswith('.csv') check. But I don't think we covered try-except in the course, and I think that the self-explanatory name of the .endswith('.csv') method makes it preferable in this case.
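For comparison, a try-except version could look roughly like the sketch below. The helper names load_readable_files and read_header are hypothetical, and read_header is only a stand-in for pd.read_csv so that the example runs without pandas. The choice of exceptions is an assumption: OSError covers directories and unreadable files, and ValueError is, to my knowledge, the base class of the pandas parsing errors.

```python
import os
import tempfile

def load_readable_files(folder, reader):
    """Try `reader` on every entry in `folder` and skip entries it cannot load.

    `reader` is any callable that takes a path (pd.read_csv in the notebook).
    Returns a dict of name -> result and a list of skipped names.
    """
    loaded, skipped = {}, []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        try:
            loaded[name] = reader(path)
        except (OSError, ValueError):
            # OSError: directories or unreadable files; ValueError: parse failures
            skipped.append(name)
    return loaded, skipped

def read_header(path):
    """Hypothetical stand-in for pd.read_csv: return the comma-separated header."""
    with open(path) as f:
        header = f.readline().strip()
    if "," not in header:
        raise ValueError(f"{path} does not look like a CSV file")
    return header.split(",")

# Hypothetical data folder: one CSV, one metadata file, one hidden directory.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "growth.csv"), "w") as f:
    f.write("time,od600\n0,0.05\n")
with open(os.path.join(data_dir, "metadata.txt"), "w") as f:
    f.write("free text notes\n")
os.mkdir(os.path.join(data_dir, ".ipynb_checkpoints"))

loaded, skipped = load_readable_files(data_dir, read_header)
print(loaded)
print(skipped)
```

One caveat with this design is that a non-CSV file which happens to parse without error would be loaded silently, which is another argument for the explicit .endswith('.csv') check.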

mlund commented 1 week ago

Ping @amiwilliam00.