swcarpentry / r-novice-inflammation

Programming with R

http://swcarpentry.github.io/r-novice-inflammation/

Replace the inflammation dataset with a dataset in long format that has column names #612

Open: dmcglinn opened this issue 10 months ago

dmcglinn commented 10 months ago

How could the content be improved?

Hey, I love the lesson and use it every time I teach R. However, from a pedagogical perspective, the inflammation dataset is just strangely structured (wide format without column names). As a result, the code used to read it in (e.g., read.csv(..., header = FALSE)) and then to work with it graphically (e.g., plot(1:length(...), ...)) doesn't set students up to generalize their knowledge to the more standard data structures they will also want to read in and graph. Just wanted to pass that along (as I correct my students' coding errors :P) in case this content is ever reworked: switch to a more commonly encountered data structure. Thanks for all the hard work.

Which part of the content does your suggestion apply to?

Reading and plotting the data.

isaac-jennings commented 7 months ago

Hi @dmcglinn, thank you for the contribution.

I certainly see your point of view, and I am currently in two minds. On one hand, I agree with what you have described in your issue. On the other hand, I feel the dataset may have been structured this way intentionally, since the datasets are framed as being derived from treatment or device measurements, possibly replicating the output of scientific devices/hardware. Having said that, tidy data is almost certainly the best structure from a teaching/delivery/learning perspective.

Labeling this as discussion for @Bisaloo and @HaoZeke; any thoughts?

Bisaloo commented 7 months ago

I completely agree we're likely to have to update the dataset at some point. The lesson & the whole ecosystem have evolved a lot since the dataset was first picked and it makes sense that it's no longer the best fit.

However, it's an important change with many implications. One of the less obvious ones is that we may have to coordinate with the python-novice-inflammation lesson to decide whether we want to stay in sync and pull the plug in a coordinated manner.

I would suggest that we start a meta-issue collecting all the feedback and requests about the dataset, to gather requirements for a potential new dataset, and then get in touch with the curriculum advisory committee and the rest of the community. It may be that the recommendation is to fork the lesson, for example.