This is a curriculum of open-source data science exercises, intended to take a student from zero coding experience to basic data science literacy. These exercises are heavily inspired by the (discontinued) Data Challenge Lab at Stanford University and rely on the Tidyverse.
Please see our JOSE paper for more info.
File > Open Project...
).
exercises_sequenced/ folder at your own pace to learn data science skills.
challenges/ folder to put your new skills to use.

Suggested order: The exercise filenames start with a numerical dXY prefix to denote their suggested day-order. This ordering is provided to interleave topics and to give about an hour of work per day. I recommend working 5 days a week on the exercises and taking weekends off!
Data Science is a powerful toolkit to extract usable insights from data. In this class, you will learn tools and gain understanding. You will use software tools to liberate data from published images and tables, wrangle messy datasets into machine learning (ML)-ready form, fit and interpret ML models, and visualize to extract meaning. You will also speak the language of uncertainty---statistics---to avoid getting fooled by models. You will criticize published findings and ask what is, and what is not, in the data. Assignments will include regular practice exercises, progressively puzzling real-data challenges, and a final project of your choice where you obtain, wrangle, and understand a dataset.
I welcome suggestions and contributions! If you want to contribute, please see Contributing.