Closed: sfirke closed this 7 years ago
Possibly have a look at https://github.com/rudeboybert/fivethirtyeight (many datasets), via @rudeboybert and @ismayc.
Content area/topic that is as universally accessible and intuitive as possible; should not require explaining.
@sfirke For me, at least, sports data fits the bill here, but I'm pretty sure that's not universal. (Then again, plenty of people don't know the difference between SOHC and DOHC, and that doesn't stop them from using mtcars.)
Of course there's the excellent gapminder data package from @jennybc; the README points to some good examples of teaching tidyverse from her course.
This exact idea was an unconference topic two years ago. Mine Çetinkaya-Rundel, Eli Bessert and one other person had the goal of creating a collection of interesting datasets (spanning various domains and suited to teaching different topics). It even had a website (I can try to dig up any work they did, including past links).
The Titanic dataset (https://www.kaggle.com/c/titanic/data) is fun for basic data manipulation and data viz classes, and it's relatable to a broad audience. (iris, on the other hand, is not particularly exciting, even for biologists, though it helps that it is a nicely balanced dataset with 50 rows per species and comes built into base R.)
This is the best archive I know of and use quite regularly → https://github.com/caesar0301/awesome-public-datasets
The quickest way to browse through the fivethirtyeight package data sets is via the "Data Sets" section of the package vignette. If you have thoughts on ways to improve it, contact me, @ismayc, or @jchunn. Thanks!
Between the wealth of examples (thanks all!) and this having been tackled at a past unconference, I'm closing this ✅
(shoutout to https://twitter.com/tamaramunzner/status/743857012476280833)
Does anyone have a go-to versatile sample data set, or favorite package of data sets? I don't find that mtcars or any of the other built-ins like ToothGrowth or CO2 meet my needs. I know nycflights13::flights and ggplot2::diamonds but would like a data.frame with:
It would be nice if it could also be used for machine learning (that's less important to me, but it would increase general usefulness). I'd also consider using it for unit tests in packages (I use mtcars in places, but it's limited).
Does such an all-purpose demo data set exist that I'm missing?