Lesson re-structuring and dataset improvements

tavareshugo / r-intro-tidyverse-gapminder

Data Carpentry: Introduction to R/tidyverse

Separate "data manipulation" from "data cleaning" (related to difficulties mentioned in #5). For example, after plotting, the lesson order could be:
- data manipulation with dplyr (select, rename, mutate, arrange, filter, distinct)
- summarise and grouped operations (summarise, group_by, count)
- joins
- data cleaning, with write_csv() at the end (stringr, factor, and possibly separate and unite)
- data reshaping (pivot_*)

The plotting lesson could possibly be split in two. The last part, "Customising graphs", could become a new lesson (maybe an optional appendix) and be expanded to include more examples of custom themes (e.g. ggthemer) and assembling plots with patchwork (now on CRAN). More ideas in #3.
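
As a rough idea of what a patchwork example in such an appendix might look like, here is a minimal sketch. It uses the built-in `mtcars` data as a stand-in for the lesson data, and the plot names and title are placeholders rather than anything already in the lesson.

```r
library(ggplot2)
library(patchwork)

# Two independent plots; mtcars is only a stand-in for the lesson dataset.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_minimal()

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  theme_minimal()

# patchwork overloads `+` to place plots side by side;
# plot_annotation() adds a shared title (placeholder text).
combined <- p_scatter + p_hist +
  plot_annotation(title = "Two panels assembled with patchwork")

combined
```

Using `/` instead of `+` between the two plots would stack them vertically.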

For this to work, it would help to revise the datasets:

- create a small-ish and clean version of the data (i.e. no typos or other oddities)
- create a larger but messier version, which can be used in the cleaning lesson towards the end
- distribute the data as a zip archive, so that all the directories are created immediately (data/raw, data/processed, scripts, figures)
- maybe also rename the main dataset to something less generic, such as "human_development". We would then have "human_development" and "energy" as the two Gapminder datasets used.
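
The folder layout for the archive could be generated with something like the sketch below. The project root name and the placeholder files are assumptions for illustration. Only the four sub-folder names come from the list above.

```r
# Hypothetical project root; a temporary directory keeps the sketch side-effect free.
project_root <- file.path(tempdir(), "course_project")

# Folder layout named in the list above.
subdirs <- c("data/raw", "data/processed", "scripts", "figures")
project_dirs <- file.path(project_root, subdirs)

for (d in project_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Git does not track empty folders, so add a placeholder file to each one
# (an assumption, not part of the proposal).
placeholders <- file.path(project_dirs, "README.txt")
file.create(placeholders)

# Relative paths of everything that would go into the archive.
list.files(project_root, recursive = TRUE)
```

The resulting folder could then be bundled with `utils::zip()`, which relies on an external zip program being available on the machine, or with any other archiving tool.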

Ideally, "small-and-clean" data should contain fewer variables, but still one of each type, for example:

- country, world_region - nominal
- income_group - ordinal
- is_oecd - binary (encode as logical)
- year, population - discrete
- income, children_per_woman, life_expectancy - continuous
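
To make the intended types concrete, here is a small sketch of how these columns might be encoded in R. All values are made-up placeholders (not real Gapminder numbers), and the `income_group` level names are an assumption.

```r
# Made-up placeholder rows, only to show the column types; not real Gapminder values.
toy <- data.frame(
  country            = c("A", "B", "C"),                   # nominal (character)
  world_region       = c("region_1", "region_2", "region_1"),
  income_group       = factor(                             # ordinal (ordered factor)
    c("low", "high", "upper_middle"),
    levels  = c("low", "lower_middle", "upper_middle", "high"),  # assumed level names
    ordered = TRUE
  ),
  is_oecd            = c(FALSE, TRUE, FALSE),              # binary (logical)
  year               = c(2010L, 2010L, 2010L),             # discrete (integer)
  population         = c(1000000L, 2500000L, 400000L),
  income             = c(1200.5, 35000, 9800.25),          # continuous (double)
  children_per_woman = c(4.8, 1.6, 2.3),
  life_expectancy    = c(58.2, 81.0, 72.4),
  stringsAsFactors   = FALSE
)

str(toy)

# Ordered factors support comparisons, which is what makes the column "ordinal".
toy$income_group < "high"
```

Having an ordered factor in the clean dataset would also give the factor part of the cleaning lesson something to build on.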

This would replace the "2010"-only dataset.

The "1960to2010" dataset could contain the same columns (minus "population"), again with no typos or other issues, plus a few extra columns that could be used in the later cleaning lesson:

- population_male and population_female separately - motivates the use of mutate()
- main_religion with typos - fixed using the stringr package
- school_years_men - use "-999" as the missing-value code, so we can use ifelse() to fix it
- school_years_women - use "-" as the missing-value code, so we can use as.numeric() to fix it
- something else that could be used to illustrate separate()?
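
A rough sketch of how the cleaning steps above might play out on such columns is shown below. The rows are made-up placeholders. It assumes "-999" is stored as a number and that the main_religion problems are inconsistent case and stray whitespace. Real typos would need something like `str_replace()` instead.

```r
library(dplyr)
library(stringr)

# Made-up placeholder rows mimicking the proposed messy columns.
messy <- tibble(
  population_male    = c(500, 1200, 80),
  population_female  = c(520, 1150, 85),
  main_religion      = c("Christian", "christian ", " MUSLIM"),
  school_years_men   = c(8.5, -999, 10.1),     # -999 encodes missing
  school_years_women = c("7.9", "-", "10.4")   # "-" encodes missing, so the column is character
)

clean <- messy %>%
  mutate(
    # total population from the two separate columns
    population = population_male + population_female,
    # harmonise whitespace and capitalisation
    main_religion = str_to_title(str_trim(main_religion)),
    # replace the numeric missing-value code with NA
    school_years_men = ifelse(school_years_men == -999, NA, school_years_men),
    # "-" cannot be parsed as a number, so it becomes NA (with a warning, silenced here)
    school_years_women = suppressWarnings(as.numeric(school_years_women))
  )

clean
```

In a lesson, leaving the `as.numeric()` warning visible might actually be useful, since it prompts a discussion of why the column was read as character in the first place.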

Lesson re-structuring and dataset improvements #6