Open tavareshugo opened 4 years ago
Ideally, "small-and-clean" data should contain fewer variables, but still one of each type, for example:
- `country`, `world_region` - nominal
- `income_group` - ordinal
- `is_oecd` - binary (encode as logical)
- `year`, `population` - discrete
- `income`, `children_per_woman`, `life_expectancy` - continuous

This would replace the "2010" only dataset.

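
As a rough sketch, the variable types listed above might be encoded in base R as shown here. All values are invented for illustration, and the level names for `world_region` and `income_group` are assumptions rather than the real categories in the data.

```r
# Hypothetical rows: every value below is made up for illustration only.
small_clean <- data.frame(
  country            = c("A", "B", "C"),
  # nominal: unordered factor (level names are assumed)
  world_region       = factor(c("europe", "asia", "africa")),
  # ordinal: ordered factor (level names are assumed)
  income_group       = factor(c("high", "low", "middle"),
                              levels = c("low", "middle", "high"),
                              ordered = TRUE),
  # binary: encoded as logical
  is_oecd            = c(TRUE, FALSE, FALSE),
  # discrete: integers
  year               = c(2010L, 2010L, 2010L),
  population         = c(5000000L, 120000000L, 30000000L),
  # continuous: doubles
  income             = c(41000.5, 3200.0, 9800.2),
  children_per_woman = c(1.8, 4.6, 2.4),
  life_expectancy    = c(81.2, 62.5, 73.9),
  stringsAsFactors   = FALSE
)

# Inspect the column types
str(small_clean)
```

Encoding `income_group` as an ordered factor means comparisons such as `income_group > "low"` work, which an unordered factor would not allow.
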
The "1960to2010" dataset could contain the same columns (minus "population"), again with no typos/issues, but in addition a few more columns, which could be used in the later cleaning lesson:
- `population_male` and `population_female` separately - motivates usage of `mutate()`
- `main_religion` with typos - using `stringr` package
- `school_years_men` - use "-999" as missing, so we can use `ifelse()` to solve it
- `school_years_women` - use "-" as missing, we can use `as.numeric()` to solve it
- `separate()`?

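
A minimal sketch of how the cleaning lesson could use these columns, assuming `dplyr` and `stringr` are installed. The data frame `messy` and all its values are hypothetical, and the "typos" shown (stray whitespace, inconsistent case) are only one guess at what the real column might contain.

```r
library(dplyr)
library(stringr)

# Hypothetical messy rows: every value below is made up for illustration.
messy <- tibble(
  country            = c("A", "B", "C"),
  population_male    = c(100, 200, 300),
  population_female  = c(110, 190, 310),
  main_religion      = c("Christian", " christian", "MUSLIM"),
  school_years_men   = c(10.5, -999, 7.2),    # -999 marks a missing value
  school_years_women = c("9.8", "-", "6.1")   # "-" marks a missing value
)

clean <- messy %>%
  mutate(
    # total population from the two separate columns
    population = population_male + population_female,
    # harmonise spelling variants: trim whitespace, lower-case
    main_religion = str_to_lower(str_trim(main_religion)),
    # replace the -999 sentinel with NA
    school_years_men = ifelse(school_years_men == -999, NA, school_years_men),
    # "-" cannot be parsed as a number, so as.numeric() turns it into NA
    # (the coercion warning is suppressed here to keep the output quiet)
    school_years_women = suppressWarnings(as.numeric(school_years_women))
  )

clean
```

In a lesson it may be preferable to leave the `as.numeric()` warning visible, since it is a useful prompt to discuss why the column was read as character in the first place.
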
- `dplyr` (`select`, `rename`, `mutate`, `arrange`, `filter`, `distinct`)
- `summarise`, `group_by`, `count`
- `write_csv()` at the end
- `stringr`, `factor`, and possibly `separate` and `unite`
- `pivot_*`
- `ggthemer`, and also assembling plots with `patchwork` (now on CRAN). More ideas on #3.

For this to work, it would help to revise the datasets:

- `data/raw`, `data/processed`, `scripts`, `figures`