rfordatascience / tidytuesday

Official repo for the #tidytuesday project
Creative Commons Zero v1.0 Universal

Baby names 2022 #579

Open tracykteal opened 1 year ago

tracykteal commented 1 year ago

We've probably done a babynames dataset before, but we could potentially do one again with the 2022 data.

https://www.ssa.gov/oact/babynames/limits.html

tracykteal commented 1 year ago

From @simonpcouch: we can download the “National Data”, which comes as a zipped folder of .txt files, one for each year. The following script then combines them all into one data frame.

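library(tidyverse)

# Paths to the unzipped yearly files, one per year (file names start with "yob" and end in ".txt")
names <- list.files("~/Downloads/names", full.names = TRUE)

# One row per file; each file's contents are read into a list-column
names_raw <- tibble(
  year = basename(names),
  data = map(names, read_csv, col_names = c("name", "sex", "count"))
)

# Strip "yob" and ".txt" from the file name to get the year, then expand the list-column
baby_names <- names_raw %>%
  mutate(year = gsub("yob", "", year),
         year = gsub(".txt", "", year, fixed = TRUE),
         year = as.numeric(year)) %>%
  unnest(data)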
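A possible variant of the script above, shown as a sketch rather than the approach used in this thread: recent versions of readr (2.0 and later) can read a vector of files in one call and record the source path through the `id` argument. The toy directory, the file contents, and the `baby_names_toy` name are made up for illustration; they are not real SSA counts. Two assumptions are built in: the `pattern` filter restricts the listing to `.txt` files in case the unzipped folder holds anything else, and `col_types = "cci"` sets the column types explicitly so the `sex` column cannot be guessed as logical from values like `F`.

```r
library(readr)
library(dplyr)

# Toy stand-ins for the yearly files (made-up rows, not real SSA data)
toy_dir <- file.path(tempdir(), "names_toy")
dir.create(toy_dir, showWarnings = FALSE)
writeLines(c("Mary,F,10", "John,M,8"), file.path(toy_dir, "yob1880.txt"))
writeLines(c("Olivia,F,12", "Liam,M,11"), file.path(toy_dir, "yob2022.txt"))

# Keep only files named like yobYYYY.txt
files <- list.files(toy_dir, pattern = "^yob[0-9]{4}\\.txt$", full.names = TRUE)

# Read all files at once; `id` adds a column holding each row's source path
baby_names_toy <- read_csv(
  files,
  col_names = c("name", "sex", "count"),
  col_types = "cci",   # character, character, integer
  id = "file"
) %>%
  # Drop every non-digit from the file name to recover the year
  mutate(year = as.integer(gsub("[^0-9]", "", basename(file)))) %>%
  select(year, name, sex, count)
```

To run this on the real download, point `toy_dir` at the unzipped folder (for example `~/Downloads/names`) and skip the `writeLines()` calls.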