This PR adds the from and to arguments to clean_variable_spelling(), so that users can import dictionaries whose keys and values are stored in any column. This will fix #99
new function linelist_example() to get the path to an example csv file
basic documentation
functionality
tests for regression
You can try this out by installing it from this PR:
devtools::install_github("reconhub/linelist#105")
library(linelist)
wordlist <- read.csv(linelist_example("spelling-dictionary.csv"),
                     stringsAsFactors = FALSE)
dat <- read.csv(linelist_example("coded-data.csv"),
                stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)
# shuffle the four columns so that keys and values are not in fixed positions
wordlist <- wordlist[sample(4)]
wordlist # show the wordlist
#>          values                 grp orders  options
#> 1           Yes         readmission      1        y
#> 2            No         readmission      2        n
#> 3       Unknown         readmission      3        u
#> 4       Missing         readmission      4 .missing
#> 5           Yes             treated      1        0
#> 6            No             treated      2        1
#> 7       Missing             treated      3 .missing
#> 8    Facility 1            facility      1        1
#> 9    Facility 2            facility      2        2
#> 10   Facility 3            facility      3        3
#> 11   Facility 4            facility      4        4
#> 12   Facility 5            facility      5        5
#> 13   Facility 6            facility      6        6
#> 14   Facility 7            facility      7        7
#> 15   Facility 8            facility      8        8
#> 16   Facility 9            facility      9        9
#> 17  Facility 10            facility     10       10
#> 18      Unknown            facility     11 .default
#> 19          0-9           age_group      1        0
#> 20        10-19           age_group      2       10
#> 21        20-29           age_group      3       20
#> 22        30-39           age_group      4       30
#> 23        40-49           age_group      5       40
#> 24          50+           age_group      6       50
#> 25         High .regex ^lab_result_      1     high
#> 26       Normal .regex ^lab_result_      2     norm
#> 27 Inconclusive .regex ^lab_result_      3      inc
#> 28          yes             .global    Inf        y
#> 29           no             .global    Inf        n
#> 30      unknown             .global    Inf        u
#> 31      unknown             .global    Inf      unk
#> 32          yes             .global    Inf      oui
#> 33      missing             .global    Inf .missing
head(dat) # show the data
#>       id       date readmission treated facility age_group lab_result_01
#> 1 ef267c 2019-07-08        <NA>       0        C        10           unk
#> 2 e80a37 2019-07-07           y       0        3        10           inc
#> 3 b72883 2019-07-07           y       1        8        30           inc
#> 4 c9ee86 2019-07-09           n       1        4        40           inc
#> 5 40bc7a 2019-07-12           n       1        6         0          norm
#> 6 46566e 2019-07-14           y      NA        B        50           unk
#>   lab_result_02 lab_result_03 has_symptoms followup
#> 1          high           inc         <NA>        u
#> 2           unk          norm            y      oui
#> 3          norm           inc                   oui
#> 4           inc           unk            y      oui
#> 5           unk          norm         <NA>        n
#> 6           unk           inc         <NA>     <NA>
res1 <- clean_variable_spelling(dat,
                                wordlists = wordlist,
                                from = "options",
                                to = "values",
                                spelling_vars = "grp")
head(res1)
#>       id       date readmission treated   facility age_group lab_result_01
#> 1 ef267c 2019-07-08     missing     Yes    Unknown     10-19       unknown
#> 2 e80a37 2019-07-07         yes     Yes Facility 3     10-19  Inconclusive
#> 3 b72883 2019-07-07         yes      No Facility 8     30-39  Inconclusive
#> 4 c9ee86 2019-07-09          no      No Facility 4     40-49  Inconclusive
#> 5 40bc7a 2019-07-12          no      No Facility 6       0-9        Normal
#> 6 46566e 2019-07-14         yes missing    Unknown       50+       unknown
#>   lab_result_02 lab_result_03 has_symptoms followup
#> 1          High  Inconclusive      missing  unknown
#> 2       unknown        Normal          yes      yes
#> 3        Normal  Inconclusive      missing      yes
#> 4  Inconclusive       unknown          yes      yes
#> 5       unknown        Normal      missing       no
#> 6       unknown  Inconclusive      missing  missing
Created on 2019-12-02 by the reprex package (v0.3.0)
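
For reviewers who want to see the idea behind from and to in isolation, here is a minimal base R sketch of a key/value lookup that selects the dictionary columns by name. It is not the code in this PR: recode_with_dictionary() and the toy dictionary are hypothetical, and the sketch deliberately ignores the special keys (.missing, .default, .regex, .global) shown in the wordlist above.

```r
# Hypothetical helper (not part of linelist): recode one character vector
# using two columns of a dictionary, selected by name rather than position.
recode_with_dictionary <- function(x, dictionary, from, to) {
  keys   <- as.character(dictionary[[from]])
  values <- as.character(dictionary[[to]])
  idx    <- match(x, keys)            # position of each entry among the keys
  out    <- as.character(x)
  found  <- !is.na(idx)
  out[found] <- values[idx[found]]    # entries without a key stay unchanged
  out
}

# Toy dictionary with the key and value columns in arbitrary positions.
dictionary <- data.frame(
  values  = c("Yes", "No", "Unknown"),
  orders  = 1:3,
  options = c("y", "n", "u"),
  stringsAsFactors = FALSE
)

x <- c("y", "n", "u", "maybe", NA)
recoded <- recode_with_dictionary(x, dictionary, from = "options", to = "values")
print(recoded)

# Reordering the columns gives the same result because they are looked up by name.
shuffled <- dictionary[c(3, 1, 2)]
print(identical(recode_with_dictionary(x, shuffled, "options", "values"), recoded))
```

In this sketch, "maybe" and NA have no matching key, so they pass through unchanged. Handling of missing and default values is left to the real function.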