Closed (zkamvar closed 5 years ago)
This addresses @thibautjombart's private issue that clean_spelling needs regex capabilities. Here is the solution:
    library("linelist")

    # create some fake data
    my_data <- c(letters[1:5], "foubar", "foobr", "fubar", NA, "", "unknown", "fumar")
    cleaned_data <- c(letters[1:5], "foobar", "foobar", "foobar", "missing", "missing", "missing", "fumar")

    # You can use regular expressions to simplify your list
    corrections <- data.frame(
      bad = c(".regex f[ou][^m].+?r$", "unknown", ".missing"),
      good = c("foobar", ".na", "missing"),
      stringsAsFactors = FALSE
    )
    corrections
    #>                     bad    good
    #> 1 .regex f[ou][^m].+?r$  foobar
    #> 2               unknown     .na
    #> 3              .missing missing

    data.frame(original = my_data, cleaned = clean_spelling(my_data, corrections))
    #>    original cleaned
    #> 1         a       a
    #> 2         b       b
    #> 3         c       c
    #> 4         d       d
    #> 5         e       e
    #> 6    foubar  foobar
    #> 7     foobr  foobar
    #> 8     fubar  foobar
    #> 9      <NA> missing
    #> 10          missing
    #> 11  unknown    <NA>
    #> 12    fumar   fumar
Created on 2019-05-29 by the reprex package (v0.3.0)
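For readers curious how a `.regex` entry could behave, here is a rough base-R sketch of the matching step only. It is not linelist's actual implementation: `apply_regex_correction()` is a hypothetical helper invented for illustration, and it assumes the text after the `.regex ` prefix is used as a Perl-compatible pattern.

```r
# Hypothetical helper (NOT part of linelist): replace every value that
# matches `pattern` with `replacement`, leaving NA and non-matches untouched.
apply_regex_correction <- function(x, pattern, replacement) {
  # Guard against NA explicitly so missing values are never treated as hits
  hit <- !is.na(x) & grepl(pattern, x, perl = TRUE)
  x[hit] <- replacement
  x
}

my_data <- c("foubar", "foobr", "fubar", "fumar", NA)
result <- apply_regex_correction(my_data, "f[ou][^m].+?r$", "foobar")
print(result)
```

In this sketch "fumar" is left alone because `[^m]` rejects the third character, which mirrors the behaviour shown in the reprex above. Handling of `.missing` and `.na` is deliberately left out.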
You rock, this is really really really cool. Fanx!