Closed cwhittaker1000 closed 5 years ago
I'm hesitant to add functionality like this because it adds yet another layer where uncertainty can creep in. Part of the problem is that the vector you get back would be a character vector, not a Date vector, so you would still have to convert it to Date before you could do anything useful with it. If you want to preserve the original values, I would suggest using guess_dates to add a new column instead of modifying the column in place, like so:
library("tidyverse")
library("linelist")
(locale <- Sys.getlocale("LC_TIME"))
#> [1] "en_GB.UTF-8"
Sys.setlocale("LC_TIME", "fr_FR.utf8")
#> [1] "fr_FR.utf8"
bloop <- c("40001", "22_Août_2019", "22_Aout_2019", NA)
dat <- tibble::tibble(bloop = bloop, floop = sample(bloop))
DATES <- c("bloop", "floop")
dat %>%
  mutate_at(.vars = vars(DATES),
            .funs = list(cleaned = ~guess_dates(., error_tolerance = 1)))
#> # A tibble: 4 x 4
#>   bloop        floop        bloop_cleaned floop_cleaned
#>   <chr>        <chr>        <date>        <date>
#> 1 40001        22_Août_2019 2009-07-07    2019-08-22
#> 2 22_Août_2019 40001        2019-08-22    2009-07-07
#> 3 22_Aout_2019 22_Aout_2019 NA            NA
#> 4 <NA>         <NA>         NA            NA
Sys.setlocale("LC_TIME", locale) # reset to original locale
#> [1] "en_GB.UTF-8"
Created on 2019-08-28 by the reprex package (v0.3.0)
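Keeping the raw column next to the cleaned one also makes it easy to pull out the entries that failed to parse, so they can be reviewed by hand. A minimal sketch in base R, assuming the values from the output above are hard-coded rather than recomputed with guess_dates (the name `failed` is just for illustration):

```r
# Values copied from the reprex output above; guess_dates is not called here.
dat <- data.frame(
  bloop = c("40001", "22_Août_2019", "22_Aout_2019", NA),
  bloop_cleaned = as.Date(c("2009-07-07", "2019-08-22", NA, NA)),
  stringsAsFactors = FALSE
)

# An entry failed to parse if the raw value is present but the cleaned date is NA.
failed <- !is.na(dat$bloop) & is.na(dat$bloop_cleaned)

dat$bloop[failed]
#> [1] "22_Aout_2019"
```

Genuinely missing values (row 4) are not flagged, because the raw value is NA as well.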
@cwhittaker1000, does this work for you?
@zkamvar that works for me!
When working with a linelist containing a mixture of dates in modern Excel format and other unrecognised formats, e.g.
I noticed that with the error_tolerance parameter turned up and modern_excel = TRUE, "22_Aout_2019" gets coerced to an NA, and the first 3 elements get converted to their correct dates. When the error_tolerance parameter is low, the original vector is returned.
Would it be possible to add some functionality to guess_dates (possibly another argument) to allow a third type of output? Specifically, an output where all the inputs that can be converted (i.e. the first three elements in the example above) are converted and returned, and all the inputs that can't be converted (the fourth element in the example above) are returned as they originally were, instead of being converted to NA?
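For illustration, the requested "convert what you can, keep the rest" output could be sketched in base R roughly as follows. This is not part of linelist: `convert_or_keep` and its `fmt` argument are hypothetical names, and it only handles a single known date format with numeric months (to stay independent of the locale) rather than the guessing that guess_dates does.

```r
# Hypothetical helper, not part of linelist: convert entries that parse,
# and return everything else unchanged.
convert_or_keep <- function(x, fmt = "%d_%m_%Y") {
  parsed <- as.Date(x, format = fmt)   # NA wherever the format does not match
  out <- as.character(x)
  ok <- !is.na(parsed)
  out[ok] <- format(parsed[ok])        # ISO 8601 strings such as "2019-08-22"
  out
}

x <- c("22_08_2019", "01_02_2018", "not a date", NA)
res <- convert_or_keep(x)
res
#> [1] "2019-08-22" "2018-02-01" "not a date" NA
class(res)
#> [1] "character"
```

Because a Date vector cannot hold arbitrary strings, the mixed result has to be a character vector, so it still needs a further conversion step before it can be used as dates.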