Closed by cwhittaker1000 5 years ago
This depends on the locale of the current R session. In a French session, the dates are parsed correctly only if the month names carry the correct accents.
For example:
# In the English locale, neither Aout nor Août is recognised
(locale <- Sys.getlocale("LC_TIME"))
#> [1] "en_GB.UTF-8"
bloop <- c("40001", "22_Août_2019", "22_Aout_2019", NA)
linelist::guess_dates(bloop, error_tolerance = 1, modern_excel = TRUE)
#> [1] "2009-07-07" NA NA NA
# If we set the locale to French, the one with the accent is recognised by lubridate.
Sys.setlocale("LC_TIME", "fr_FR.utf8")
#> [1] "fr_FR.utf8"
linelist::guess_dates(bloop, error_tolerance = 1, modern_excel = TRUE)
#> [1] "2009-07-07" "2019-08-22" NA NA
As a bonus, here is how the other date parsing packages compare:
as.Date(bloop, "%d_%B_%Y")
#> [1] NA "2019-08-22" NA NA
parsedate::parse_date(bloop)
#> [1] "2019-08-28 10:23:18 UTC" "2019-01-22 00:00:00 UTC"
#> [3] "2019-01-22 00:00:00 UTC" NA
anytime::anydate(bloop)
#> [1] "4000-01-01" NA NA NA
Sys.setlocale("LC_TIME", locale) # reset to original locale
#> [1] "en_GB.UTF-8"
Created on 2019-08-28 by the reprex package (v0.3.0)
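The set-then-reset locale pattern in the reprex above can be wrapped in a small helper so the original locale is always restored. This is only a sketch: `parse_dates_in_locale` is a hypothetical name, not part of linelist or lubridate, and it uses base `as.Date()` rather than `guess_dates()`. Locale names such as "fr_FR.utf8" are platform-dependent and may not be installed everywhere.

```r
# Hypothetical helper (not part of linelist or lubridate): parse dates with
# as.Date() under a temporary LC_TIME locale, then restore the previous one.
parse_dates_in_locale <- function(x, format, locale) {
  old_locale <- Sys.getlocale("LC_TIME")
  # Restore the original locale even if parsing fails.
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) when the locale is unavailable.
  if (identical(suppressWarnings(Sys.setlocale("LC_TIME", locale)), "")) {
    stop("Locale not available on this system: ", locale)
  }
  as.Date(x, format = format)
}

# The "C" locale uses English month names and should exist on any system.
parse_dates_in_locale("22_August_2019", "%d_%B_%Y", "C")

# With a French locale installed (an assumption), the accented form could be
# parsed the same way:
# parse_dates_in_locale("22_Août_2019", "%d_%B_%Y", "fr_FR.utf8")
```

Using `on.exit()` means the caller's session locale is left untouched, which avoids the manual reset at the end of the reprex.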
Thanks for this and completely understood. Do you have a sense of whether it would be possible to handle French months without accents? I've had a look through lubridate but all of the functionality I've been able to find appears to be contingent on the accents being in place.
The only thing I can think of would be to have a dictionary that replaces these:
stringr::str_replace_all("get Aout", "Aout", "Août")
#> [1] "get Août"
Created on 2019-08-28 by the reprex package (v0.3.0)
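Extending the single replacement above to a full dictionary could look roughly like this. It is a sketch in base R, not something linelist provides: `french_month_fixes` and `restore_french_accents` are hypothetical names. Only février, août and décembre carry accents in French, so the lookup needs just three entries. The replacement is written in lower case, and plain substring matching is assumed to be good enough for date strings.

```r
# Hypothetical lookup: unaccented spelling -> accented French month name.
# Unicode escapes keep the source file ASCII-only.
french_month_fixes <- c(
  fevrier  = "f\u00e9vrier",
  aout     = "ao\u00fbt",
  decembre = "d\u00e9cembre"
)

# Hypothetical helper (not part of linelist): restore the accents so that a
# parser running in a French locale has a chance to recognise the months.
restore_french_accents <- function(x) {
  for (plain in names(french_month_fixes)) {
    # ignore.case = TRUE also catches "Aout", "AOUT", etc.;
    # NA elements pass through unchanged.
    x <- gsub(plain, french_month_fixes[[plain]], x, ignore.case = TRUE)
  }
  x
}

cleaned <- restore_french_accents(c("22_Aout_2019", "3 fevrier 2020", NA))
```

The cleaned vector could then be passed to `guess_dates()` in a French session, as in the earlier reprex. Strings that already contain the accents are left as they are, since none of the unaccented patterns match them.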
That makes perfect sense and is super helpful, thanks a bunch!
guess_dates currently doesn't appear to handle French months well. For example:
works perfectly (converting the first 4 elements to their respective dates and leaving the NA as is), whereas:
currently doesn't, and returns the original input vector. There might be some functionality I'm missing, but if not, would it be possible to modify guess_dates so that it handles French months (both with and without accents) in addition to English months?