#
# The default orders prioritize world date ordering over American-style.
print(ord <- getOption("linelist_guess_orders"))
#> $world_named_months
#> [1] "Ybd" "dby"
#>
#> $world_digit_months
#> [1] "dmy" "Ymd"
#>
#> $US_formats
#> [1] "Omdy" "YOmd"
# If you want to prioritize American-style dates with numeric months, you
# can switch the second and third elements of the default orders.
print(us_ord <- ord[c(1, 3, 2)])
#> $world_named_months
#> [1] "Ybd" "dby"
#>
#> $US_formats
#> [1] "Omdy" "YOmd"
#>
#> $world_digit_months
#> [1] "dmy" "Ymd"
guess_dates(c("03 Jan 2018", "07/03/1982", "08/20/85"), orders = us_ord)
#> [1] "2018-01-03" "1982-07-03" "1985-08-20"
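To make the effect of the ordering concrete, here is a minimal base-R sketch of "the first format that parses wins". The helper `first_match()` is hypothetical and is not how guess_dates() is implemented; it only illustrates why swapping the priority changes the result for an ambiguous string like "07/03/1982" but not for an unambiguous one.

```r
# Hypothetical helper, not part of linelist: try each strptime-style format
# in priority order and keep the first one that parses.
first_match <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                          # entries still unparsed
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

x     <- c("07/03/1982", "08/20/1985")
world <- c("%d/%m/%Y", "%m/%d/%Y")  # day-first has priority
us    <- rev(world)                 # month-first has priority

first_match(x, world)
#> [1] "1982-03-07" "1985-08-20"
first_match(x, us)
#> [1] "1982-07-03" "1985-08-20"
```

The second string only ever parses as month-first (there is no month 20), so it comes out the same either way; only the ambiguous first string depends on the priority.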
Handling dates with time formats --------------------------
This one is for @ffinger addressing #64
#
# If you have a format with hours, minutes and seconds, you can also add that
# to the list of formats. Note, however, that this function will drop anything
# below the day level (hours, minutes, seconds).
print(ord$ymdhms <- c("Ymdhms", "Ymdhm"))
#> [1] "Ymdhms" "Ymdhm"
guess_dates(c("2014_04_05_23:15:43", "03 Jan 2018", "07/03/1982", "08/20/85"), orders = ord)
#> [1] "2014-04-05" "2018-01-03" "1982-03-07" "1985-08-20"
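For anyone wondering what "dropping levels below day" looks like, here is a small base-R sketch. It does not use guess_dates() or its internals; it just parses the same timestamp with an explicit format and coerces the result to a Date, which discards the time of day.

```r
# Parse a full timestamp, then coerce to Date: hours, minutes and seconds
# are discarded.
stamp <- strptime("2014_04_05_23:15:43",
                  format = "%Y_%m_%d_%H:%M:%S", tz = "UTC")
day_only <- as.Date(stamp)
day_only
#> [1] "2014-04-05"
```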
Handling missing and nonsense data ------------------------
@thibautjombart, you can see in this section that I've added the Excel date for 2018-10-16, addressing #66 and #6
#
# guess_dates can handle messy dates and tolerate missing data
x <- c("01-12-2001", "male", "female", "2018-10-18", NA, NA, "2018_10_17",
"43387", "2018 10 19", "// 24/12/1989", "this is 24/12/1989!",
"RECON NGO: 19 Sep 2018 :)", "6/9/11", "10/10/10")
guess_dates(x, error_tolerance = 1) # forced conversion
#> [1] "2001-12-01" NA NA "2018-10-18" NA
#> [6] NA "2018-10-17" "2018-10-16" "2018-10-19" "1989-12-24"
#> [11] "1989-12-24" "2018-09-19" "2011-09-06" "2010-10-10"
guess_dates(x, error_tolerance = 0.15) # only 15% errors allowed
#> [1] "01-12-2001" "male"
#> [3] "female" "2018-10-18"
#> [5] NA NA
#> [7] "2018_10_17" "43387"
#> [9] "2018 10 19" "// 24/12/1989"
#> [11] "this is 24/12/1989!" "RECON NGO: 19 Sep 2018 :)"
#> [13] "6/9/11" "10/10/10"
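The two calls above differ only in how many failures are tolerated. As a rough sketch of that kind of check (my assumption, not necessarily the exact logic inside guess_dates()): count the entries that were present but could not be parsed, divide by the number of non-missing entries, and fall back to the original vector if that proportion exceeds the tolerance. In `x` above, 2 of the 12 non-missing entries ("male" and "female") fail, which is about 17% and therefore over the 15% limit. The helper name `apply_tolerance()` is hypothetical.

```r
# Hypothetical helper, not part of linelist: return the parsed dates unless
# too many non-missing entries failed to parse.
apply_tolerance <- function(original, parsed, error_tolerance = 0.1) {
  was_present <- !is.na(original)
  if (!any(was_present)) return(parsed)         # nothing to judge
  failed <- was_present & is.na(parsed)
  prop_failed <- sum(failed) / sum(was_present)
  if (prop_failed > error_tolerance) original else parsed
}

original <- c("2018-10-18", "male", NA, "2018-10-19", "2018-10-20")
parsed   <- as.Date(original, format = "%Y-%m-%d")  # "male" becomes NA

apply_tolerance(original, parsed, error_tolerance = 1)     # forced: Date vector
apply_tolerance(original, parsed, error_tolerance = 0.15)  # 1/4 failed: unchanged
```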
This will fix #64, fix #65, and fix #66, and address #6, with the following improvements:

- guess_dates() can now handle dates that were imported from Excel as integers (#66).
- guess_dates() gains the argument "modern_excel" to indicate how integer dates should be formatted.
- getOption("linelist_guess_orders") replaces the explicit list of orders in guess_dates() for easier access.
- guess_dates() no longer throws an error if passed a date class object (#65).
- guess_dates() has been better documented to reflect the above changes (#64).

Here is the new example documentation:
- Mixed format date
- Prioritizing specific date formats
- Handling dates with time formats
- Handling missing and nonsense data
Created on 2019-04-08 by the reprex package (v0.2.1)