reconhub / linelist

An R package to import, clean, and store case data
https://www.repidemicsconsortium.org/linelist

guess_dates() can not replace common format #64

Closed ffinger closed 5 years ago

ffinger commented 5 years ago

Formats:

dd_mm_yyyy_HH_MM
dd_mm_yyyy_HH_MM_SS

where _ stands for any common separator, which is replaced by _ during cleaning.

Those formats are common in spreadsheets.

Workaround so far:

replace_dates <- function(foo) guess_dates(sub("(_[0-9]{2}){2,3}$", "", foo))
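
To show what the regular expression in this workaround does before the string reaches guess_dates(), here is a minimal base R sketch. The helper name strip_time and the sample strings are hypothetical, and the inputs are assumed to be already cleaned so that every separator is an underscore.

```r
# Hypothetical helper: remove a trailing time component (_HH_MM or _HH_MM_SS)
# from date strings whose separators have already been replaced by "_".
strip_time <- function(x) sub("(_[0-9]{2}){2,3}$", "", x)

cleaned <- c("12_01_2018_13_34_32", "12_01_2018_13_34", "12_01_2018")
strip_time(cleaned)
#> [1] "12_01_2018" "12_01_2018" "12_01_2018"

# Caveat: a date with a two-digit year and no time matches the pattern too,
# so it is truncated as well.
strip_time("12_01_18")
#> [1] "12"
```

The pattern is only safe when years are written with four digits, because a two-digit month followed by a two-digit year looks the same to it as hours followed by minutes.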

zkamvar commented 5 years ago

Hi Flavio,

I'm on vacation. I'll get to this in a week or so. Also, when you are reporting an issue, please please please give me a reproducible example to work from.

One suggestion I have: try adding dmyhms and dmyhm to the "orders" argument.

ffinger commented 5 years ago

Sure, no worries, it's not an urgent issue. I just wanted to document this; we can work without it atm. Here is an example, which also shows that your suggestion works perfectly well (I didn't know you could add h, m and s).

> guess_dates("12_01-2018_13-34:32")
[1] "12_01-2018_13-34:32"
> guess_dates("12_01-2018_13-34:32", orders = c("dmyhms"))
[1] "2018-01-12"
> guess_dates("12_01-2018_13-34", orders = c("dmyhms"))
[1] "12_01-2018_13-34"
> guess_dates("12_01-2018_13-34", orders = c("dmyhm"))
[1] "2018-01-12"

Thanks!

zkamvar commented 5 years ago

Hi @ffinger, to clarify a bit more: the orders argument takes a list of format vectors to try in decreasing order of importance.

library("linelist")

# default orders
orders <- list(
  world_named_months = c("Ybd", "dby"), 
  world_digit_months = c("dmy", "Ymd"), 
  US_formats = c("Omdy", "YOmd")
)

x <- c("12_01-2018_13-34:32", "12_01-2018_13-34", "12_01-2018")

guess_dates(x, orders = orders, error_tolerance = 1)
#> [1] NA           NA           "2018-01-12"

Based on your examples, what you would want to do is add the appropriate formats to the vectors, like so. Here, we assume that the dates with hms will only appear as dmy.

orders2 <- list(
  world_named_months = c("Ybd", "dby"),
  world_digit_months = c("dmy", "Ymd", "dmyhm", "dmyhms"),
  US_formats = c("Omdy", "YOmd")
)

guess_dates(x, orders = orders2, error_tolerance = 1)
#> [1] "2018-01-12" "2018-01-12" "2018-01-12"

You can also just add another vector here instead of appending.

orders3 <- list(
  world_named_months = c("Ybd", "dby"),
  world_digit_months = c("dmy", "Ymd"),
  dmy_with_time = c("dmyhm", "dmyhms"),
  US_formats = c("Omdy", "YOmd")
)

guess_dates(x, orders = orders3, error_tolerance = 1)
#> [1] "2018-01-12" "2018-01-12" "2018-01-12"

Created on 2019-04-05 by the reprex package (v0.2.1)
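
The two variants above can also be built from the default list instead of retyping it. This is a sketch in base R only: the default orders are copied from the example above, and the name time_orders is a hypothetical helper variable, not part of linelist.

```r
# Default orders, copied from the example above
orders <- list(
  world_named_months = c("Ybd", "dby"),
  world_digit_months = c("dmy", "Ymd"),
  US_formats = c("Omdy", "YOmd")
)

# Hypothetical name: orders for day-month-year strings that carry a time
time_orders <- c("dmyhm", "dmyhms")

# Variant 1: extend an existing vector (gives the same list as orders2)
orders2 <- orders
orders2$world_digit_months <- c(orders2$world_digit_months, time_orders)

# Variant 2: insert a new named vector after the second element
# (gives the same list as orders3)
orders3 <- append(orders, list(dmy_with_time = time_orders), after = 2)

names(orders3)
#> [1] "world_named_months" "world_digit_months" "dmy_with_time"      "US_formats"
```

Either list can then be passed on as in the calls above, for example guess_dates(x, orders = orders3, error_tolerance = 1).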

thibautjombart commented 5 years ago

Can we now close this issue?

ffinger commented 5 years ago

Sure. Maybe we could add a hint about this behaviour to the documentation?

zkamvar commented 5 years ago

I have documented it in a separate branch and will make a PR on Monday.
