tidyverse / lubridate

Make working with dates in R just that little bit easier
https://lubridate.tidyverse.org
GNU General Public License v3.0
728 stars 207 forks source link

mdy strictness #1076

Closed ggrothendieck closed 1 year ago

ggrothendieck commented 1 year ago

Below we find that simply reversing the arguments to coalesce cause the date conversion to fail. Maybe a strict mode that ensures that non strict matches get an NA.

library(lubridate)

d <- c("June 2013", "June 24,2010")

# this works since my(d) returned NA when the format did not match
coalesce(my(d), mdy(d)) # ok
## [1] "2013-06-01" "2010-06-24"
## Warning message:
##  1 failed to parse. 

# simply reversing the arguments causes it to give wrong results
coalesce(mdy(d), my(d)) # wrong
## [1] "2013-06-20" "2010-06-24"
## Warning message:
##  1 failed to parse. 

# the problem is that mdy(d) does not return NA for non-matching components
mdy(d) 
## > mdy(d)
[1] "2013-06-20" "2010-06-24"
vspinu commented 1 year ago

mdy matches 2013 as day = 20 year 2013. In retrospect this was a bad idea, but now we are stuck with it.

Use parse_date_time or parse_date_time2 with explicit mY format:

> parse_date_time(c("June 2013", "June 24,2010"), "mdY")
[1] NA               "2010-06-24 UTC"
Warning message:
 1 failed to parse. 
> parse_date_time(c("June 2013", "June 24,2010"), c("mY", "mdY"))
[1] "2013-06-01 UTC" "2010-06-24 UTC"

I will close this issue but will mark for revisiting if we ever consider rewriting lubriate.

ggrothendieck commented 1 year ago

That gives a POSIXct result and could introduce time zone errors in subsequent processing even though there is no time zone involved. I need a Date result.

On Sun, Oct 30, 2022 at 5:29 PM Vitalie Spinu @.***> wrote:

mdy matches 2013 as day = 20 year 2013. In retrospect this was a bad idea, but now we are stuck with it.

Use parse_date_time or parse_date_time2 with explicit mY format:

parse_date_time(c("June 2013", "June 24,2010"), "mdY") [1] NA "2010-06-24 UTC"Warning message: 1 failed to parse. > parse_date_time(c("June 2013", "June 24,2010"), c("mY", "mdY")) [1] "2013-06-01 UTC" "2010-06-24 UTC"

I will close this issue but will mark for revisiting if we ever consider rewriting lubriate.

— Reply to this email directly, view it on GitHub https://github.com/tidyverse/lubridate/issues/1076#issuecomment-1296355044, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB32F7TBQQL3DJM3NFSOBSDWF3LBLANCNFSM6AAAAAARSOZDLM . You are receiving this because you authored the thread.Message ID: @.***>

-- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com

vspinu commented 1 year ago

Convert it with as_date.

vspinu commented 1 year ago

I have created #1077 for adding new argument in order to disallow such cases.

ggrothendieck commented 1 year ago

I am aware that one can convert to Date class but I don't like having intermediate POSIXct results when dealing with dates since there is always the possibility of converting with respect to an unintended time zone. If there were a parse_date then that would be less fragile.

vspinu commented 1 year ago

parse_date_time returns UTC, so there is no accidental conversion there. It's pretty safe I think. Internally mdy does just that, calls parse_date_time and converts to date.

ggrothendieck commented 1 year ago

I have seen too many time zone problems to trust POSIXct with Date data.