tidyverse / lubridate

Make working with dates in R just that little bit easier
https://lubridate.tidyverse.org
GNU General Public License v3.0
731 stars 207 forks source link

daylight saving breaks `+minutes()` #759

Closed Demetrio92 closed 1 year ago

Demetrio92 commented 5 years ago

Adding minutes to a date around daylight saving creates NAs

tsp_initial = ymd_hms('2019-03-09 07:22:03') %>% force_tz('US/Pacific')
tsp_initial + minutes(1160)
## NA

Bug or Feature? It definitely broke my until-now-working code.


FIX:

tsp_initial = ymd_hms('2019-03-09 07:22:03') %>% force_tz('US/Pacific')
tsp_initial + dminutes(1160)
## "2019-03-10 03:42:03 PDT"

Explanation

period and duration act differently when adding a timespan to a date. Consider a simpler example: on the 30th of January what's the date on the calendar in a month from now?

period will tell you it's 30th of February and as this is not possible will produce an NA

ymd_hms("2021-01-30 11:00:00") + months(1)  # NA

duration will add some vague 30 days and return a different date depending on the year.

ymd_hms("2021-01-30 11:00:00") + dmonths(1)  # "2021-03-01 21:30:00 UTC"
ymd_hms("2020-01-30 11:00:00") + dmonths(1)  # "2020-02-29 21:30:00 UTC"

RTFM, I guess

https://lubridate.tidyverse.org/reference/duration.html https://lubridate.tidyverse.org/reference/period.html

Demetrio92 commented 5 years ago

Compared to Postgres behaviour. Just for an example of how this could be handeled.

my-db=> select (
    select (
        select '2019-03-09 07:22:03'::timestamp without time zone at time zone 'US/Pacific'
    ) + interval '1160 mins'
) at time zone 'US/Pacific'
;
      timezone       
---------------------
 2019-03-10 03:42:03
(1 row)

my-db=> select (
    select (
        select '2019-03-09 07:22:03'::timestamp without time zone at time zone 'US/Pacific'
    ) + interval '1160 mins'  - interval '1 hour'
) at time zone 'US/Pacific'
;
      timezone       
---------------------
 2019-03-10 01:42:03

my-db=> select (
    select (
        select '2019-03-09 07:22:03'::timestamp without time zone at time zone 'US/Pacific'
    ) + interval '1160 mins'  + interval '1 hour'
) at time zone 'US/Pacific'
;
      timezone       
---------------------
 2019-03-10 04:42:03
deanm0000 commented 5 years ago

The same thing happens with hours(). It's a new bug because I had code that worked last year that was broken this year. As a workaround I changed my code to something like

with_tz(with_tz(mydate, 'GMT')+hours(1),'America/Los_Angeles')

akleinhesselink commented 5 years ago

I'm finding a similar issue when you try and parse a time that is incompatible with daylight savings time (i.e. 2:00 AM until 3:00 AM doesn't exist when we "spring ahead" in the states). You get an NA and a failed to parse warning. Technically this makes sense but maybe this should be an Error rather than a warning, and maybe a more informative message.

Surprisingly this doesn't break the "strptime" function.

library(lubridate)

test_date <- '03/10/19 02:00 AM'
mdy_hm( test_date, tz = 'America/New_York')
#[1] NA
#Warning message:
# 1 failed to parse. 

Using strptime


d2 <- strptime(test_date, format = '%m/%d/%y %I:%M %p', tz = 'America/New_York')
print( d2 )
# [1] "2019-03-10 02:00:00"

attributes(d2)
# $names
# [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"   "wday"   "yday"   "isdst" 
# [10] "zone"   "gmtoff"
# $class
# [1] "POSIXlt" "POSIXt" 
# $tzone
# [1] "America/New_York" "EST"              "EDT"     

sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shiny_1.3.0     reprex_0.2.1    lubridate_1.7.4

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        compiler_3.5.3    later_0.8.0       remotes_2.0.3    
 [5] prettyunits_1.0.2 tools_3.5.3       pkgload_1.0.2     digest_0.6.18    
 [9] pkgbuild_1.0.3    jsonlite_1.6      evaluate_0.13     memoise_1.1.0    
[13] rlang_0.3.4       cli_1.1.0         rstudioapi_0.10   yaml_2.2.0       
[17] xfun_0.6          withr_2.1.2       stringr_1.4.0     knitr_1.22       
[21] desc_1.2.0        fs_1.2.7          devtools_2.0.2    rprojroot_1.3-2  
[25] glue_1.3.1        R6_2.4.0          processx_3.3.0    rmarkdown_1.12   
[29] sessioninfo_1.1.1 callr_3.2.0       clipr_0.5.0       magrittr_1.5     
[33] whisker_0.3-2     usethis_1.5.0     backports_1.1.3   ps_1.3.0         
[37] promises_1.0.1    htmltools_0.3.6   rsconnect_0.8.13  assertthat_0.2.1 
[41] mime_0.6          xtable_1.8-3      httpuv_1.5.1      stringi_1.4.3    
[45] miniUI_0.1.1.1    crayon_1.3.4 
vspinu commented 5 years ago

2:AM is borderline. It probably makes sense to parse it actually. I would say this is a bug.

DavisVaughan commented 5 years ago

I just hit this too with <datetime> - hours(1). It was pretty unfortunate, it wrecked a pretty long series spanning multiple years. Crossing over the DST boundary is definitely what did it.

library(lubridate, warn.conflicts = FALSE)

bad_time <- with_tz(as_datetime(1300000380) , "America/New_York")

bad_time
#> [1] "2011-03-13 03:13:00 EDT"

bad_time - hours(1)
#> [1] NA

in_est <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
in_est
#> [1] "2011-03-13 01:59:59 EST"

dst(bad_time)
#> [1] TRUE
dst(in_est)
#> [1] FALSE

Created on 2019-09-06 by the reprex package (v0.2.1)

DavisVaughan commented 5 years ago

Okay, so I think that there is a bug that has been there since essentially the beginning of the package.

The summary is that I think that right here, there should be another if branch that says if we are ONLY updating using hours/min/sec or lower, then it should use cl_new.pre if the new civil seconds amount is above the old civil seconds amount (i.e. we went forward), otherwise use cl_new.post (we went backwards). https://github.com/tidyverse/lubridate/blob/cd8267976e51bfab9bd43f9eb30f2749d45e4ff9/src/update.cpp#L168

I've read the docs in ?Period-class, and I know you want this reversible property:

date + period - period = date

I would argue that if it the arithmetic only involves hours/minutes/seconds then this is reversible. For my example above, I think that this is perfectly reasonable behavior (lubridate doesn't do this right now):

library(lubridate, warn.conflicts = FALSE)

border <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
border
#> [1] "2011-03-13 01:59:59 EST"

border + seconds(1)
#> [1] "2011-03-13 03:00:00 EDT"

border + seconds(1) - seconds(1)
#> [1] "2011-03-13 01:59:59 EST"

Created on 2019-09-06 by the reprex package (v0.2.1)

With days, you can't enforce the reversible behavior, so this behavior would result in an NA, not the problematic result shown here (again, this is my custom version)

library(lubridate, warn.conflicts = FALSE)

border <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
border
#> [1] "2011-03-13 01:59:59 EST"

border_minus_a_day <- border - days(1)
border_minus_a_day
#> [1] "2011-03-12 01:59:59 EST"

one_day_one_second <- days(1) + seconds(1)
one_day_one_second
#> [1] "1d 0H 0M 1S"

# This SHOULD result in an NA
border_minus_a_day + one_day_one_second
#> [1] "2011-03-13 03:00:00 EDT"

# Because this is not reversible
border_minus_a_day + one_day_one_second - one_day_one_second
#> [1] "2011-03-12 02:59:59 EST"

Created on 2019-09-06 by the reprex package (v0.2.1)

Update: Hmm, this wouldn't be reversible either. I'm not sure I have the rule quite right yet.

library(lubridate, warn.conflicts = FALSE)

border <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
border
#> [1] "2011-03-13 01:59:59 EST"

border_minus_a_day <- border - days(1)
border_minus_a_day
#> [1] "2011-03-12 01:59:59 EST"

one_day_in_hours_one_second <- hours(24) + seconds(1)

border_minus_a_day + one_day_in_hours_one_second
#> [1] "2011-03-13 03:00:00 EDT"

border_minus_a_day + one_day_in_hours_one_second - one_day_in_hours_one_second
#> [1] "2011-03-12 02:59:59 EST"

Created on 2019-09-06 by the reprex package (v0.2.1)

Preexisting rationale for this behavior in Java:

DavisVaughan commented 5 years ago

Hmm okay I think the rule could be if we land in the spring forward daylight savings time gap and (period size) <= (dst gap size) then it is a reversible operation and we should get a real date and not an NA. 99% of the time this means if (period size) <= 1 hour, but there are the rare cases where a dst gap is not 1 hour.

(Note that this would imply that the original issue would still be an NA because minutes(1160) is larger than 1 hour, and would not be reversible)

I would try to implement this, but I can't figure out how to get the size of the dst gap using cctz. The newest version of google's cctz has a method for <time_zone>.next_transition() and .previous_transition() which might could be used to get the borders of the gap. Then the difference would be the size. https://github.com/google/cctz/blob/b4935eef53820cf1643355bb15e013b4167a2867/include/cctz/time_zone.h#L188

library(lubridate, warn.conflicts = FALSE)

one_hour_before_border <- force_tz(as_datetime("2011-03-13 01:00:00") , "America/New_York")
one_hour_before_border
#> [1] "2011-03-13 01:00:00 EST"

# this is what i would expect...
one_hour_before_border + hours(1)
#> [1] "2011-03-13 03:00:00 EDT"

# because this is reversible! good!
one_hour_before_border + hours(1) - hours(1)
#> [1] "2011-03-13 01:00:00 EST"

one_hour_one_second_before_border <- one_hour_before_border - seconds(1)
one_hour_one_second_before_border
#> [1] "2011-03-13 00:59:59 EST"

one_hour_and_one_second <- hours(1) + seconds(1)

# my tweaked version currently gives this, but
# this should actually give NA because...
one_hour_one_second_before_border + one_hour_and_one_second
#> [1] "2011-03-13 03:00:00 EDT"

# ...this is not reversible!
one_hour_one_second_before_border + one_hour_and_one_second - one_hour_and_one_second
#> [1] "2011-03-13 01:59:59 EST"

Created on 2019-09-07 by the reprex package (v0.2.1)

Here is a strange case where there was a 20 minute DST gap. Found from the date library examples

library(lubridate, warn.conflicts = FALSE)

one_second_before_africa_border <- with_tz(as_datetime("1920-08-31 23:59:59"), "Africa/Accra")
one_second_before_africa_border
#> [1] "1920-08-31 23:59:59 GMT"

# this is what i would expect...
one_second_before_africa_border + seconds(1)
#> [1] "1920-09-01 00:20:00 +0020"

# because this is reversible! good!
one_second_before_africa_border + seconds(1) - seconds(1)
#> [1] "1920-08-31 23:59:59 GMT"

twenty_minutes_one_second_before_africa_border <- one_second_before_africa_border - minutes(20)
twenty_minutes_one_second_before_africa_border
#> [1] "1920-08-31 23:39:59 GMT"

twenty_minutes_one_second <- minutes(20) + seconds(1)

# my tweaked version currently gives this, but
# this should actually give NA because...
twenty_minutes_one_second_before_africa_border + twenty_minutes_one_second
#> [1] "1920-09-01 00:20:00 +0020"

# ...this is not reversible!
twenty_minutes_one_second_before_africa_border + twenty_minutes_one_second - twenty_minutes_one_second
#> [1] "1920-08-31 23:59:59 GMT"

Created on 2019-09-07 by the reprex package (v0.2.1)

Demetrio92 commented 5 years ago

Okay, so I think that there is a bug that has been there since essentially the beginning of the package.

The summary is that I think that right here, there should be another if branch that says if we are ONLY updating using hours/min/sec or lower, then it should use cl_new.pre if the new civil seconds amount is above the old civil seconds amount (i.e. we went forward), otherwise use cl_new.post (we went backwards).

https://github.com/tidyverse/lubridate/blob/cd8267976e51bfab9bd43f9eb30f2749d45e4ff9/src/update.cpp#L168

I've read the docs in ?Period-class, and I know you want this reversible property:

date + period - period = date

I would argue that if it the arithmetic only involves hours/minutes/seconds then this is reversible. For my example above, I think that this is perfectly reasonable behavior (lubridate doesn't do this right now):

library(lubridate, warn.conflicts = FALSE)

border <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
border
#> [1] "2011-03-13 01:59:59 EST"

border + seconds(1)
#> [1] "2011-03-13 03:00:00 EDT"

border + seconds(1) - seconds(1)
#> [1] "2011-03-13 01:59:59 EST"

Created on 2019-09-06 by the reprex package (v0.2.1)

With days, you can't enforce the reversible behavior, so this behavior would result in an NA, not the problematic result shown here (again, this is my custom version)

library(lubridate, warn.conflicts = FALSE)

border <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
border
#> [1] "2011-03-13 01:59:59 EST"

border_minus_a_day <- border - days(1)
border_minus_a_day
#> [1] "2011-03-12 01:59:59 EST"

one_day_one_second <- days(1) + seconds(1)
one_day_one_second
#> [1] "1d 0H 0M 1S"

# This SHOULD result in an NA
border_minus_a_day + one_day_one_second
#> [1] "2011-03-13 03:00:00 EDT"

# Because this is not reversible
border_minus_a_day + one_day_one_second - one_day_one_second
#> [1] "2011-03-12 02:59:59 EST"

Created on 2019-09-06 by the reprex package (v0.2.1)

Update: Hmm, this wouldn't be reversible either. I'm not sure I have the rule quite right yet.

library(lubridate, warn.conflicts = FALSE)

border <- with_tz(as_datetime(1300000380 - 781) , "America/New_York")
border
#> [1] "2011-03-13 01:59:59 EST"

border_minus_a_day <- border - days(1)
border_minus_a_day
#> [1] "2011-03-12 01:59:59 EST"

one_day_in_hours_one_second <- hours(24) + seconds(1)

border_minus_a_day + one_day_in_hours_one_second
#> [1] "2011-03-13 03:00:00 EDT"

border_minus_a_day + one_day_in_hours_one_second - one_day_in_hours_one_second
#> [1] "2011-03-12 02:59:59 EST"

Created on 2019-09-06 by the reprex package (v0.2.1)

Preexisting rationale for this behavior in Java:

* https://www.javacodex.com/Date-and-Time/Testing-Daylight-Saving-By-Adding-One-Second

seems like a time for a PR!

vspinu commented 5 years ago

This is a bug and it has to do with the fact that arithmetics with periods are not well defined.

I've read the docs in ?Period-class, and I know you want this reversible property:

That rule is unfortunate. There is no solid reason to follow it and it caused a lot of trouble with the package. There is a new lower level package timechange which has a clear defined semantics and allows a full control over how things are rolled both on month change and DST.

> timechange::time_add(time, minutes = 1160)
[1] "2019-03-10 03:42:03 PDT"

I wanted to roll lubridate on top of timechange for at least a year now, but just don't have time. Hopefully by the end of this year.

hadley commented 4 years ago

Minimal reprex:

library(lubridate, warn.conflicts = FALSE)

x <- ymd_hms('2019-03-09 07:22:03', tz = 'US/Pacific')
x + minutes(1160)
#> [1] NA
timechange::time_add(x, minutes = 1160)
#> [1] "2019-03-10 03:42:03 PDT"

Created on 2019-11-19 by the reprex package (v0.3.0)

blakiseskream commented 4 years ago

Adding to this - my pipeline broke last night, this time adding days(1) to 2:00 AM on March 7th. I'm not sure if this should output 2:00 AM on March 8th (which "springs ahead" to 3:00 AM on March 8th) or 3:00 AM on March 8th.

Funny enough, it only seems to affect the 2:00 AM hour.

Again reprex:

library(lubridate)

# 1:59:59 AM
start_date <- ymd_hms('2020-03-07 01:59:59', tz = 'America/Los_Angeles')
start_date + days(1)
#> [1] "2020-03-08 01:59:59 PST"

# 2:00:00 AM
start_date <- ymd_hms('2020-03-07 02:00:00', tz = 'America/Los_Angeles')
start_date + days(1)
#> [1] NA

# 2:59:59 AM
start_date <- ymd_hms('2020-03-07 02:59:59', tz = 'America/Los_Angeles')
start_date + days(1)
#> [1] NA

# 3:00:01 AM
start_date <- ymd_hms('2020-03-07 03:00:01', tz = 'America/Los_Angeles')
start_date + days(1)
#> [1] "2020-03-08 03:00:01 PDT"

Created on 2020-03-08 by the reprex package (v0.3.0)

Upon reflection here, I'll just not base things on the 2:00 AM hour

vspinu commented 4 years ago

The logic here is exactly as with ymd("2009-01-29") + months(1) which results in NA. I am seriously considering changing this unfortunate default to a more desirable behavior, and leaving the current as an opt-in.

robertwwalker commented 3 years ago

Just wanted to reup this. @pitakakariki pointed out, perhaps, a more troublesome case. NA seems like a fine result because it flags troubles. The other direction problem generates an answer that is plainly incorrect.

image

DavisVaughan commented 3 years ago

For anyone doing arithmetic with hours, minutes, or seconds with lubridate, you likely want to use Durations objects (like dhours()) rather than Period objects (like hours()). For arithmetic with more granular units of time like days, months or years, you probably want to stick with Period objects.

So, if using lubridate, my advice for @Demetrio92 and @robertwwalker is to switch to dminutes() and dhours() as that seems to be what you are expecting. That will match your intuition if you were expecting a result equal to "sitting in a chair for 1160 minutes, then looking at the clock" (assuming that your clock automatically adjusts itself for DST).

You could also try using clock. The POSIXct API in clock defaults to the behavior I described in the first paragraph, so it might feel more intuitive. It also errors when landing on a nonexistent time in a DST gap, or on an ambiguous time in a DST fallback, rather than returning NA.

@Demetrio92's original example:

library(clock)
library(lubridate)
library(magrittr)

# This example has to do with DST Gaps (nonexistent times)
x <- as.POSIXct("2019-03-09 07:22:03", tz = "US/Pacific")
x
#> [1] "2019-03-09 07:22:03 PST"

# This is lubridate Period addition:
# - converts to naive (2019-03-09 07:22:03, with no assumed time zone)
# - adds 1160 minutes (2019-03-09 07:22:03 -> 2019-03-10 02:42:03)
# - converts back to US/Pacific, but this is now in a DST gap from
#   01:59:59 -> 03:00:00, so lubridate returns NA
x + minutes(1160)
#> [1] NA

# clock - With POSIXct, hours, minutes, and seconds add like lubridate Durations
add_minutes(x, 1160)
#> [1] "2019-03-10 03:42:03 PDT"

# so clock matches this result of lubridate Duration arithmetic:
x + dminutes(1160)
#> [1] "2019-03-10 03:42:03 PDT"

# To get the lubridate Period behavior in clock if you want it
nt <- x %>%
  as_naive_time() %>%
  add_minutes(1160)

# naive time with no implied time zone - note that this would be in a 
# DST gap if we tried to apply a US/Pacific zone to it
nt
#> <time_point<naive><second>[1]>
#> [1] "2019-03-10 02:42:03"

# unlike lubridate, clock treats this as an error rather than returning NA
as.POSIXct(nt, tz = date_zone(x))
#> Error: Nonexistent time due to daylight saving time at location 1. Resolve nonexistent time issues by specifying the `nonexistent` argument.

# control how to handle this nonexistent time with `nonexistent`,
# use `"NA"` to match lubridate behavior
as.POSIXct(nt, tz = date_zone(x), nonexistent = "roll-forward")
#> [1] "2019-03-10 03:00:00 PDT"
as.POSIXct(nt, tz = date_zone(x), nonexistent = "roll-backward")
#> [1] "2019-03-10 01:59:59 PST"
as.POSIXct(nt, tz = date_zone(x), nonexistent = "NA")
#> [1] NA

@robertwwalker's example:

library(clock)
library(lubridate)
library(magrittr)

# This example has to do with DST Fallbacks (ambiguous times)
x <- as.POSIXct("2021-04-04 02:15:00", tz = "Pacific/Auckland")
x
#> [1] "2021-04-04 02:15:00 NZDT"

# This is lubridate Period addition:
# - converts to naive (2021-04-04 02:15:00, with no assumed time zone)
# - adds one hour (02 -> 03)
# - converts back to Pacific/Auckland with no issues, but the result is
#   potentially not intuitive
x + hours(1)
#> [1] "2021-04-04 03:15:00 NZST"

# clock - With POSIXct, hours, minutes, and seconds add like lubridate Durations
add_hours(x, 1)
#> [1] "2021-04-04 02:15:00 NZST"

# so clock matches this result of lubridate Duration arithmetic:
x + dhours(1)
#> [1] "2021-04-04 02:15:00 NZST"

# To get lubridate Period behavior in clock if you want it
x %>%
  as_naive_time() %>% # this is the key
  add_hours(1) %>%
  as.POSIXct(tz = date_zone(x))
#> [1] "2021-04-04 03:15:00 NZST"
Demetrio92 commented 3 years ago

@DavisVaughan this is awesome! I'm glad there is a reliable fix.

However: dminutes should not act different from minutes. This is confusing at best.

Better: both act the same with a parameter nonexistent changing the behavior.

Long run: deprecate usage of minutes as it involves a lot of autocasting under the hood and will inevitably lead to various issues.


Imagine two programmers wrote together a lot of R code on one project. One of them used minutes the other one dminutes. Some parts of the code produce NAs while other parts don't. Happy debugging.

DavisVaughan commented 3 years ago

However: dminutes should not act different from minutes. This is confusing at best.

I disagree. The whole reason there are two functions for adding minutes is that they work differently. I would encourage you to read the docs for Durations and Periods again, as they are quite different.

Demetrio92 commented 3 years ago

However: dminutes should not act different from minutes. This is confusing at best.

I disagree. The whole reason there are two functions for adding minutes is that they work differently. I would encourage you to read the docs for Durations and Periods again, as they are quite different.

Reading period docs

Within a Period object, time units do not have a fixed length (except for seconds) until they are added to a date-time. ... When math is performed with a period object, each unit is applied separately. How the length of a period is distributed among its units is non-trivial. For example, when leap seconds occur 1 minute is longer than 60 seconds.

Periods track the change in the "clock time" between two date-times. They are measured in common time related units: years, months, days, hours, minutes, and seconds. Each unit except for seconds must be expressed in integer values.

I do not see why this description contradicts my expected output.

Specifically it says stuff like

or example, when leap seconds occur 1 minute is longer than 60 seconds.

My impression here would be that periods are more robust than durations.


I generally agree, the issue is more deep than just two functions that supposed to do the same.

Still, worst case, rename minutes into pminutes.


I am glad period now contains a warning. At least people who carefully study the docs will be aware.

Note: Arithmetic with periods can result in undefined behavior when non-existent dates are involved (such as February 29th in non-leap years). Please see Period for more details and %m+% and add_with_rollback() for alternative operations.

Coming from postgres I just did +minutes() and it worked as expected without any issues until it didn't. I wasn't even aware there were two ways of doing this in lubridate

Demetrio92 commented 3 years ago

TL;DR

ymd("2009-01-29") + months(1) this should probably go to the top of the demos in both docs. Highlights the difference very well.

robertwwalker commented 3 years ago

@DavisVaughan Thank you very much.

pitakakariki commented 3 years ago

Maybe worth noting that this behaviour breaks the reversibility rule:

x <- as.POSIXct("2021-04-04 02:15:00", tz = "Pacific/Auckland")
x
#  [1] "2021-04-04 02:15:00 NZDT"

x + hours(1)
#  [1] "2021-04-04 03:15:00 NZST"

x + hours(1) - hours(1)
#  [1] "2021-04-04 02:15:00 NZST"
pitakakariki commented 3 years ago

I've been thinking (too much probably) about the period-duration distinction.

Imagine the (completely implausible I hope) scenario where the time-lords decide that they're going to do a leap second at the same time as a daylight savings change.

Here's how I'd expect hours and dhours to act:

"2021-04-04 02:15:00 NZDT" + hours(1) == "2021-04-04 02:15:00 NZST"

"2021-04-04 02:15:00 NZDT" + dhours(1) == "2021-04-04 02:14:59 NZST"

On the other hand when it comes to days I have no idea what I should even expect. It doesn't appear to be reversible currently though:

> as.POSIXct("2021-04-04 02:15:00", tz = "Pacific/Auckland")
[1] "2021-04-04 02:15:00 NZDT"
> 
> as.POSIXct("2021-04-04 02:15:00", tz = "Pacific/Auckland") + days(1)
[1] "2021-04-05 02:15:00 NZST"
> 
> as.POSIXct("2021-04-04 02:15:00", tz = "Pacific/Auckland") + days(1) - days(1)
[1] "2021-04-04 02:15:00 NZST"

I'd also like to add that all of this is really annoyingly complex and I'm really glad there are people who are not me making it easier to deal with. So a big thank you to everyone involved in lubridate development.

DavisVaughan commented 3 years ago

Here's how I'd expect hours and dhours to act:

Hmm, @pitakakariki from this example it looks to me like you might misunderstand how Period and Duration are supposed to be working. Let me try explaining. I'll use types from clock, since they allow me to explicitly show the intermediate steps that are happening under the hood.

library(clock)

x <- date_time_parse("2021-04-04 02:15:00", "Pacific/Auckland", ambiguous = "earliest")
x
#> [1] "2021-04-04 02:15:00 NZDT"

# Adding a lubridate "Period" is like:
# - Dropping the original time zone completely
# - Adding the unit of time
# - Adding the original time zone back (if possible)

# Adding a lubridate "Duration" is like:
# - Converting the original time zone to UTC (where DST never affects you)
# - Adding the unit of time
# - Adding the original time zone back (which is always possible)

# We can show these explicitly with clock's naive-time and sys-time types, which
# mimic dropping the original time zone and converting to UTC respectively.
## Period example:

# Notice this has no implied zone attached
nt <- as_naive_time(x)
nt
#> <time_point<naive><second>[1]>
#> [1] "2021-04-04 02:15:00"

# Now we add the time. Since there is no implied time zone, this doesn't
# have to do with the DST fallback whatsoever.
nt_plus_hour <- nt + duration_hours(1)
nt_plus_hour
#> <time_point<naive><second>[1]>
#> [1] "2021-04-04 03:15:00"

# Now we convert back to Pacific/Auckland. Since the 3 o'clock hour is past
# the DST fallback, there is no ambiguity in the conversion, but it may or
# may not be what you want.
as.POSIXct(nt_plus_hour, "Pacific/Auckland")
#> [1] "2021-04-04 03:15:00 NZST"

x + lubridate::hours(1)
#> [1] "2021-04-04 03:15:00 NZST"
## Duration example:

# This is in UTC time, notice how the hour has shifted
st <- as_sys_time(x)
st
#> <time_point<sys><second>[1]>
#> [1] "2021-04-03 13:15:00"

# Now we add the time in UTC. DST can never affect this.
st_plus_hour <- st + duration_hours(1)
st_plus_hour
#> <time_point<sys><second>[1]>
#> [1] "2021-04-03 14:15:00"

# Now convert back to Pacific/Auckland. This just shifts by the appropriate
# UTC offset to get the Pacific/Auckland time.
as.POSIXct(st_plus_hour, "Pacific/Auckland")
#> [1] "2021-04-04 02:15:00 NZST"

x + lubridate::dhours(1)
#> [1] "2021-04-04 02:15:00 NZST"

Created on 2021-04-07 by the reprex package (v1.0.0)

pitakakariki commented 3 years ago

Thanks @DavisVaughan but I think we might be talking at cross purposes. Your comment helps explain the intuition behind the implementation, but my expectations come from the reversibility principal in the documentation.

If adding hours(1) to 2:15am DT and 2:15am ST both give us 3:15am ST, then there's no way to subtract hours(1) from 3:15am ST that's consistent with reversibility.

DavisVaughan commented 3 years ago

As Vitalie mentioned here https://github.com/tidyverse/lubridate/issues/759#issuecomment-536286765, the reversibility rule is probably not the best idea for date time arithmetic, even if it sounds like a good idea (I agree with him now). So I wouldn't base too many assumptions on it.

pitakakariki commented 3 years ago

@DavisVaughan this is an issue tracker, not a help forum. An inconsistency between behaviour and documentation is a bug, which is why I'm reporting these examples. Your insistence that this is somehow a "misunderstanding" on my part is starting to feel quite rude.

hadley commented 3 years ago

@pitakakariki I don't think this conversation is going anywhere useful so I'm going to lock this thread for now.

vspinu commented 1 year ago

I think a real valid issue here is that addition and subtraction do not behave symmetrically with respect to DST transition.

I opened #1088 for this.

Reversibility can never by achieved no matter what we do. I think we should re-export or provide an alternative to timechange::time_add in order for people to have full control over operations with periods. In most cases though working with durations (dhours, dminutes etc) is probably what people actually want.