tidyverse / lubridate

Make working with dates in R just that little bit easier
https://lubridate.tidyverse.org
GNU General Public License v3.0

Changing a characteristic (second) automatically changes the timezone #619

Closed. muschellij2 closed this issue 4 years ago.

muschellij2 commented 6 years ago

The machine has no system time zone set

Using Sys.timezone(), we see that it returns NA on one of the machines we work on:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
Sys.timezone()
#> [1] NA

Example data with empty timezone

Here we have some date/times with no timezone initialized:

xdt = dt = structure(
  c(
    1435932005.75171,
    1435932005.76171,
    1435932005.77171,
    1435932005.78171,
    1435932005.79171,
    1435932005.80171
  ),
  class = c("POSIXct",
            "POSIXt"),
  tzone = ""
)
dt
#> [1] "2015-07-03 10:00:05 EDT" "2015-07-03 10:00:05 EDT"
#> [3] "2015-07-03 10:00:05 EDT" "2015-07-03 10:00:05 EDT"
#> [5] "2015-07-03 10:00:05 EDT" "2015-07-03 10:00:05 EDT"
class(dt)
#> [1] "POSIXct" "POSIXt"
tz(dt)
#> [1] ""
second(dt)
#> [1] 5.75171 5.76171 5.77171 5.78171 5.79171 5.80171
class(second(dt))
#> [1] "numeric"

Here we round the seconds down:

second(dt) = floor(second(dt))
second(dt)
#> [1] 5 5 5 5 5 5

Now, the dt object still has an empty timezone:

tz(dt)
#> [1] ""

but its hour has changed:

dt
#> [1] "2015-07-03 06:00:05 EDT" "2015-07-03 06:00:05 EDT"
#> [3] "2015-07-03 06:00:05 EDT" "2015-07-03 06:00:05 EDT"
#> [5] "2015-07-03 06:00:05 EDT" "2015-07-03 06:00:05 EDT"
class(dt)
#> [1] "POSIXct" "POSIXt"

second(dt)
#> [1] 5 5 5 5 5 5
class(second(dt))
#> [1] "numeric"

Is this expected behavior?

Timezone set, all is good

When a timezone is set, changing the seconds does not change the hour:

dt = xdt
tz(dt) = "EST"
dt
#> [1] "2015-07-03 14:00:05 EST" "2015-07-03 14:00:05 EST"
#> [3] "2015-07-03 14:00:05 EST" "2015-07-03 14:00:05 EST"
#> [5] "2015-07-03 14:00:05 EST" "2015-07-03 14:00:05 EST"
class(dt)
#> [1] "POSIXct" "POSIXt"
tz(dt)
#> [1] "EST"
second(dt)
#> [1] 5.75171 5.76171 5.77171 5.78171 5.79171 5.80171
class(second(dt))
#> [1] "numeric"
second(dt) = floor(second(dt))
second(dt)
#> [1] 5 5 5 5 5 5
dt
#> [1] "2015-07-03 14:00:05 EST" "2015-07-03 14:00:05 EST"
#> [3] "2015-07-03 14:00:05 EST" "2015-07-03 14:00:05 EST"
#> [5] "2015-07-03 14:00:05 EST" "2015-07-03 14:00:05 EST"
class(dt)
#> [1] "POSIXct" "POSIXt"
tz(dt)
#> [1] "EST"
second(dt)
#> [1] 5 5 5 5 5 5
class(second(dt))
#> [1] "numeric"
vspinu commented 6 years ago

Thanks for the report. Looks like a bug. I will investigate.

To floor or ceiling date-times you should use floor_date() and ceiling_date(); those are fairly well tested by now.
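A minimal sketch of the suggested alternative, assuming lubridate is installed: floor_date() with unit = "second" drops the fractional seconds in one call, instead of assigning floor(second(dt)) through second<-. It reuses the first two timestamps from the report; whether floor_date() also avoids the time zone problem on a misconfigured machine is not established in this thread.

```r
library(lubridate)

# First two timestamps from the report, with an empty "tzone" attribute
# (meaning "use the current time zone").
dt <- .POSIXct(c(1435932005.75171, 1435932005.80171), tz = "")

# Round down to whole seconds in one step, rather than via
# second(dt) <- floor(second(dt)).
dt_floor <- floor_date(dt, unit = "second")

second(dt_floor)
```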

vspinu commented 6 years ago

I cannot reproduce this.

> (dt <- .POSIXct(1435932005.80171))
[1] "2015-07-03 16:00:05 CEST"
> second(dt) <- floor(second(dt))
> dt
[1] "2015-07-03 16:00:05 CEST"

What version of lubridate is this and what OS?

Why did you report Sys.timezone() in the first place? It doesn't seem relevant. Also, if you set Sys.setenv(TZ = "America/New_York"), does the problem go away?

muschellij2 commented 6 years ago

Sys.timezone() showed that TZ was not set, and none of the other sources (such as /etc/timezone) were setting the time zone either. This was on one of our RedHat servers (kernel 2.6.32-696.18.7.el6.x86_64).

If TZ is set to a valid time zone, the problem goes away. I can no longer reproduce the error on the cluster machine because we fixed the links to those files.

If TZ is set to an invalid value, things do break down a bit:

Sys.setenv("TZ" = "not_valid_timezone")
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
xdt = dt = structure(
  c(
    1435932005.75171,
    1435932005.76171,
    1435932005.77171,
    1435932005.78171,
    1435932005.79171,
    1435932005.80171
  ),
  class = c("POSIXct",
            "POSIXt"),
  tzone = ""
)
dt
#> Warning in as.POSIXlt.POSIXct(x, tz): unknown timezone 'not_valid_timezone'
#> [1] "2015-07-03 14:00:05 GMT" "2015-07-03 14:00:05 GMT"
#> [3] "2015-07-03 14:00:05 GMT" "2015-07-03 14:00:05 GMT"
#> [5] "2015-07-03 14:00:05 GMT" "2015-07-03 14:00:05 GMT"
class(dt)
#> [1] "POSIXct" "POSIXt"
tz(dt)
#> [1] "not_valid_timezone"
second(dt)
#> [1] 5.75171 5.76171 5.77171 5.78171 5.79171 5.80171
class(second(dt))
#> [1] "numeric"
second(dt) = floor(second(dt))
#> Error in (function (dt, year, month, yday, mday, wday, hour, minute, second, : Invalid timezone of input vector: "not_valid_timezone"

But again, this is a different issue.

vspinu commented 6 years ago

I guess the RedHat machine has a lubridate version older than v1.7.0, right? If so, this no longer applies, because the underlying update functionality was rewritten on top of the CCTZ library in 1.7.0.

The second error is more or less expected, because tz = "" is an alias for the system time zone, which in this case is invalid.

akaever commented 6 years ago

I've encountered a similar issue (lubridate v1.7.1) in a Docker container (rocker/shiny, but with modified timezone settings). On-the-fly timezone mapping in Docker containers is not trivial (set $TZ, check and map /etc/localtime and /etc/timezone). We ended up with a somewhat inconsistent timezone:

> Sys.time()
[1] "2018-01-30 11:11:21 CET"
> now()
[1] "2018-01-30 11:11:27 CET"
> Sys.timezone()
[1] "UTC"
> now() - minutes(1)
[1] "2018-01-30 12:11:07 CET"

The point is that this one-hour shift is quite unexpected. I would not expect any timezone conversion when adding or subtracting fixed time spans.

akaever commented 6 years ago

I tested this in RStudio and Shiny Server, which do not import $TZ. Instead of checking /etc/timezone (which contains the intended timezone string), Sys.timezone() analyzed the symlink /etc/localtime, which points to /usr/share/zoneinfo/Etc/UTC (whose contents had been replaced).

vspinu commented 6 years ago

"The point is that this one-hour shift is quite unexpected. I would not expect any timezone conversion when adding or subtracting fixed time spans."

It looks like a bug, probably related to the conflicting timezone settings. Why are your timestamps printed in CET if Sys.timezone() returns "UTC"?

What is the value of Sys.getenv("TZ")?

> unclass(now())
[1] 1517314542
attr(,"tzone")
[1] ""

Internal lubridate code assumes the system timezone when it sees "".
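One way to sidestep the "" alias, sketched here as an assumption rather than a fix proposed in the thread, is to give timestamps an explicit time zone. lubridate's now() accepts a tzone argument; subtracting a one-minute period from a UTC timestamp should then move the underlying instant back by 60 seconds.

```r
library(lubridate)

# Explicit zone instead of the "" alias for the system time zone.
tt <- now(tzone = "UTC")
attr(tt, "tzone")

# Subtract a one-minute period and compare the underlying numeric instants.
tt5 <- tt - minutes(1)
as.numeric(tt) - as.numeric(tt5)
```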

vspinu commented 6 years ago

In order to have a better understanding of what's going on, could you please provide the output of the following:

Sys.timezone()
Sys.getenv("TZ")
tt <- now()
unclass(tt)
tz(tt)
(tt2 <- lubridate:::update_date_time(tt, seconds = 5))
tz(tt2)
unclass(tt2)
tt3 <- tt
second(tt3) <- 5
tt3
unclass(tt3)
tt4 <- tt
minute(tt4) <- 5
tt4
unclass(tt4)
tt5 <- (tt - minutes(1))
unclass(tt5)
akaever commented 6 years ago
> now()
[1] "2018-01-30 14:38:43 CET"
> 
> Sys.timezone()
[1] "UTC"
> Sys.getenv("TZ")
[1] ""
> tt <- now()
> unclass(tt)
[1] 1517319524
attr(,"tzone")
[1] ""
> tz(tt)
[1] ""
> (tt2 <- lubridate:::update_date_time(tt, seconds = 5))
[1] "2018-01-30 14:38:05 CET"
> tz(tt2)
[1] ""
> unclass(tt2)
[1] 1517319485
> tt3 <- tt
> second(tt3) <- 5
> tt3
[1] "2018-01-30 15:38:05 CET"
> unclass(tt3)
[1] 1517323085
attr(,"tzone")
[1] ""
> tt4 <- tt
> minute(tt4) <- 5
> tt4
[1] "2018-01-30 15:05:43 CET"
> unclass(tt4)
[1] 1517321144
attr(,"tzone")
[1] ""
> tt5 <- (tt - minutes(1))
> unclass(tt5)
[1] 1517323064
attr(,"tzone")
[1] ""
akaever commented 6 years ago

The CET comes from an internal conversion to POSIXlt:

> unclass(now())
[1] 1517321219
attr(,"tzone")
[1] ""

> unclass(.Internal(as.POSIXlt(now(), "")))
$sec
[1] 59.49045
$min
[1] 6
$hour
[1] 15
$mday
[1] 30
$mon
[1] 0
$year
[1] 118
$wday
[1] 2
$yday
[1] 29
$isdst
[1] 0
$zone
[1] "CET"
$gmtoff
[1] 3600
attr(,"tzone")
[1] ""     "CET"  "CEST"

Most likely, it's using the contents of /etc/timezone or /etc/localtime (not the path returned by readlink -f).
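The mismatch described here can be checked from base R alone. The helper below is hypothetical (it is not part of lubridate or base R): it formats the same instant with the %Z zone abbreviation once for tz = "" and once for the zone named by Sys.timezone(), and returns FALSE when they disagree, as they would in the container above (CET versus UTC). Abbreviations are only a coarse proxy, since different zones can share one, so TRUE does not prove the settings are consistent.

```r
# Hypothetical helper: does the zone used for tz = "" match Sys.timezone()?
tz_consistent <- function(time = Sys.time()) {
  sys_tz <- Sys.timezone()
  # Sys.timezone() can return NA, as on the RedHat machine in this thread.
  if (is.na(sys_tz)) return(NA)
  current_abbr <- format(time, "%Z", tz = "")      # zone used for ""
  system_abbr  <- format(time, "%Z", tz = sys_tz)  # zone Sys.timezone() names
  identical(current_abbr, system_abbr)
}

tz_consistent()
```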

vspinu commented 6 years ago

Yeah. Looks like a bug in R. The docs for as.POSIXlt state clearly that "" is the current time zone, and the docs for Sys.timezone say "‘Sys.timezone’ returns the name of the current time zone". So the two should coincide, but on your system they don't. Would you mind reporting it to the R developers?

For now, it looks like Sys.setenv(TZ = "UTC") would fix the problem. I am not sure I can do much on the lubridate side at the moment, but I plan to drop the reliance on as.POSIXlt in the near future.
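A minimal sketch of that workaround, assuming it runs at the start of a fresh session, before any date-time function has been called (R caches time zone information on first use). UTC is the value suggested above; substitute the zone you actually want.

```r
library(lubridate)

# Set TZ explicitly, ideally before any date-time function is called.
Sys.setenv(TZ = "UTC")

tt <- now()       # tzone attribute is still "", i.e. the current time zone
tt3 <- tt
second(tt3) <- 5  # the update that shifted the hour in the report

# Only the seconds should change, so the instants differ by less than a minute.
abs(as.numeric(tt3) - as.numeric(tt)) < 60
```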

vspinu commented 6 years ago

BTW, there have been recent changes in R-devel regarding TZ settings and caching. Maybe the issue is already resolved.

If the TZ environment variable is set when date-time functions are first used, it is recorded as the session default and so will be used rather than the default deduced from the OS if TZ is subsequently unset.

Sys.timezone() on a Unix-alike caches the value at first use in a session: inter alia this means that setting TZ later in the session affects only the current time zone and not the system one. Sys.timezone() is now used to find the system timezone to pass to the code used when R is configured with --with-internal-tzcode. ... Sys.timezone() tries more heuristics on Unix-alikes and so is more likely to succeed (especially on Linux). For the slowest method, a warning is given recommending that TZ is set to avoid the search.

akaever commented 6 years ago

OK, thanks for checking.

vspinu commented 6 years ago

Actually, your issue now appears on R-devel by "design": Sys.timezone() has been changed to return the cached system timezone no matter what.