It is possible, you are looking for sys_time_info(). That and naive_time_info() are the only functions in clock where it makes sense to have a vectorized zone argument.
Since your original timestamp values are in UTC, you can convert them straight to sys-time. Then you can use sys_time_info() on that by also providing your vector of time zones. That gives you a data frame with a lot of information back, but really what you care about is the offset from UTC. Adding that offset to the sys-time gives you the "local time", and it is good practice to then convert that to a naive-time (because it is no longer UTC).
It should be very fast because it is vectorized.
library(dplyr)
library(clock)
df <- readr::read_csv(I("
id,timestamp,timezone
1,2019-01-01T00:00:00Z,UTC
2,2019-01-01T00:00:00Z,Asia/Tokyo
3,2019-01-01T20:00:00Z,UTC
4,2019-01-01T20:00:00Z,Asia/Tokyo
"), show_col_types = FALSE)
df <- df %>%
  mutate(sys_time = as_sys_time(timestamp), .keep = "unused")
df
#> # A tibble: 4 × 3
#>      id timezone   sys_time
#>   <dbl> <chr>      <clck_sy_>
#> 1     1 UTC        2019-01-01T00:00:00
#> 2     2 Asia/Tokyo 2019-01-01T00:00:00
#> 3     3 UTC        2019-01-01T20:00:00
#> 4     4 Asia/Tokyo 2019-01-01T20:00:00
# All the info you get from `sys_time_info()`.
# You need `offset`.
sys_time_info(df$sys_time, df$timezone)
#>                   begin                  end offset   dst abbreviation
#> 1 -32767-01-01T00:00:00 32767-12-31T00:00:00      0 FALSE          UTC
#> 2   1951-09-08T15:00:00 32767-12-31T00:00:00  32400 FALSE          JST
#> 3 -32767-01-01T00:00:00 32767-12-31T00:00:00      0 FALSE          UTC
#> 4   1951-09-08T15:00:00 32767-12-31T00:00:00  32400 FALSE          JST
df %>%
  mutate(
    offset = sys_time_info(sys_time, timezone)$offset,
    naive_time = as_naive_time(sys_time + offset)
  )
#> # A tibble: 4 × 5
#>      id timezone   sys_time            offset        naive_time
#>   <dbl> <chr>      <clck_sy_>          <dur<second>> <clck_nv_>
#> 1     1 UTC        2019-01-01T00:00:00 0             2019-01-01T00:00:00
#> 2     2 Asia/Tokyo 2019-01-01T00:00:00 32400         2019-01-01T09:00:00
#> 3     3 UTC        2019-01-01T20:00:00 0             2019-01-01T20:00:00
#> 4     4 Asia/Tokyo 2019-01-01T20:00:00 32400         2019-01-02T05:00:00
Created on 2022-09-13 with reprex v2.0.2
Thanks for the quick and detailed response. This is great! Also, thank you for linking to Stack Overflow. I did a search and saw some older answers but did not get to it.
I am excited about the features of this package, but its many concepts and large number of functions (I was overwhelmed by the length of the reference page) make it seem difficult for a novice to write a process like this.
Is it a non-goal of this package to have such a function?
(For example, is there any prospect of adopting clock as a backend for a package like lubridate in the future and implementing it in that package?)
df <- readr::read_csv(I("
id,timestamp,timezone
1,2019-01-01T00:00:00Z,UTC
2,2019-01-01T00:00:00Z,Asia/Tokyo
3,2019-01-01T20:00:00Z,UTC
4,2019-01-01T20:00:00Z,Asia/Tokyo
"), show_col_types = FALSE)
.at_time_zone <- function(x, tz) {
  x <- clock::as_sys_time(x)
  offset <- clock::sys_time_info(x, tz)$offset
  clock::as_naive_time(x + offset) |>
    as.POSIXct()
}
df |>
  dplyr::mutate(
    local_timestamp = .at_time_zone(timestamp, timezone)
  )
#> # A tibble: 4 × 4
#>      id timestamp           timezone   local_timestamp
#>   <dbl> <dttm>              <chr>      <dttm>
#> 1     1 2019-01-01 00:00:00 UTC        2019-01-01 00:00:00
#> 2     2 2019-01-01 00:00:00 Asia/Tokyo 2019-01-01 09:00:00
#> 3     3 2019-01-01 20:00:00 UTC        2019-01-01 20:00:00
#> 4     4 2019-01-01 20:00:00 Asia/Tokyo 2019-01-02 05:00:00
The problem is that your local_timestamp column has a time zone on it that is guaranteed to be wrong.
Assuming that the time zone on that local_timestamp column is UTC, that is wrong for the 2nd row because that is showing the local time in Asia/Tokyo, not the local time in UTC. That's why I used naive-time as my output type: it is a date-time type with a yet-to-be-specified time zone.
Because there is no way a vector can have multiple time zones, you can't provide a helper for this that returns a POSIXct, so there is no way to push this up into lubridate.
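To illustrate the point about a vector having only one time zone, here is a minimal base R sketch. The two values are illustrative and mirror rows 1 and 2 of the example data, with the second one meant as a wall-clock time in Asia/Tokyo.

```r
# A POSIXct vector carries a single `tzone` attribute for the whole vector,
# so every element is printed and interpreted in that one time zone.
x <- as.POSIXct(c("2019-01-01 00:00:00", "2019-01-01 09:00:00"), tz = "UTC")

attr(x, "tzone")
#> [1] "UTC"

# Both elements get the same zone label, even though the second value
# was intended as a local time in Asia/Tokyo.
format(x, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2019-01-01 00:00:00 UTC" "2019-01-01 09:00:00 UTC"
```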
This is a fairly specialized operation, so I'm not worried about it requiring clock.
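If a reusable helper is still wanted, one option is to keep the same steps but stop at naive-time instead of converting to POSIXct. This is only a sketch: the name to_local_naive_time() is hypothetical and not part of clock, and it assumes the input is a UTC POSIXct vector like the timestamp column above.

```r
library(clock)

# Hypothetical helper (not part of clock): returns the local wall-clock time
# for each element as a naive-time, which has no time zone attached.
to_local_naive_time <- function(x, tz) {
  x <- as_sys_time(x)                    # input is assumed to be in UTC
  offset <- sys_time_info(x, tz)$offset  # `tz` is vectorized here
  as_naive_time(x + offset)
}

timestamp <- as.POSIXct(
  c("2019-01-01 00:00:00", "2019-01-01 00:00:00"),
  tz = "UTC"
)
timezone <- c("UTC", "Asia/Tokyo")

local <- to_local_naive_time(timestamp, timezone)
format(local)
#> [1] "2019-01-01T00:00:00" "2019-01-01T09:00:00"
```

Because the result is a naive-time, nothing downstream can mistake the Asia/Tokyo value for a UTC instant.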
> Assuming that the time zone on that local_timestamp column is UTC, that is wrong for the 2nd row because that is showing the local time in Asia/Tokyo, not the local time in UTC. That's why I used naive-time as my output type: it is a date-time type with a yet-to-be-specified time zone.
Yes, it is definitely a compromise that the column holds time zone information that shouldn't be there.
Related to tidyverse/lubridate#1063
Thank you for developing this wonderful package. This package seems useful for complex processing with respect to time, but is it possible to convert timestamps with time zone to local time in each region without time zone?
What I would like to do is the following process, but so far we cannot vectorize it.