Open matiasandina opened 10 months ago
The point of date_group()
when used at "minute"
precision is to group by n
minutes within the current hour.
i.e. the "counter" which decides how to break up the groups resets at every hour.
Here is one of the Examples that tries to demonstrate this
x <- as.POSIXct("2019-01-01", "America/New_York")
x <- add_days(x, -3:5)
# Group by 2 days of the current month.
# Note that this resets at the beginning of the month, creating day groups
# of [29, 30] [31] [01, 02] [03, 04].
date_group(x, "day", n = 2)
If you just want to create buckets of 120 minutes anchored from an initial origin of 1970-01-01 00:00:00
then date_floor()
is the right function for you.
date_floor()
and date_group()
work via very different algorithms for very different purposes
https://clock.r-lib.org/articles/clock.html#grouping-rounding-and-shifting
Got it. My package function is a glorified wrapper of date_group()
that basically bins before processing:
data <- data %>%
dplyr::mutate(bin = clock::date_group({{time_col}}, n = n, precision = precision))
I guess they should not change from minutes to hours like it's no big deal hoping for internal conversion. I am not sure whether edge cases should favor date_group()
or floor_date()
and it's a bit tough to think about the different use cases for the binning function (i.e., what do they expect when binning? should they try to use the same function to bin minutes, hours, and days?). It's likely that the floor_date()
is the most intuitive behavior here, but I'm a bit uncertain because the design choice was to give freedom to the user (they can select whatever bin with one funtion). Unfortunately, they can come up with bins that produce different behavior:
x <- as.POSIXct("2019-01-01 23:00:00", "America/New_York")
x <- clock::add_minutes(x, seq(-119, 121, 18))
x
[1] "2019-01-01 21:01:00 EST" "2019-01-01 21:19:00 EST" "2019-01-01 21:37:00 EST" "2019-01-01 21:55:00 EST"
[5] "2019-01-01 22:13:00 EST" "2019-01-01 22:31:00 EST" "2019-01-01 22:49:00 EST" "2019-01-01 23:07:00 EST"
[9] "2019-01-01 23:25:00 EST" "2019-01-01 23:43:00 EST" "2019-01-02 00:01:00 EST" "2019-01-02 00:19:00 EST"
[13] "2019-01-02 00:37:00 EST" "2019-01-02 00:55:00 EST"
# using `n` = 25 because 25 is not a divisor of 60
# intuitively weird that the first block is 20:45 and not 21:00
clock::date_floor(x, "minute", n = 25)
[1] "2019-01-01 20:45:00 EST" "2019-01-01 21:10:00 EST" "2019-01-01 21:35:00 EST" "2019-01-01 21:35:00 EST"
[5] "2019-01-01 22:00:00 EST" "2019-01-01 22:25:00 EST" "2019-01-01 22:25:00 EST" "2019-01-01 22:50:00 EST"
[9] "2019-01-01 23:15:00 EST" "2019-01-01 23:40:00 EST" "2019-01-01 23:40:00 EST" "2019-01-02 00:05:00 EST"
[13] "2019-01-02 00:30:00 EST" "2019-01-02 00:55:00 EST"
# This is behavior might make more sense in this particular case
clock::date_group(x, "minute", n = 25)
[1] "2019-01-01 21:00:00 EST" "2019-01-01 21:00:00 EST" "2019-01-01 21:25:00 EST" "2019-01-01 21:50:00 EST"
[5] "2019-01-01 22:00:00 EST" "2019-01-01 22:25:00 EST" "2019-01-01 22:25:00 EST" "2019-01-01 23:00:00 EST"
[9] "2019-01-01 23:25:00 EST" "2019-01-01 23:25:00 EST" "2019-01-02 00:00:00 EST" "2019-01-02 00:00:00 EST"
[13] "2019-01-02 00:25:00 EST" "2019-01-02 00:50:00 EST"
I am not sure if this is by design or not, but I really like the behavior of
date_group()
and it just exploded in my face when trying to use it with more than 60 minutes. Here's a reproducible example:It would be nice for
clock
to automatically perform the conversion. If difficult to change, I think there should be some sort of verbose warning to the user, since the grouping gets performed silently and gives the illusion of it working properly.This gets properly handled as I would expect with
date_floor
(conversion being made), so I'm not sure ifdate_group()
is doing what it is supposed to be doing and I should be usingdate_floor
instead.