markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

`tidytable` fails with `ceiling_date()` function #772

Closed nev-awaken closed 11 months ago

nev-awaken commented 11 months ago

When I use the ceiling_date() function from lubridate within a tidytable pipeline, I get an error saying the date-time class is unsupported. A similar pipeline with dtplyr works without any issues.

Example:


library(tidyverse)
library(tidytable)
library(lubridate)

time <- "2023-10-04 9:59:58 AM"
x <- as.POSIXct(time)
timestamp <- c(format(seq(x, by="3 min", length.out=7), "%Y-%m-%d %H:%M:%S"))

len <- length(timestamp)
value <- seq(from = 1, to = 10, length.out = len)
df <- data.table(timestamp = timestamp,
                 value = value)

df$timestamp <- as.POSIXct(df$timestamp)

averaged_df <- df %>%
  group_by(interval = ceiling_date(timestamp, "15 minutes")) %>%
  summarise(average_value = mean(value))

Output:

Error in `tidyselect_locs()`:
! Problem while evaluating `ceiling_date(timestamp, "15 minutes")`.
Caused by error in `unsupported_date_time()`:
! Unsupported date-time class 'character'

The same pipeline works with dtplyr, but fails when the tidytable package is loaded.

markfairbanks commented 11 months ago

This is actually a difference between tidytable and dplyr/dtplyr. In tidytable you can't create columns on the fly in group_by(), because its arguments are treated as column selections. Instead you'll have to create the column with mutate() first.

suppressMessages(library(tidytable))
suppressMessages(library(lubridate))

time <- "2023-10-04 9:59:58 AM"
x <- as.POSIXct(time)
timestamp <- c(format(seq(x, by="3 min", length.out=7), "%Y-%m-%d %H:%M:%S"))

len <- length(timestamp)
value <- seq(from = 1, to = 10, length.out = len)
df <- data.table(timestamp = timestamp,
                 value = value)

df$timestamp <- as.POSIXct(df$timestamp)

df %>%
  mutate(interval = ceiling_date(timestamp, "15 minutes")) %>%
  group_by(interval) %>%
  summarise(average_value = mean(value))
#> # A tidytable: 3 × 2
#>   interval            average_value
#>   <dttm>                      <dbl>
#> 1 2023-10-04 10:00:00           1  
#> 2 2023-10-04 10:15:00           5.5
#> 3 2023-10-04 10:30:00          10

Unlike dplyr, you can use tidyselect helpers (like where()/starts_with()/ends_with()) inside group_by().

df <- tidytable(chr1 = c("a", "a", "b"),
                chr2 = c("a", "a", "b"),
                vals = 1:3)

df %>%
  group_by(where(is.character)) %>%
  summarize(mean_vals = mean(vals))
#> # A tidytable: 2 × 3
#> # Groups:      chr1
#>   chr1  chr2  mean_vals
#>   <chr> <chr>     <dbl>
#> 1 a     a           1.5
#> 2 b     b           3

I made this design decision a loooong time ago, so unfortunately I can't change it now, even though it does make tidytable operate slightly differently than dplyr.
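
Since the grouping column has to exist before it can be grouped on, another way to write the workaround is to skip group_by() and pass the column to the .by argument of summarise(). This is only a minimal sketch: it assumes tidytable's .by argument behaves as documented, and it rebuilds the example data with an explicit UTC time zone (an assumption not in the original reprex) so the result does not depend on the local time zone.

```r
library(tidytable)
library(lubridate)

# Rebuild the example data directly as POSIXct.
# tz = "UTC" is an assumption added here for reproducibility.
start <- as.POSIXct("2023-10-04 09:59:58", tz = "UTC")
df <- tidytable(
  timestamp = seq(start, by = "3 min", length.out = 7),
  value = seq(from = 1, to = 10, length.out = 7)
)

# Create the interval column first, then group on it via `.by`
# instead of a separate group_by() call.
averaged_df <- df %>%
  mutate(interval = ceiling_date(timestamp, "15 minutes")) %>%
  summarise(average_value = mean(value), .by = interval)
```

With .by the grouping applies only to that one summarise() call, so the result comes back ungrouped and no ungroup() step is needed afterwards.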

nev-awaken commented 11 months ago

Thanks for the info