tidyverse / dtplyr

Data table backend for dplyr
https://dtplyr.tidyverse.org

translation of lead/lag function to data.table changes class of column #469

Closed: alia-rahim closed this issue 9 months ago

alia-rahim commented 9 months ago

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

data <- data.frame(x = c("A", "A", "B", "B"), y = c(1, 2, 3, 4))

df <- lazy_dt(data) %>% group_by(x) %>% transmute(lead(y, 1)) %>% ungroup() %>% collect()

# The column name contains spaces and parentheses, so it needs backticks
class(df$`lead(y, 1)`)
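For comparison, the grouped lead this snippet is after can be computed in base R without dplyr or data.table. This is only a reference sketch: `lead_by_one` is a hypothetical helper, and `ave()` stands in for `dplyr::lead()`. It shows that the expected result is a plain numeric vector, not a list:

```r
data <- data.frame(x = c("A", "A", "B", "B"), y = c(1, 2, 3, 4))

# Hypothetical helper: within one group, shift values up one place
# and pad the last position with NA (what lead(y, 1) does per group)
lead_by_one <- function(v) c(v[-1], NA)

# ave() splits y by x, applies the helper per group, and reassembles
# the results in the original row order, keeping y's numeric type
expected <- ave(data$y, data$x, FUN = lead_by_one)

class(expected)
#> [1] "numeric"
expected
#> [1]  2 NA  4 NA
```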

alia-rahim commented 9 months ago

If you group by and then use the lead/lag functions, the resulting column becomes a list instead of keeping the input data type.

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

data <- data.frame(x = c("A", "A", "B", "B"), y = c(1,2,3,4))

df <- lazy_dt(data) %>%
  group_by(x) %>%
  transmute(lead(y,1)) %>%
  ungroup() %>%
  collect()

df2 <- data %>%
  group_by(x) %>%
  transmute(lead(y,1)) %>%
  ungroup()

class(df$`lead(y, 1)`)
#> [1] "list"
class(df2$`lead(y, 1)`)
#> [1] "numeric"

Created on 2024-02-22 with reprex v2.1.0
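Until data.table can be updated, one possible stopgap is to flatten the affected column after `collect()`. This is a minimal sketch, not part of dtplyr or data.table: `flatten_scalar_list_cols` is a hypothetical helper, and it assumes each element of the list column holds exactly one value. Any list column that does not meet that assumption is left untouched:

```r
# Hypothetical workaround helper (not a dtplyr/data.table function):
# turn list columns whose elements are all length 1 back into atomic vectors
flatten_scalar_list_cols <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    # Only flatten when every element is a single value; otherwise
    # unlist() would change the number of rows
    if (is.list(col) && all(lengths(col) == 1L)) {
      df[[nm]] <- unlist(col, use.names = FALSE)
    }
  }
  df
}

# Usage on the collected result from the reprex above:
# df <- flatten_scalar_list_cols(df)
# class(df$`lead(y, 1)`)
```

Checking `lengths()` first keeps the helper conservative: a genuine list column (e.g. one produced on purpose with nested values) is never silently collapsed.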

markfairbanks commented 9 months ago

This is actually a data.table issue that has been fixed in their latest development version: https://github.com/Rdatatable/data.table/issues/5939

If you update data.table to the development version using data.table::update_dev_pkg(), it will work correctly.