tidyverse / dtplyr

Data table backend for dplyr
https://dtplyr.tidyverse.org

`summarise()` fails with `data.frame` output #405

Status: Closed (jmbarbone closed this 1 year ago)

jmbarbone commented 1 year ago

Calling a function inside `summarise()` that returns a data frame works in dplyr but fails on a `lazy_dt()`, so it doesn't seem to be supported by dtplyr:

library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# example from ?summarise
my_quantile <- function(x, probs) {
  tibble(x = quantile(x, probs), probs = probs)
}

mtcars %>%
  group_by(cyl) %>%
  summarise(my_quantile(disp, c(0.25, 0.75)))
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
#> argument.
#> # A tibble: 6 × 3
#> # Groups:   cyl [3]
#>     cyl     x probs
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75

mtcars %>%
  lazy_dt() %>%
  group_by(cyl) %>%
  summarise(my_quantile(disp, c(0.25, 0.75))) %>%
  collect()
#> Error in `[.data.table`(`_DT1`, , .(`my_quantile(disp, c(0.25, 0.75))` = my_quantile(disp, : All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

Created on 2023-01-05 with reprex v2.0.2

markfairbanks commented 1 year ago

Thanks for reporting! This is already being tracked in https://github.com/tidyverse/dtplyr/issues/342. I'm going to close this issue, but you can follow that one for updates.
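
For readers who hit the same error, here is a minimal sketch of two possible workarounds. It is not taken from the thread or from the dtplyr documentation. It relies only on what the error message says: every item in `j = list(...)` must be an atomic vector or a list. The helper `my_quantile_list()` and the column names `q25` and `q75` are hypothetical names introduced for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(data.table)

probs <- c(0.25, 0.75)

# Workaround 1 (stays in dtplyr): one scalar summary per column, so each
# item passed to data.table's j is an atomic vector. Gives one row per group.
wide <- mtcars %>%
  lazy_dt() %>%
  group_by(cyl) %>%
  summarise(
    q25 = quantile(disp, 0.25),
    q75 = quantile(disp, 0.75)
  ) %>%
  collect()

# Workaround 2 (plain data.table): return a named list instead of a tibble.
# The list itself becomes j, so its elements turn into columns and the
# result has one row per group and probability.
# `my_quantile_list` is a hypothetical helper, not part of any package.
my_quantile_list <- function(x, probs) {
  list(x = unname(quantile(x, probs)), probs = probs)
}

long <- as.data.table(mtcars)[, my_quantile_list(disp, probs), keyby = cyl]

print(wide)
print(long)
```

The first variant keeps the dplyr syntax but yields a wide result. The second reproduces the long shape of the dplyr output at the cost of leaving the dtplyr pipeline.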