You were nearly there:
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
db <- memdb_frame(iris)
# Note: a list of characters (currently) produces ugly names
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list("mean", "sd")))
#> Warning: Missing values are always removed in SQL.
#> Use `mean(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> `Sepal.Length_"mean"` `Sepal.Length_"sd"`
#> <dbl> <dbl>
#> 1 5.84 0.828
# and they are different for local data frames
iris %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list("mean", "sd")))
#> Sepal.Length_1 Sepal.Length_2
#> 1 5.843333 0.8280661
# you need to explicitly use `list()`
st <- list("mean", "sd")
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list(!!!st)))
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> `Sepal.Length_"mean"` `Sepal.Length_"sd"`
#> <dbl> <dbl>
#> 1 5.84 0.828
# nicer names when using symbols instead of character
st2 <- lapply(st, sym)
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list(!!!st2)))
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> Sepal.Length_mean Sepal.Length_sd
#> <dbl> <dbl>
#> 1 5.84 0.828
# or simply use a character vector
st_char <- unlist(st)
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = !!st_char))
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> Sepal.Length_mean Sepal.Length_sd
#> <dbl> <dbl>
#> 1 5.84 0.828
Created on 2021-06-11 by the reprex package (v2.0.0)
Thanks for the workaround. The character list was just a very simplified example for reprex purposes (I noticed the odd names, but fewer lines of code generally seem better for a reprex, so I ignored them because they weren't really important here). That said, the issue was less a request for a workaround than a report that the behavior is inconsistent. If anything, the error is incredibly misleading in the case of unquoting an assigned list:
#> Error: `.fns` argument to dbplyr::across() must be a NULL, a function name, formula, or list
Ideally, using NSE and unquoting SE should result in the same behavior in most situations (with obvious exceptions, such as needing to use symbols rather than the function call for lazy evaluation in SE, especially with dbplyr, where the names are translated). But I couldn't see any reason why
st <- c("mean", "sd")
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list("mean", "sd")))
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = !!st))
would result in different behavior. In any case, I even tried the base R route to see whether that would work (ignoring known environment issues with base R's eval/substitute), but it caused exactly the same problem.
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
db <- memdb_frame(iris)
st <- syms(list("mean", "sd"))
expr <- substitute(db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = stats)), list(stats = st))
expr
#> db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list(
#> mean, sd)))
eval(expr)
#> Error: `.fns` argument to dbplyr::across() must be a NULL, a function name, formula, or list
Created on 2021-06-11 by the reprex package (v2.0.0)
So across/across_funs requires .fns = list() to be written out explicitly: "list" must be NSE and won't work with substitution, even though the elements inside it can be unquoted SE (as your workaround showed). That seems antithetical to the purpose of tidyeval.
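To make the distinction concrete, here is a minimal base R sketch with no dbplyr involved. The bare `across` call is only a quoted placeholder and is never evaluated, and the idea that dbplyr decides what to do by inspecting the unevaluated `.fns` expression is an assumption inferred from the behavior above, not something confirmed from its source. The sketch shows that a typed-out `list(mean, sd)` is a call, while the substituted version is an already-built list object, even though the two can print alike.

```r
# A quoted call in which `list(mean, sd)` is typed out literally
expr_literal <- quote(across(.fns = list(mean, sd)))

# The same call built by substituting an already-constructed list of symbols
st <- list(as.name("mean"), as.name("sd"))
expr_subst <- substitute(across(.fns = stats), list(stats = st))

# Pull out the `.fns` argument of each call
fns_literal <- as.list(expr_literal)[[".fns"]]
fns_subst <- as.list(expr_subst)[[".fns"]]

is.call(fns_literal)  # TRUE: an unevaluated call to `list`
is.call(fns_subst)    # FALSE: not a call at all
is.list(fns_subst)    # TRUE: a plain list holding two symbols
```

If the check is of the form "is this a call to `list`?", the second form would fall through to the error branch, which would match the error shown above.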
I am trying to write an API where the user can specify one or more aggregation functions to apply to one or more columns of a database. Therefore, the functions must be passed programmatically, ideally as a list of character strings. Running this on the tbl itself works (see below), but running it on the database (sqlite, mysql, etc.) errors. There might be a workaround (although I haven't figured one out), but this behavior still feels unintended.
Created on 2021-06-10 by the reprex package (v2.0.0)
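For the API use case described above, a minimal sketch might look like the following. It relies on the approach that worked in this thread (converting the function names to symbols and splicing them into an explicit `list()`). The helper name `summarise_columns` and its arguments are hypothetical, not part of dplyr or dbplyr, and the column names are unquoted with `!!` as a precaution so that the selection does not depend on where `cols` is looked up.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr or dbplyr).
# `cols` and `fns` are character vectors of column and function names.
summarise_columns <- function(data, cols, fns) {
  # Turn the function names into symbols: "mean" becomes `mean`, etc.
  fn_syms <- syms(fns)
  data %>%
    summarise(across(.cols = all_of(!!cols), .fns = list(!!!fn_syms)))
}

# On a database-backed table
db <- memdb_frame(iris)
summarise_columns(db, "Sepal.Length", c("mean", "sd"))

# On a local data frame
summarise_columns(iris, "Sepal.Length", c("mean", "sd"))
```

The resulting column names may differ between the database and the local data frame, as the outputs earlier in this thread suggest, so any code that depends on them should rename explicitly.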