markfairbanks closed this issue 2 years ago
Using lazy_dt produces this error:
Error in rowSums(list(y = y, z = z)) :
  'x' must be an array of at least two dimensions
Not sure of the best way to handle this one. At first I thought the way to solve this would be to check whether an input dot is itself an across() call. If not, capture_dots() would convert any across() nested inside it to a data.table() call.
Simple example:
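library(rlang)
library(purrr)

dots_fn <- function(...) {
  # Capture each dot as a quosure
  dots <- enquos(...)
  # TRUE when the dot itself is an across() call
  is_top_across <- map(dots, quo_is_call, "across")
  is_top_across
}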
This seems reasonable and works well in the following situations:
df %>%
  mutate(
    # "top" across call treated normally
    across(c(x, y), ~ .x + 1),
    # nested within another call - treated as a `data.table()` call
    row_sum = rowSums(across(c(y, z)))
  )
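As a quick check of the detection step, the sketch below applies the same quo_is_call() test to the two dots from the example above. The helper name is_top_across() is made up for illustration and is not part of dtplyr; only rlang is assumed to be installed.

```r
library(rlang)

# Hypothetical helper (not a dtplyr function): TRUE when a captured dot
# is itself an across() call, FALSE when across() is only nested inside it.
is_top_across <- function(...) {
  dots <- enquos(...)
  vapply(dots, quo_is_call, logical(1), name = "across")
}

# The dots are only captured, never evaluated, so x, y and z need not exist.
res <- is_top_across(
  across(c(x, y), ~ .x + 1),
  row_sum = rowSums(across(c(y, z)))
)

unname(res)
#> [1]  TRUE FALSE
```

The first dot is flagged as a "top" across(), while the rowSums() dot is not, so only the second would be rewritten.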
However, I'm finding that this approach can fail when dealing with operations applied to list-columns:
library(tidyverse)

list_df <- tibble(a = 1:3, b = 1:3)
df <- tibble(x = c("a", "b"), list_col = list(list_df, list_df))

df %>%
  mutate(
    list_col = map(list_col, ~ .x %>% mutate(across(c(a, b), ~ .x + 1)))
  )
Here we would want the map(mutate()) to treat the across() as a "top" across(). But the outermost mutate() would have already converted it to a call to data.table().
I haven't figured out a good workaround yet.
Actually, all we would need to do is catch whether arrange/filter/mutate/slice/summarise/summarize are called inside of the dot, and then return that part of the call unaltered.
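A rough sketch of that idea, not dtplyr's actual implementation: the walker below recurses through a dot's expression, leaves any call to one of those verbs untouched, and rewrites other nested across() calls. The function name replace_nested_across() and the verb vector are hypothetical, and the across() rewrite is deliberately simplified: it assumes the first argument is a c(...) of bare column names and ignores .fns.

```r
library(rlang)

# Verbs whose calls should be returned unaltered (list taken from the comment above).
dplyr_verbs <- c("arrange", "filter", "mutate", "slice", "summarise", "summarize")

# Hypothetical walker, meant for dots that are NOT a "top" across() call.
replace_nested_across <- function(expr) {
  # Symbols and constants are returned as-is.
  if (!is_call(expr)) {
    return(expr)
  }
  # A nested dplyr verb handles its own across(), so stop here.
  if (is_call(expr, dplyr_verbs)) {
    return(expr)
  }
  if (is_call(expr, "across")) {
    # Simplifying assumption: across(c(col1, col2, ...)) with no .fns.
    cols <- call_args(expr)[[1]]
    return(call2("data.table", !!!call_args(cols)))
  }
  # Otherwise rebuild the call with each element processed recursively.
  as.call(lapply(as.list(expr), replace_nested_across))
}

# across() nested in rowSums() is rewritten.
out1 <- replace_nested_across(quote(rowSums(across(c(y, z)))))
out1
#> rowSums(data.table(y, z))

# across() inside a nested mutate() is left alone.
inner <- quote(map(list_col, ~ .x %>% mutate(across(c(a, b), ~ .x + 1))))
out2 <- replace_nested_across(inner)
```

With this shape the list-column case keeps its inner mutate(across()) intact, while rowSums(across()) still becomes a data.table() call. Calls with empty arguments, such as x[1, ], are not handled by this sketch.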
Updates from dplyr:
Implement pick(): https://github.com/tidyverse/dplyr/pull/6492
Deprecate across(.fns = NULL): https://github.com/tidyverse/dplyr/pull/6523
So we need to implement pick(). Should we just skip implementing the across() version since it's being deprecated? Or do we allow it for now since this will probably work in dplyr for a while?
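For reference, this is roughly what the pick() form looks like in plain dplyr. It assumes a dplyr version that includes the pick() pull request linked above (1.1.0 or later); it says nothing about how dtplyr would translate it.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = c("a", "a", "b"), y = 1:3, z = 1:3)

# pick() returns the selected columns as a data frame, so rowSums() can use it
# directly; it plays the role of across(c(y, z)) with no .fns.
res <- df %>%
  mutate(row_sum = rowSums(pick(y, z)))

res$row_sum
#> [1] 2 4 6
```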
I think implementing pick() is a separate issue from this one, because across() output can still be used as a data frame in dplyr even with a non-NULL .fns:
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = c("a", "a", "b"), y = 1:3, z = 1:3)

df %>%
  mutate(row_sum = rowSums(across(c(y, z), sin)))
#> # A tibble: 3 × 4
#>   x         y     z row_sum
#>   <chr> <int> <int>   <dbl>
#> 1 a         1     1   1.68
#> 2 a         2     2   1.82
#> 3 b         3     3   0.282
Created on 2022-11-03 with reprex v2.0.2
But I don't see a good way to get this behavior in dtplyr. I'd say this issue should be closed unless you feel differently (and open a new one for pick()).
Oh, didn't see your earlier comments. Seems you have an idea for how to implement, so disregard my comment :)