Closed sousaru closed 6 months ago
Hmm, it looks like the issue isn't tidytable::group_by() and it isn't purrr::reduce() either.
Part of the issue is that in calculate_lags() you're using dplyr::mutate()/dplyr::across() on a grouped tidytable, which doesn't work.
library(tidytable)
df <- data.frame(x = 1:3, y = c("a", "a", "b"))
df %>%
  tidytable::group_by(y) %>%
  tidytable::mutate(x_lag_correct = lag(x)) %>%
  dplyr::mutate(x_lag_wrong = lag(x))
#> # A tidytable: 3 × 4
#> # Groups:
#>       x y     x_lag_correct x_lag_wrong
#>   <int> <chr>         <int>       <int>
#> 1     1 a                NA          NA
#> 2     2 a                 1           1
#> 3     3 b                NA           2
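For comparison, here is a minimal sketch that keeps every verb in tidytable, so no dplyr function ever sees the grouped tidytable. It uses the `.by` argument of tidytable::mutate() instead of group_by(); the names `res` and `x_lag` are made up for this illustration and are not from the original calculate_lags().

```r
library(tidytable)

df <- data.frame(x = 1:3, y = c("a", "a", "b"))

# Compute the lag per group entirely within tidytable.
# `.by = y` groups only for this one mutate() call, so the result
# comes back ungrouped and nothing downstream depends on group state.
res <- df %>%
  mutate(x_lag = lag(x), .by = y)

res$x_lag
```

The lag restarts at the first row of each group, which matches the x_lag_correct column above.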
However, if you change this to the tidytable versions you would run into another issue: you pass map_lead to across(), where map_lead is a list of functions you are dynamically creating. Unfortunately that doesn't work in tidytable::across(), and there is no way to make it work given the translation constraints of tidytable::across().
Thanks! I should have added that using tidytable for across/mutate yields a different error.
map_lead creates a list of purrr-style lambda functions, which I thought worked with tidytable::mutate(across(...)). Could you elaborate on why it does not? Is this flagged in the documentation?
Thanks for your time!
A list() call works, but a list passed as a variable doesn't.
library(tidytable)
df <- tidytable(x = 1:3, y = 4:6)
# Works
df %>%
  summarize(
    across(c(x, y), list(mean, sum))
  )
#> # A tidytable: 1 × 4
#>     x_1   x_2   y_1   y_2
#>   <dbl> <int> <dbl> <int>
#> 1     2     6     5    15
# Fails
funs <- list(mean, sum)
df %>%
  summarize(
    across(c(x, y), funs)
  )
#> Error in funs(x): could not find function "funs"
I think you're right, I need to document that difference somewhere.
Could you elaborate on why it does not?
Essentially the idea of tidytable::across() is that it gets expanded to a regular function call; no evaluation occurs until the expansion is completed.
So this...
df %>%
  summarize(across(c(x, y), mean))
becomes this...
df %>%
  summarize(x = mean(x),
            y = mean(y))
To get this to work you have to account for 3 cases:
.fns = mean assumes a function name and gets converted to mean(col)
.fns = ~ mean(.x) replaces the .x and gets converted to mean(col)
.fns = list(mean, max) combines the assumptions from the other 2: either straight function conversions or converting a lambda function.
Since I'm directly altering the expressions in the background it's tough to know what the intent is when passing a list of functions.
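To make the expansion idea concrete, here is a toy sketch in base R. It is not tidytable's actual implementation; `expand_across()` is a hypothetical helper written only for this illustration, and it handles just the bare-function-name case. It shows why a variable holding a list gets treated as if it were a function name: the expander only ever sees the unevaluated symbol.

```r
# Toy expander (hypothetical, NOT tidytable's real code).
# It captures `fns` without evaluating it and builds one call per column.
expand_across <- function(cols, fns) {
  fns_expr <- substitute(fns)  # the expression as typed, e.g. the symbol `mean`
  calls <- lapply(cols, function(col) {
    as.call(list(fns_expr, as.name(col)))  # builds fns_expr(col)
  })
  names(calls) <- cols
  calls
}

# A bare function name expands as intended.
ok <- expand_across(c("x", "y"), mean)
deparse(ok$x)   # "mean(x)"

# A variable holding a list is just another symbol to the expander,
# so it is treated like a function name.
funs <- list(mean, sum)
bad <- expand_across(c("x", "y"), funs)
deparse(bad$x)  # "funs(x)"
```

The second expansion produces a call to funs(x), which is the same shape as the call in the error message shown earlier in the thread. An inline list(mean, sum) is different because the list() call itself is visible in the captured expression, so its elements can be expanded one by one.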
I was having a bug while using purrr::reduce to compute lags in a data.frame and realized it came down to the use of tidytable::group_by. I provide an MRE: