Open eipi10 opened 1 year ago
Minimal reprex
1.0.10:
# pak::pak("tidyverse/dplyr@v1.0.10")
library(dplyr, warn.conflicts = FALSE)
df <- tibble(x = 1:5, w = 2:6)
fn <- function(data, cols, fns) {
  summarise(data, across(.cols = {{cols}}, .fns = fns))
}
# Works from top level
summarise(df, across(x, ~weighted.mean(.x, w = w)))
#> # A tibble: 1 × 1
#> x
#> <dbl>
#> 1 3.5
# Works when wrapped
fn(df, x, ~weighted.mean(.x, w = w))
#> # A tibble: 1 × 1
#> x
#> <dbl>
#> 1 3.5
1.1.1:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(x = 1:5, w = 2:6)
fn <- function(data, cols, fns) {
  summarise(data, across(.cols = {{cols}}, .fns = fns))
}
# Works from top level
summarise(df, across(x, ~weighted.mean(.x, w = w)))
#> # A tibble: 1 × 1
#> x
#> <dbl>
#> 1 3.5
# Not when wrapped
fn(df, x, ~weighted.mean(.x, w = w))
#> Error in `summarise()`:
#> ℹ In argument: `across(.cols = x, .fns = fns)`.
#> Caused by error in `across()`:
#> ! Can't compute column `x`.
#> Caused by error in `weighted.mean.default()`:
#> ! object 'w' not found
Possible solution proposed by @lionel- is to allow .fns = {{ fns }} so that users can wrap with a pattern like:
fn <- function(data, cols, fns) {
  summarise(data, across(.cols = {{cols}}, .fns = {{fns}}))
}
The justification here is that if across() is a true templating function, then it needs to be able to access the original expressions for .fns, so they need to come through with {{.
Actually, that already works (assuming I'm understanding what you and @lionel- had in mind):
library(tidyverse)
fn <- function(data, cols, fns, groups=NULL) {
  data %>%
    group_by(across({{groups}})) %>%
    summarise(across(.cols = {{cols}}, .fns = {{fns}}))
}
d = tibble(
  x1=1:5, x2=11:15, w=2:6, g=rep(LETTERS[1:2], c(2,3))
)
fn(d,
   cols=c(x1,x2),
   fns=c(mean=mean, mean.wt=~weighted.mean(., w=w)),
   groups=g)
#> # A tibble: 2 × 5
#> g x1_mean x1_mean.wt x2_mean x2_mean.wt
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 1.5 1.6 11.5 11.6
#> 2 B 4 4.13 14 14.1
fn(mtcars,
   cols=c(mpg, hp),
   fns=c(mean=mean, mean.wt=~weighted.mean(., w=cyl)),
   groups=c(am, vs))
#> `summarise()` has grouped output by 'am'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#> # Groups: am [2]
#> am vs mpg_mean mpg_mean.wt hp_mean hp_mean.wt
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 15.0 15.1 194. 194.
#> 2 0 1 20.7 20.4 102. 105.
#> 3 1 0 19.8 19.0 181. 198.
#> 4 1 1 28.4 28.4 80.6 80.6
Created on 2023-03-31 with reprex v2.0.2
Oh, but it doesn't work if you pass a separate object as the fns argument:
library(tidyverse)
fn <- function(data, cols, fns, groups=NULL) {
  data %>%
    group_by(across({{groups}})) %>%
    summarise(across(.cols = {{cols}}, .fns = {{fns}}))
}
fn(mtcars,
   cols=c(mpg, hp),
   fns=c(mean=mean, mean.wt=~weighted.mean(., w=cyl)),
   groups=c(am, vs))
#> `summarise()` has grouped output by 'am'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#> # Groups: am [2]
#> am vs mpg_mean mpg_mean.wt hp_mean hp_mean.wt
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 15.0 15.1 194. 194.
#> 2 0 1 20.7 20.4 102. 105.
#> 3 1 0 19.8 19.0 181. 198.
#> 4 1 1 28.4 28.4 80.6 80.6
FUNS = c(mean=mean, mean.wt=~weighted.mean(., w=cyl))
fn(mtcars,
   cols=c(mpg, hp),
   fns=FUNS,
   groups=c(am, vs))
#> Error in `summarise()`:
#> ℹ In argument: `across(.cols = c(mpg, hp), .fns = FUNS)`.
#> ℹ In group 1: `am = 0`, `vs = 0`.
#> Caused by error in `across()`:
#> ! Can't compute column `mpg_mean.wt`.
#> Caused by error in `weighted.mean.default()`:
#> ! object 'cyl' not found
#> Backtrace:
#> ▆
#> 1. ├─global fn(mtcars, cols = c(mpg, hp), fns = FUNS, groups = c(am, vs))
#> 2. │ └─data %>% group_by(across({{ groups }})) %>% ...
#> 3. ├─dplyr::summarise(...)
#> 4. ├─dplyr:::summarise.grouped_df(...)
#> 5. │ └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#> 6. │ ├─base::withCallingHandlers(...)
#> 7. │ └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#> 8. │ └─base::lapply(.x, .f, ...)
#> 9. │ └─dplyr (local) FUN(X[[i]], ...)
#> 10. │ ├─base::withCallingHandlers(...)
#> 11. │ └─mask$eval_all_summarise(quo)
#> 12. │ └─dplyr (local) eval()
#> 13. ├─global `<rlng_lm_>`(mpg)
#> 14. │ ├─stats::weighted.mean(., w = cyl)
#> 15. │ └─stats:::weighted.mean.default(., w = cyl)
#> 16. └─base::.handleSimpleError(...)
#> 17. └─dplyr (local) h(simpleError(msg, call))
#> 18. └─rlang::abort(msg, call = call("across"), parent = cnd)
Created on 2023-04-01 with reprex v2.0.2
That probably can't and won't ever work, because we can't "see" the expression that built the original object; we only see FUNS.
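To make the point above concrete, here is a minimal sketch using rlang only. The helper show_fns() is hypothetical (it is not part of dplyr or rlang) and merely reports the expression that a caller supplied for its argument, which is roughly the information an embraced argument carries.

```r
library(rlang)

# Hypothetical helper: return the expression the caller wrote for `fns`.
show_fns <- function(fns) {
  quo_get_expr(enquo(fns))
}

# Written inline, the whole call (including the formula) is visible.
inline_expr <- show_fns(c(mean = mean, mean.wt = ~ weighted.mean(., w = cyl)))

# Passed through a variable, only the variable's name is visible.
FUNS <- c(mean = mean, mean.wt = ~ weighted.mean(., w = cyl))
object_expr <- show_fns(FUNS)

is_call(inline_expr, "c")
#> [1] TRUE
is_symbol(object_expr, "FUNS")
#> [1] TRUE
```

This matches the error message above, where the failing argument is reported as `.fns = FUNS` rather than as the original `c(...)` call.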
As a result of my incomplete understanding of how NSE might interact with different ways of passing arguments, I failed to include a separate FUNS object as an example in my initial post.
I just want to point out that in dplyr 1.0.10 you can pass a separate FUNS object into a summary function, without embracing, and the summary function works, even when you pass additional columns inside one or more of the functions within FUNS, such as the w argument in weighted.mean. But this approach fails in dplyr 1.1.1. Because I do this often, I ran into this problem almost immediately after I installed 1.1.1. Below are reproducible examples with 1.0.10 and 1.1.1.
You can pass the .fns argument explicitly if you use embracing, as in my post above, but how can I make the FUNS example below work in 1.1.1 as it does in 1.0.10 (preferably in a way that also works with an explicit .fns argument)?
dplyr 1.0.10: Passing an (unembraced) .fns (directly or as an object) works
library(tidyverse)
fn <- function(data, cols, fns, groups=NULL) {
  data %>%
    group_by(across({{groups}})) %>%
    summarise(across(.cols = {{cols}}, .fns = fns))
}
fn(mtcars,
   cols=c(mpg, hp),
   fns=c(mean=mean, mean.wt=~weighted.mean(., w=cyl)),
   groups=c(am, vs))
#> `summarise()` has grouped output by 'am'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#> # Groups: am [2]
#> am vs mpg_mean mpg_mean.wt hp_mean hp_mean.wt
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 15.0 15.1 194. 194.
#> 2 0 1 20.7 20.4 102. 105.
#> 3 1 0 19.8 19.0 181. 198.
#> 4 1 1 28.4 28.4 80.6 80.6
FUNS = c(mean=mean, mean.wt=~weighted.mean(., w=cyl))
fn(mtcars,
   cols=c(mpg, hp),
   fns=FUNS,
   groups=c(am, vs))
#> `summarise()` has grouped output by 'am'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 6
#> # Groups: am [2]
#> am vs mpg_mean mpg_mean.wt hp_mean hp_mean.wt
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 15.0 15.1 194. 194.
#> 2 0 1 20.7 20.4 102. 105.
#> 3 1 0 19.8 19.0 181. 198.
#> 4 1 1 28.4 28.4 80.6 80.6
Created on 2023-04-01 with reprex v2.0.2
dplyr 1.1.1: Passing an (unembraced) .fns (directly or as an object) fails
library(tidyverse)
fn <- function(data, cols, fns, groups=NULL) {
  data %>%
    group_by(across({{groups}})) %>%
    summarise(across(.cols = {{cols}}, .fns = fns))
}
fn(mtcars,
   cols=c(mpg, hp),
   fns=c(mean=mean, mean.wt=~weighted.mean(., w=cyl)),
   groups=c(am, vs))
#> Error in `summarise()`:
#> ℹ In argument: `across(.cols = c(mpg, hp), .fns = fns)`.
#> ℹ In group 1: `am = 0`, `vs = 0`.
#> Caused by error in `across()`:
#> ! Can't compute column `mpg_mean.wt`.
#> Caused by error in `weighted.mean.default()`:
#> ! object 'cyl' not found
#> Backtrace:
#> ▆
#> 1. ├─global fn(...)
#> 2. │ └─data %>% group_by(across({{ groups }})) %>% ...
#> 3. ├─dplyr::summarise(...)
#> 4. ├─dplyr:::summarise.grouped_df(...)
#> 5. │ └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#> 6. │ ├─base::withCallingHandlers(...)
#> 7. │ └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#> 8. │ └─base::lapply(.x, .f, ...)
#> 9. │ └─dplyr (local) FUN(X[[i]], ...)
#> 10. │ ├─base::withCallingHandlers(...)
#> 11. │ └─mask$eval_all_summarise(quo)
#> 12. │ └─dplyr (local) eval()
#> 13. ├─global `<rlng_lm_>`(mpg)
#> 14. │ ├─stats::weighted.mean(., w = cyl)
#> 15. │ └─stats:::weighted.mean.default(., w = cyl)
#> 16. └─base::.handleSimpleError(...)
#> 17. └─dplyr (local) h(simpleError(msg, call))
#> 18. └─rlang::abort(msg, call = call("across"), parent = cnd)
FUNS = c(mean=mean, mean.wt=~weighted.mean(., w=cyl))
fn(mtcars,
   cols=c(mpg, hp),
   fns=FUNS,
   groups=c(am, vs))
#> Error in `summarise()`:
#> ℹ In argument: `across(.cols = c(mpg, hp), .fns = fns)`.
#> ℹ In group 1: `am = 0`, `vs = 0`.
#> Caused by error in `across()`:
#> ! Can't compute column `mpg_mean.wt`.
#> Caused by error in `weighted.mean.default()`:
#> ! object 'cyl' not found
#> Backtrace:
#> ▆
#> 1. ├─global fn(mtcars, cols = c(mpg, hp), fns = FUNS, groups = c(am, vs))
#> 2. │ └─data %>% group_by(across({{ groups }})) %>% ...
#> 3. ├─dplyr::summarise(...)
#> 4. ├─dplyr:::summarise.grouped_df(...)
#> 5. │ └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#> 6. │ ├─base::withCallingHandlers(...)
#> 7. │ └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#> 8. │ └─base::lapply(.x, .f, ...)
#> 9. │ └─dplyr (local) FUN(X[[i]], ...)
#> 10. │ ├─base::withCallingHandlers(...)
#> 11. │ └─mask$eval_all_summarise(quo)
#> 12. │ └─dplyr (local) eval()
#> 13. ├─global `<rlng_lm_>`(mpg)
#> 14. │ ├─stats::weighted.mean(., w = cyl)
#> 15. │ └─stats:::weighted.mean.default(., w = cyl)
#> 16. └─base::.handleSimpleError(...)
#> 17. └─dplyr (local) h(simpleError(msg, call))
#> 18. └─rlang::abort(msg, call = call("across"), parent = cnd)
Created on 2023-04-01 with reprex v2.0.2
I believe that the new behavior was introduced in https://github.com/tidyverse/dplyr/pull/6550. IMO, the new behavior is better because it removes ambiguities. For example, in the code above:
fn <- function(data, cols, fns, groups=NULL) {
  data %>%
    group_by(across({{groups}})) %>%
    summarise(across(.cols = {{cols}}, .fns = fns))
}
fn(mtcars,
   cols=c(mpg, hp),
   fns=c(mean=mean, mean.wt=~weighted.mean(., w=cyl)),
   groups=c(am, vs))
It is unclear whether cyl should come from the data frame or from a global variable.
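A small sketch of the ambiguity, using rlang alone rather than dplyr. It is an illustration of the two lookup rules, not a reproduction of dplyr internals; the global cyl and the data frame d are made up for the example.

```r
library(rlang)

cyl <- 100                          # a variable that happens to share the column's name
d <- data.frame(cyl = c(4, 6, 8))   # illustrative data

# Evaluated in a data mask, `cyl` resolves to the column.
masked <- eval_tidy(quote(sum(cyl)), data = d)
masked
#> [1] 18

# A lambda built beforehand resolves `cyl` in the environment
# where the formula was written, so it finds the variable instead.
lambda <- as_function(~ sum(cyl))
lexical <- lambda()
lexical
#> [1] 100
```

The same name gives two different answers depending on where the lookup starts, which is the ambiguity being described.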
I would expect cyl to be picked up from the data frame in that example, consistent with dplyr semantics. This can be achieved by interpolating .fns with .fns = {{ .fns }}. We'll test and document this as an official pattern.
@lionel- the code below fails in dplyr 1.1.1 even though it uses the embracing operator, so I think I'm not understanding your previous comment. Is there a different pattern I should be using to pass FUNS into a summarizing function in a way that will work with weighted.mean (or other functions that similarly require ancillary columns to be passed into the .fns argument)?
library(tidyverse)
fn <- function(data, cols, fns, groups=NULL) {
  data %>%
    group_by(across({{groups}})) %>%
    summarise(across(.cols = {{cols}}, .fns = {{fns}}))
}
FUNS = c(mean=mean, mean.wt=~weighted.mean(., w=cyl))
fn(mtcars,
   cols=c(mpg, hp),
   fns=FUNS,
   groups=c(am, vs))
#> Error in `summarise()`:
#> ℹ In argument: `across(.cols = c(mpg, hp), .fns = FUNS)`.
#> ℹ In group 1: `am = 0`, `vs = 0`.
#> Caused by error in `across()`:
#> ! Can't compute column `mpg_mean.wt`.
#> Caused by error in `weighted.mean.default()`:
#> ! object 'cyl' not found
#> Backtrace:
#> ▆
#> 1. ├─global fn(mtcars, cols = c(mpg, hp), fns = FUNS, groups = c(am, vs))
#> 2. │ └─data %>% group_by(across({{ groups }})) %>% ...
#> 3. ├─dplyr::summarise(...)
#> 4. ├─dplyr:::summarise.grouped_df(...)
#> 5. │ └─dplyr:::summarise_cols(.data, dplyr_quosures(...), by, "summarise")
#> 6. │ ├─base::withCallingHandlers(...)
#> 7. │ └─dplyr:::map(quosures, summarise_eval_one, mask = mask)
#> 8. │ └─base::lapply(.x, .f, ...)
#> 9. │ └─dplyr (local) FUN(X[[i]], ...)
#> 10. │ ├─base::withCallingHandlers(...)
#> 11. │ └─mask$eval_all_summarise(quo)
#> 12. │ └─dplyr (local) eval()
#> 13. ├─global `<rlng_lm_>`(mpg)
#> 14. │ ├─stats::weighted.mean(., w = cyl)
#> 15. │ └─stats:::weighted.mean.default(., w = cyl)
#> 16. └─base::.handleSimpleError(...)
#> 17. └─dplyr (local) h(simpleError(msg, call))
#> 18. └─rlang::abort(msg, call = call("across"), parent = cnd)
Created on 2023-04-06 with reprex v2.0.2
@eipi10 It is still possible, but you need to defuse and inject the expression.
FUNS = quo(c(mean=mean, mean.wt=~weighted.mean(., w=cyl)))
fn(mtcars,
   cols=c(mpg, hp),
   fns=!!FUNS,
   groups=c(am, vs))
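For readers unsure what defusing changes, here is a minimal sketch of what quo() stores, inspected with rlang only. It does not call dplyr; it just shows that the quosure keeps the unevaluated c(...) call, so the formula has not been created yet at this point.

```r
library(rlang)

FUNS <- quo(c(mean = mean, mean.wt = ~ weighted.mean(., w = cyl)))

# FUNS is a quosure: an expression plus the environment it was written in.
is_quosure(FUNS)
#> [1] TRUE

# The stored expression is still the original call to c(), not a list.
captured <- quo_get_expr(FUNS)
is_call(captured, "c")
#> [1] TRUE
call_args_names(captured)
#> [1] "mean"    "mean.wt"

# The lambda is still an unevaluated call to `~`.
is_call(captured[["mean.wt"]], "~")
#> [1] TRUE
```

Because the original expression survives, injecting it with !! hands the receiving function the same code it would have seen had the list been written inline.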
Thanks @randy3k!
I found this after experiencing the identical issue: needing to use weighted.mean with a data-variable for the weights in an across, with weighted.mean being one of many possible user-supplied functions. These functions are typically defined in a list by a user (or programmatically), which is then passed to a function essentially the same as fn above, much as in @randy3k's comment.
While the solution works, it is causing headaches for users, who have to remember to wrap their list of functions in rlang::quo sometimes and use !! in the call. In addition, if a list of functions is generated programmatically, getting that quo wrapper is not straightforward.
I see @randy3k's point about ambiguities, but I wonder if there's a way to explicitly remove them while avoiding the need to wrap the whole set of functions in quo. A solution that allowed an explicit data reference would remove the ambiguity, e.g. FUNS <- list(mean = mean, mean.wt = ~weighted.mean(., w = .data$cyl)). Is that possible?
I've tried to get that to work in a few different ways by using eval_tidy to provide the .data pronoun inside the summarise(across()), with no success, and maybe it just doesn't work. I get confused quickly trying to understand what is actually happening with the stack and what can be referenced by the time we're inside the summarize(across()).
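As background on the .data pronoun, here is a minimal sketch with rlang only. It shows that .data is meaningful only while an expression is evaluated inside a data mask; whether this is the exact reason the attempts above failed inside summarise(across()) is an assumption, not something confirmed in this thread. The data frame d is illustrative.

```r
library(rlang)

d <- data.frame(x = 1:5, w = 2:6)   # illustrative data

# Inside a data mask, `.data` refers to the masked data.
inside <- eval_tidy(quote(weighted.mean(.data$x, w = .data$w)), data = d)
inside
#> [1] 3.5

# Outside a mask, `.data` is only a placeholder object exported by rlang,
# and subsetting it signals an error.
outside <- tryCatch(.data$w, error = function(e) "error")
outside
#> [1] "error"
```

A lambda created ahead of time looks names up from the environment where its formula was written, so it may reach the placeholder rather than the pronoun supplied by the mask.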
I used rlang::as_function(). It seemed to work, but I am not too sure of the implications.
Just checking back here to see if there is now (or will eventually be) a better way to pass function arguments within across. By "better," I mean better than having to remember to defuse the function(s) by wrapping in quo and then later inject with !!.
I have a summarizing function that's similar to the function below. It allows the user to pass grouping variables, summary variables and any number of summary functions as arguments.
I often use weighted.mean as a summary function in the FUNS argument, which requires a weighting variable, which I pass with the bare column name, like this:
This approach worked in dplyr 1.0.10 and previous versions, but is failing in dplyr 1.1.1. Reproducible examples are below, first with 1.0.10 then with 1.1.1.
How can I update my function so that it will work properly with dplyr 1.1.1? I've never been happy with hard-coding the w argument anyway. Is there some tidyeval way that I should be passing the w argument into the summary function?
Example with dplyr 1.0.10
Same example, but with dplyr 1.1.1