I wrote this for Advanced R, but it doesn't feel quite right there. I think it might be better in the programming with dplyr vignette (whatever that ends up being).
Tangling with dots
In our grouped_mean() example above, we allow the user to select one grouping variable and one summary variable. What if we wanted to allow the user to select more than one? One option would be to use ... (the dots). There are three possible ways we could use it:
Pass ... on to the mean() function. That would make it easy to set na.rm = TRUE. This is the easiest to implement.
Allow the user to select multiple groups.
Allow the user to select multiple variables to summarise.
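As a rough illustration of the first option, here is a minimal sketch. The baseline grouped_mean() is not reproduced in this section, so the signature below (df, group_var, summary_var) is an assumption rather than the original definition; the only point is that ... is forwarded to mean().

```r
library(dplyr)
library(rlang)

# Assumed shape of grouped_mean(); the argument names are illustrative only.
grouped_mean <- function(df, group_var, summary_var, ...) {
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)

  df %>%
    group_by(!!group_var) %>%
    # Anything supplied in ... (e.g. na.rm = TRUE) is handed to mean()
    summarise(mean = mean(!!summary_var, ...))
}

grouped_mean(mtcars, cyl, mpg, na.rm = TRUE)
```

Because ... is already spoken for here, it can't also be used to collect extra grouping or summary variables, which is the tension the rest of this section deals with.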
Implementing each of these is relatively straightforward, but what if we want to group by multiple variables, summarise multiple variables, and pass extra arguments on to mean()? Generally, I think it is better to avoid this sort of API (relying instead on multiple functions that each do one thing), but sometimes it is the lesser of two evils, so it is useful to have a technique in your back pocket to handle it.
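One way to handle all three at once, sketched here under some assumptions, is to drop ... altogether and have the caller pass lists of expressions built with exprs(). The function name grouped_mean2() and the argument names groups, summary_vars, and args are hypothetical, chosen only for this example.

```r
library(dplyr)
library(rlang)

# Hypothetical variant: every multi-valued input is an explicit list argument.
grouped_mean2 <- function(df, groups, summary_vars, args = list()) {
  # Build one call per summary variable, e.g. mean(hp, na.rm = TRUE)
  var_means <- lapply(summary_vars, function(var) expr(mean(!!var, !!!args)))
  # Name each summary after the expression it summarises
  names(var_means) <- vapply(summary_vars, as_label, character(1))

  df %>%
    group_by(!!!groups) %>%
    summarise(!!!var_means)
}

grouped_mean2(
  mtcars,
  groups = exprs(vs, am),
  summary_vars = exprs(hp, drat, wt),
  args = list(na.rm = TRUE)
)
```

The cost of this design is that the caller has to wrap their variables in exprs(), which is noisier than bare names but keeps the three kinds of input unambiguous.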
If you use this design a lot, you may also want to provide an alias to exprs() with a better name. For example, dplyr provides the vars() wrapper to support the scoped verbs (e.g. summarise_at(), mutate_at()). aes() in ggplot2 is similar, although it does a little more: it names the first two arguments (x and y) by default, requires all other arguments to be named, and automatically renames arguments so you can use the base names for aesthetics (e.g. pch vs shape).
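A very small sketch of such an alias follows. The name cols() is made up for this example and does not come from dplyr or rlang; it is simply exprs() under another name.

```r
library(rlang)

# Hypothetical alias: same behaviour as exprs(), friendlier name for callers
cols <- exprs

# Both calls capture the same unevaluated symbols
identical(cols(vs, am), exprs(vs, am))
```

A plain alias like this does none of the extra work that aes() does; it only changes what the user types.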
Exercises
Implement the three variants of grouped_mean() described above: