tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.75k stars 2.12k forks source link

Consistently label arguments that use tidy evaluation/dots #4683

Closed hadley closed 4 years ago

hadley commented 4 years ago

I think this should be some sort of prefix or suffix in the argument list, which is linked to a common source of more details. Need to decide:

We should make it easy to make these changes after the fact by providing a macro, so params would look something like:

#' @param x <\tidy_eval{}> 
#' @param ... <\tidy_dots{}>

cc @lionel-, @gaborcsardi

gaborcsardi commented 4 years ago

Should this be part of the general type annotation? Did we decide on < against [ for that?

gaborcsardi commented 4 years ago

I think the prefix in the man page can be [tidy eval], "uses" seems redundant there.

For the roxygen / Rd notation, we could just use the \ Rd-like macros, maybe, but roxygen would interpret them? This would avoid introducing more special characters that we need to escape, etc. So it could be just \tidyeval? (A \ macro could work for the generic type annotation as well, actually.)

hadley commented 4 years ago

I was thinking if you had multiple annotations, you'd want them to all be inside the []/<>, so the []/<> would need to be embedded in the roxygen — but then you run into conflicts with md link syntax. So maybe we should just start with them embedded, e.g.

\newcommand{\tidyeval}{\[\link[=tidyeval]{tidy eval}\]}

(IIRC we can put it in man/macros, and it works in 3.2 and later.)

I'm slightly worried that just [tidy eval] will be too cryptic; but maybe it's not a big deal since it'll link to a page with more details, and most readers of documentation are already used to skipping over stuff that they don't understand.

hadley commented 4 years ago

And I think a documentation page is probably the right option — it would provide a brief overview and then point to the vignette (or possibly inline the content of the vignette by using @includeRmd)

hadley commented 4 years ago

Nope, doesn't work in 3.2 😢

gaborcsardi commented 4 years ago

OK. The support in our tooling is also not great, although I fixed some issues in pkgload/devtools and spelling AFAIR.

Anyway, roxygen2 is de facto standard now, so we can just do it there.

We could process the \ macro there, or we could also have a special link for this: [type::tidyeval]?

We could use this for type annotation in general, e.g. [type::list(),tidyeval], etc. Or just parse [tidyeval] (and similar type annotations) differently if it is the first word of a @param?

hadley commented 4 years ago

OTOH R 4.0.0 is scheduled for release on Feb 29, and I doubt we will release dplyr before then, so we could drop R 3.2 support a little early.

I filed a couple of issue to improve macro support in pkgload: https://github.com/r-lib/pkgload/issues/120, https://github.com/r-lib/pkgload/issues/119.

We definitely will want to add better support for this in roxygen2 in the long run, but I'd like to prototype the ideas here before we commit to a syntax.

gaborcsardi commented 4 years ago

IDK, it does not feel right to drop 3.2 support just for this.

Maybe roxygen could just put a newcommand{} into the file, for now, until we can rely on the macro file?

lionel- commented 4 years ago

So perhaps (without macros):

#' @param x <[data_masked][rlang::data-masked]> <[dynamic][rlang::nse-force]> Argument.
#' @param ... <[dynamic][rlang::dyn-dots]> Dots.

Maybe we could use hyphens instead of undercores in these buttons. Or do we need them to be also R code in some situations?

hadley commented 4 years ago

I think we've marketed tidy eval so much that it's going to cause less confusion than "data masking" even if it's a little imprecise.

Wouldn't we group all the modifiers inside of the <>?, e.g.:

#' @param x <[data_masked][rlang::data-masked], [dynamic][rlang::nse-force]> Argument.

I like the idea of using - to separate fragments; I think that makes it look a bit more "technical". And maybe we should put in code font too? So:

#' @param n <[`tidy-eval`][dplyr_tidy_eval]>.  Number of rows to return for `top_n()`, fraction 
#'     of rows to return for `top_frac()`.

That looks like this:

Screenshot 2020-01-02 at 7 23 44 AM
lionel- commented 4 years ago

I like comma or semi-colon separation, but then we're drifting away from the type tag model. How do we include type and size specifications? I think type needs to be in abbreviated form if possible, because these technical requirements should take as little room as possible.

[tidy-eval; type:lgl; size:nrow]

<tidy-eval, type:lgl, size:nrow>

<tidy-eval> <type:lgl> <size:nrow>

[tidy-eval] [type:lgl] [size:nrow]

I think I prefer the last one because the links look like buttons, are easily parsed by the eye, and look less stuffy.

hadley commented 4 years ago

For the type specification stuff, I was thinking we'd use the constructor syntax, e.g. <logical(nrow); tidy-eval>. But we don't need to solve that now, since for now we'll just focus on the evaluation semantics in dplyr.

Might also be worth thinking about what they'd link to — I think you'd link logical(nrow) to a page that explains the basic type/size syntax.