tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.75k stars 2.12k forks source link

Request: Add a .by argument to slice_ functions to match main verbs #6989

Closed orgadish closed 7 months ago

orgadish commented 7 months ago

The slice_X() verbs have a by argument while slice() and the other main verbs have the new .by argument, but as far as I can tell none of the documentation explain why this has to be this way.

I know it probably doesn't make sense to make a breaking change and replace the argument, but can a redundant .by be added instead to support new dplyr 1.1 use?

(This is one of my only pain points in the otherwise wonderful update that is the new dplyr 1.1!)

I think it would also be worth considering adding .by for the joins, but I've been using those for so long that I don't get those confused.

orgadish commented 7 months ago

I just saw the discussion in #6647, but I think that in that discussion @DavisVaughan was weighing more heavily the comparison between the arguments within slice_, as opposed to the comparison between different functions in a pipeline.

I think the latter is a mental hurdle that people (like me) will come across all the time, despite the many times I've seen the very helpful error.

In my opinion, in this case, it's more important to sacrifice consistency within slice_ and the consistency of why arguments are dot-prefixed, in favor of consistency across the functions. However, I understand, of course, that this is a subjective design decision.

orgadish commented 7 months ago

PS I was inspired to submit this request after watching @mine-cetinkaya-rundel's talk at Monash from mid 2023 in which she talked about this as a major pain point she hopes will be addressed, and I didn't see any features/discussion here about changing it.

DavisVaughan commented 7 months ago

We are satisfied with our current by vs .by naming decisions, and we think that our error message in the slice_*() family is pretty good when you do make the typo (autocompletion in RStudio does make this somewhat rare)

library(dplyr, warn.conflicts = FALSE)

mtcars %>%
  slice_head(n = 2, .by = cyl)
#> Error in `slice_head()`:
#> ! Can't specify an argument named `.by` in this verb.
#> ℹ Did you mean to use `by` instead?

The docs do note that it is "a technical difference" https://dplyr.tidyverse.org/reference/dplyr_by.html#supported-verbs. We don't go into detail because that isn't the point of those docs, but as you saw, I talked about the reason here https://github.com/tidyverse/dplyr/issues/6647#issuecomment-1397616165.

I also still think mtcars %>% slice_head(n = 2, .by = cyl) looks quite weird with n vs .by in the same function.