tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

arrange_if, arrange_at, arrange_all? #2682

Closed yutannihilation closed 7 years ago

yutannihilation commented 7 years ago

(This issue is just out of curiosity. I don't find these useful yet)

In the vignette "Introduction to dplyr", arrange() is listed together with filter(), mutate(), select(), and summarise() as one of the verbs that "provide the basis of a language of data manipulation."

While all of the other basic verbs (plus group_by()) have colwise forms, why is there no scoped arrange()? Is there any good reason to avoid implementing arrange() variants? Or is it just that no one has requested the feature so far?

arrange() has slightly different semantics, in that the order of its arguments affects the result, and it is difficult to deduce from the result which columns were used and in what order. So I feel the colwise idea may not suit this verb. Still, I wonder whether scoped variants should exist just for consistency.
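To make the point about argument order concrete, here is a minimal sketch. The data frame `df` and its columns `a` and `b` are made up for illustration; only `arrange()` and `tibble()` come from dplyr.

```r
library(dplyr)

# Toy data frame, invented for this example.
df <- tibble(
  a = c(2, 1, 2, 1),
  b = c(1, 2, 2, 1)
)

# Sort by a first, then break ties with b.
by_a_then_b <- arrange(df, a, b)

# Same columns, opposite priority: sort by b first, then break ties with a.
by_b_then_a <- arrange(df, b, a)

by_a_then_b
by_b_then_a
```

Both calls use the same set of columns, yet the row order differs, which is why a purely set-based column selection does not fully describe an arrangement.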

hadley commented 7 years ago

I assume this just slipped our minds. @lionel- can you please add?

lionel- commented 7 years ago

Would this UI be reasonable?

arrange_all(mtcars, .desc = vars(mpg, cyl))
arrange_at(mtcars, vars(drat, hp), .desc = vars(mpg, cyl))
arrange_if(iris, is_factor, .desc = is_double)

Hmm, maybe this one is better:

arrange_at(mtcars, desc(drat, hp), vars(mpg, cyl))
arrange_if(iris, is_factor, .desc = TRUE)
arrange_all(mtcars, .desc = TRUE)

arrange_at() would take dots, and the order of the arguments would determine the sort order. arrange_if() and arrange_all() would be either entirely ascending or entirely descending, so you'd have to add more scoped arrange calls to the pipeline if you need something more complicated.

lionel- commented 7 years ago

Or:

arrange_if(iris, desc(is_factor))
arrange_if(iris, funs(desc(is_factor), is_numeric))
arrange_all(desc(mtcars))

I like each of them on its own, but I'm not sure they are consistent taken together. Also, the funs() syntax and semantics are not clear regarding expressions, e.g. funs(desc(is_factor(.))) vs funs(desc(is_factor)(.)). And it would rely on parsing rather than evaluation.

hadley commented 7 years ago

I think I like the first option the best, but it's not great. Let's implement without desc() support for now.
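A minimal sketch of what scoped arrange() without desc() support can look like, assuming the arrange_all(), arrange_at(), and arrange_if() functions that dplyr ships (since superseded in later releases). The choice of columns and the `is.numeric` predicate are illustrative, not taken from the thread.

```r
library(dplyr)

# Sort by every column, left to right, all ascending.
all_sorted <- arrange_all(mtcars)

# Sort by an explicit selection; the order inside vars() sets the priority.
at_sorted <- arrange_at(mtcars, vars(cyl, mpg))

# Sort by every column that satisfies a predicate (here, numeric columns).
if_sorted <- arrange_if(iris, is.numeric)

head(at_sorted)
```

Each call sorts ascending only, so anything more involved has to be expressed with a plain arrange() call.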

yutannihilation commented 7 years ago

Let's implement without desc() support for now.

IMHO, this is enough, at least for now. Since we cannot easily control the order of variables via scoped variants, support for verbs like arrange() and group_by(), where the order matters, cannot be complete anyway. It seems better to encourage people to use tidyeval for any complex usage and keep the scoped variants simple.
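As a rough illustration of the tidyeval route, the sketch below builds the sort columns programmatically and splices them into a plain arrange() call. The vector `sort_cols` and the chosen columns are hypothetical; `syms()`, `!!`, and `!!!` come from rlang.

```r
library(dplyr)
library(rlang)

# Columns to sort by, in priority order (hypothetical choice for illustration).
sort_cols <- syms(c("cyl", "mpg"))

# Splice the symbols into arrange(); the order of sort_cols is the sort priority.
ascending <- arrange(mtcars, !!!sort_cols)

# Mixed directions: wrap individual symbols in desc() as needed.
mixed <- arrange(mtcars, desc(!!sort_cols[[1]]), !!sort_cols[[2]])

head(mixed)
```

Here the caller keeps full control over both the column order and the direction of each column, which the scoped variants cannot express.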

yutannihilation commented 7 years ago

Thanks!