tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Should `desc()` use `vec_rank()` internally on character vectors? #7045

Open DavisVaughan opened 5 months ago

DavisVaughan commented 5 months ago

See https://github.com/tidyverse/dplyr/issues/7044

In particular, note that `arrange(df, x)` sorts `x` using the C locale if it is a character vector. But `arrange(df, -desc(x))` (i.e. inverting the `desc()` call, which in theory gives you the original order) sorts `x` using the user's locale.
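A minimal sketch of the discrepancy, assuming dplyr 1.1.0 or later (where `arrange()` defaults to the C locale for character vectors). The data frame `df` and its values are made up for illustration, and the second result depends on the session's collation, so no output is claimed for it:

```r
library(dplyr)

# Hypothetical data: mixed-case strings make locale differences visible.
df <- tibble(x = c("a", "B", "c"))

# Sorted in the C locale: uppercase letters come before lowercase ones.
c_locale_result <- arrange(df, x)$x

# `-desc(x)` is evaluated as a regular expression, so the ordering goes
# through xtfrm() and follows the user's locale instead.
user_locale_result <- arrange(df, -desc(x))$x

c_locale_result
user_locale_result
```

In a locale such as `en_US.UTF-8` the two results would typically differ; in a session that already runs in the C locale they agree.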

Normally a call like `desc(x)` is recognized and we never actually call `desc()` under the hood; instead we translate it to a `"desc"` value for the directions argument of `vec_order_radix()`. In this case, though, the `-` interferes and we actually evaluate the call.

That ends up calling `desc()`, which does `-xtfrm(x)`, and `xtfrm()` ends up using `base::order(x)`, utilizing the user's locale.
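The mechanism can be illustrated in base R alone. This is only a sketch: `desc_sketch()` is a hypothetical stand-in that paraphrases the `-xtfrm(x)` step described above, not dplyr's actual source.

```r
x <- c("a", "B", "c")

# Hypothetical stand-in for the core of desc(): negate the xtfrm() keys.
desc_sketch <- function(x) -xtfrm(x)

# Negating again recovers the xtfrm() keys, which for a character vector
# are ranks computed with the session's collation rules.
keys <- -desc_sketch(x)
locale_sorted <- x[order(keys)]

# For comparison, radix sorting always uses the C locale for character vectors.
c_sorted <- sort(x, method = "radix")

locale_sorted
c_sorted
```

`c_sorted` is `"B" "a" "c"` regardless of locale, while `locale_sorted` depends on the session's collation settings.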

I don't think we should remove usage of `xtfrm()` in `desc()`, since that is a generic that people have probably written S3 methods for, but maybe we can have special behavior for unclassed character vectors where it uses `vec_rank()` instead (which uses the C locale)? It would not be a perfect fix, but it may be good enough.
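One possible shape for that special case, as a rough sketch rather than a proposed patch. The name `desc_sketch()` is hypothetical, and the `ties = "min"` and `incomplete = "na"` choices are assumptions picked to mirror what `xtfrm()` does by default for character vectors:

```r
library(vctrs)

# Hypothetical variant of desc(): only bare character vectors take the
# vec_rank() path; everything else still goes through the xtfrm() generic,
# so existing S3 methods keep working.
desc_sketch <- function(x) {
  if (is.character(x) && !is.object(x)) {
    -vec_rank(x, ties = "min", incomplete = "na")
  } else {
    -xtfrm(x)
  }
}

x <- c("a", "B", "c", NA)

# Inverting the keys now yields C-locale order, with NA sorted last by order().
x[order(-desc_sketch(x))]
```

Classed character vectors would still fall through to `xtfrm()` and the user's locale, which is one reason the fix is not a perfect one.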