tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

arrange_at doesn't work when selecting Chinese column names via select helpers #3812

Closed shizidushu closed 6 years ago

shizidushu commented 6 years ago

It seems arrange_at doesn't work properly when Chinese column names are used in select helpers.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  df <- data.frame(qty = c(3,1,5,4), 金额 = c(1,3,4,2))
  arrange_at(df, vars(ends_with('金额')))
#>   qty 金额
#> 1   3    1
#> 2   4    2
#> 3   1    3
#> 4   5    4
  arrange_at(df, desc(vars(ends_with('金额'))))
#>   qty 金额
#> 1   3    1
#> 2   4    2
#> 3   1    3
#> 4   5    4
  sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 9 (stretch)
#> 
#> Matrix products: default
#> BLAS: /usr/lib/openblas-base/libblas.so.3
#> LAPACK: /usr/lib/libopenblasp-r0.2.19.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] bindrcpp_0.2.2 dplyr_0.7.6   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.18     crayon_1.3.4     assertthat_0.2.0 digest_0.6.16   
#>  [5] rprojroot_1.3-2  R6_2.2.2         backports_1.1.2  magrittr_1.5    
#>  [9] evaluate_0.11    pillar_1.3.0     rlang_0.2.2.9000 stringi_1.2.4   
#> [13] rmarkdown_1.10   tools_3.5.1      stringr_1.3.1    glue_1.3.0      
#> [17] purrr_0.2.5      xfun_0.3         yaml_2.2.0       compiler_3.5.1  
#> [21] pkgconfig_2.0.2  htmltools_0.3.6  tidyselect_0.2.4 bindr_0.1.1     
#> [25] knitr_1.20.15    tibble_1.4.2

batpigandme commented 6 years ago

Hi @shizidushu,

I don't think this has anything to do with character encoding: the same thing happens with an ASCII column name, e.g. vars(ends_with('ty')).

library(tidyverse)
df <- data.frame(qty = c(3,1,5,4), 金额 = c(1,3,4,2))

# normal arrange descending
arrange(df, desc(`金额`))
#>   qty 金额
#> 1   5    4
#> 2   1    3
#> 3   4    2
#> 4   3    1

# works
arrange_at(df, vars(ends_with('ty')))
#>   qty 金额
#> 1   1    3
#> 2   3    1
#> 3   4    2
#> 4   5    4

# works
arrange_at(df, vars(ends_with('金额')))
#>   qty 金额
#> 1   3    1
#> 2   4    2
#> 3   1    3
#> 4   5    4

# doesn't work
arrange_at(df, desc(vars(ends_with('ty'))))
#>   qty 金额
#> 1   3    1
#> 2   4    2
#> 3   1    3
#> 4   5    4

Created on 2018-09-12 by the reprex package (v0.2.0.9000).

yutannihilation commented 6 years ago

desc() cannot be applied to vars(): vars() specifies which columns to use (the column indices), not expressions to evaluate. If you want to transform the variables, pass the transformation functions in the .funs argument.

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(qty = c(3,1,5,4), 金额 = c(1,3,4,2))

arrange_at(df, vars(ends_with('金额')))
#>   qty 金额
#> 1   3    1
#> 2   4    2
#> 3   1    3
#> 4   5    4

arrange_at(df, vars(ends_with('金额')), desc)
#>   qty 金额
#> 1   5    4
#> 2   1    3
#> 3   4    2
#> 4   3    1

Created on 2018-09-13 by the reprex package (v0.2.0.9000).

FYI, the Examples section of the arrange_all documentation explains this:

# You can supply a function that will be applied before taking the
# ordering of the variables. The variables of the sorted tibble
# keep their original values.
arrange_all(df, desc)
arrange_all(df, funs(desc(.)))
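
To make the "keep their original values" point concrete, here is a minimal sketch. It assumes an ASCII column name `amt` as a stand-in for 金额 (my substitution, only to keep the snippet portable) and compares arrange() with a plain base R order() call on a negated numeric key. It is an illustration of the idea, not a description of how dplyr implements it.

```r
library(dplyr, warn.conflicts = FALSE)

# `amt` is a hypothetical ASCII stand-in for the 金额 column
df <- data.frame(qty = c(3, 1, 5, 4), amt = c(1, 3, 4, 2))

# dplyr: rows are ordered by the transformed key desc(amt);
# the amt column in the result still holds the original values
by_dplyr <- arrange(df, desc(amt))

# base R: ordering on the negated key is the same idea for a numeric column
by_base <- df[order(-df$amt), ]

by_dplyr$amt
by_base$amt
```

Both vectors should list the original amt values from largest to smallest; nothing in the column itself is negated.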

shizidushu commented 6 years ago

@batpigandme @yutannihilation Thanks for your reply.

So .funs does the transformation, and the transformed values are used to generate the order.

But now I'm confused about what .funs does when I specify two functions.

library(tidyverse)
df <- data.frame(qty1 = c(1, 1, 2), qty2 = c(-1, 0, -2), amt = c(10, 13, 11))
# arrange_at and their arrange equivalents?
arrange_at(df, vars(starts_with('qty')), funs(desc, identity)) # arrange(df, desc(qty1), qty2)
#>   qty1 qty2 amt
#> 1    2   -2  11
#> 2    1    0  13
#> 3    1   -1  10
arrange_at(df, vars(starts_with('qty')), funs(identity, identity)) # arrange(df, qty1, qty2)
#>   qty1 qty2 amt
#> 1    1   -1  10
#> 2    1    0  13
#> 3    2   -2  11
arrange_at(df, vars(starts_with('qty')), funs(identity, desc)) # arrange(df, qty1, desc(qty2))
#>   qty1 qty2 amt
#> 1    1   -1  10
#> 2    1    0  13
#> 3    2   -2  11

Created on 2018-09-13 by the reprex package (v0.2.0).

yutannihilation commented 6 years ago

IIUC, arrange_at(df, vars(starts_with('qty')), funs(desc, identity)) is equivalent to

arrange(df, desc(qty1), desc(qty2), identity(qty1), identity(qty2))
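
If that expansion is right (every function is applied to every selected column, grouped by function), it would also explain why funs(identity, desc) and funs(identity, identity) gave the same result earlier in the thread: the trailing keys only break ties left over by the leading ones. A small sketch using plain arrange(), with the expanded key order written out by hand as an assumption:

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(qty1 = c(1, 1, 2), qty2 = c(-1, 0, -2), amt = c(10, 13, 11))

# Assumed expansion of funs(identity, desc): identity keys first, then desc keys
expanded <- arrange(df, qty1, qty2, desc(qty1), desc(qty2))

# Only the leading two keys
leading_only <- arrange(df, qty1, qty2)

# Rows that tie on (qty1, qty2) also tie on their desc() versions,
# so the trailing keys cannot change the order
all.equal(expanded, leading_only)
```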

If you want to transform variables separately, I recommend using tidyeval...
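
One way this could look, as a sketch rather than the only approach: build one expression per column with rlang and splice them into arrange() with !!!. The names sort_cols and desc_cols are hypothetical, introduced here just for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(rlang)

df <- data.frame(qty1 = c(1, 1, 2), qty2 = c(-1, 0, -2), amt = c(10, 13, 11))

# Hypothetical spec: columns to sort by, and which of them should be descending
sort_cols <- c("qty1", "qty2")
desc_cols <- "qty1"

# Build one sort key per column: desc(col) for descending columns, col otherwise
keys <- lapply(sort_cols, function(nm) {
  col <- sym(nm)
  if (nm %in% desc_cols) expr(desc(!!col)) else col
})

# Splice the keys in, as if we had typed arrange(df, desc(qty1), qty2)
result <- arrange(df, !!!keys)
result
```

Here each column gets exactly one transformation, so qty1 is sorted descending and qty2 ascending within ties.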

shizidushu commented 6 years ago

@yutannihilation Thanks. I got it.
