ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

Skim may fail when column names are similar except for case #615

Status: Closed (jw5 closed this issue 3 years ago)

jw5 commented 3 years ago

Reproducible example:

require(skimr)
data <- data.frame(x = 1, X = 2)
print(data)
summary(data)
skim(data)

Results:

> require(skimr)
data <- data.frame(x = 1, X = 2)
print(data)
summary(data)
skim(data)
Loading required package: skimr
  x X
1 1 2
       x           X    
 Min.   :1   Min.   :2  
 1st Qu.:1   1st Qu.:2  
 Median :1   Median :2  
 Mean   :1   Mean   :2  
 3rd Qu.:1   3rd Qu.:2  
 Max.   :1   Max.   :2  
Error: Problem with `summarise()` input `skimmed`.
✖ Names must be unique.
✖ These names are duplicated:
  * "n_missing" at locations 2 and 3.
  * "complete_rate" at locations 4 and 5.
  * "numeric.mean" at locations 6 and 7.
  * "numeric.sd" at locations 8 and 9.
  * "numeric.p0" at locations 10 and 11.
  * ...
ℹ Use argument `names_repair` to specify repair strategy.
ℹ Input `skimmed` is `purrr::map2(...)`.
ℹ The error occurred in group 1: skim_type = "numeric".
Run `rlang::last_error()` to see where the error occurred.
> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          0.2                         
year           2020                        
month          06                          
day            22                          
svn rev        78730                       
language       R                           
version.string R version 4.0.2 (2020-06-22)
nickname       Taking Off Again            
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux rodete

Matrix products: default
BLAS:   /opt/R/4.0.2/lib/R/lib/libRblas.so
LAPACK: /opt/R/4.0.2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] skimr_2.1.2

loaded via a namespace (and not attached):
 [1] tidyr_1.1.1      fansi_0.4.1      assertthat_0.2.1 digest_0.6.25   
 [5] crayon_1.3.4     dplyr_1.0.1      repr_1.1.0       R6_2.4.1        
 [9] jsonlite_1.7.0   lifecycle_0.2.0  magrittr_1.5     pillar_1.4.6    
[13] cli_2.0.2        rlang_0.4.7      vctrs_0.3.2      generics_0.0.2  
[17] ellipsis_0.3.1   tools_4.0.2      glue_1.4.1       purrr_0.3.4     
[21] xfun_0.16        compiler_4.0.2   base64enc_0.1-3  pkgconfig_2.0.3 
[25] htmltools_0.5.0  knitr_1.29       tidyselect_1.1.0 tibble_3.0.3    

elinw commented 3 years ago

I can't believe this hasn't come up before now.
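
Until this is fixed, a possible stopgap is to find and rename the clashing columns before skimming. This is only a minimal sketch in base R. It assumes the failure is triggered by column names that differ only by case, as the title suggests. `case_collisions()` and `dedupe_case()` are hypothetical helpers written for this example and are not part of skimr.

```r
# Hypothetical helper (not part of skimr): return the column names that
# collide with another column once case is ignored.
case_collisions <- function(df) {
  lowered <- tolower(names(df))
  clashing <- lowered %in% lowered[duplicated(lowered)]
  names(df)[clashing]
}

# Hypothetical helper (not part of skimr): rename columns so that the names
# stay unique even when compared case-insensitively.
dedupe_case <- function(df) {
  lowered <- tolower(names(df))
  # make.unique() appends "_1", "_2", ... to repeated lower-cased names.
  unique_lowered <- make.unique(lowered, sep = "_")
  # Keep the original spelling and append only the suffix that was added.
  suffix <- substring(unique_lowered, nchar(lowered) + 1)
  names(df) <- paste0(names(df), suffix)
  df
}

# Same data frame as in the reproducible example above.
data <- data.frame(x = 1, X = 2)
case_collisions(data)      # reports both "x" and "X"
fixed <- dedupe_case(data)
names(fixed)               # "x" and "X_1"
```

Passing `fixed` to `skim()` should then sidestep the duplicated-name error if the assumption above holds, but I have not verified that against skimr 2.1.2.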