jennybc / googlesheets

Google Spreadsheets R API
Other
783 stars 191 forks source link

Non-unique column header behaviour #369

Closed jmcurran closed 2 years ago

jmcurran commented 6 years ago

As of 2022-02-03, the googlesheets package has been archived on CRAN. I'm closing all issues and pull requests prior to archiving this GitHub repo.

Learn more at:

https://github.com/jennybc/googlesheets#readme

jennybc commented 6 years ago

I just created a Sheet with duplicate column names and here's what I see:

> ss <- gs_url("https://docs.google.com/spreadsheets/d/1Q9JLd6F4yKOZqbm22-5v9JZzzCEPfBQWSAgwn5inJqo/edit?usp=sharing")
Sheet-identifying info appears to be a browser URL.
googlesheets will attempt to extract sheet key from the URL.
Putative key: 1Q9JLd6F4yKOZqbm22-5v9JZzzCEPfBQWSAgwn5inJqo
Worksheets feed constructed with public visibility

> gs_read(ss)
Accessing worksheet titled 'Sheet1'.
Downloading: 34 B     Warning: Duplicated column names deduplicated: 'a' => 'a_1' [2]
Parsed with column specification:
cols(
  a = col_double(),
  a_1 = col_double()
)
# A tibble: 1 x 2
      a   a_1
  <dbl> <dbl>
1     1     2

The scolding for duplicate column names is emanating from somewhere other than googlesheets, namely readr. What version of readr do you have? I have v1.2.0 which, despite appearances, is a development version, i.e. it's ahead of CRAN.

In any case, if you're in a pickle, you can pass arguments through to readr. In this example, I tell readr to ignore the first row and create its own column names:

> gs_read(ss, col_names = FALSE, skip = 1)
Accessing worksheet titled 'Sheet1'.
Downloading: 34 B     Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_double()
)
# A tibble: 1 x 2
     X1    X2
  <dbl> <dbl>
1     1     2

That would get you unstuck. But I think you really should figure out why you get an error about this, whereas I just get a warning.

jmcurran commented 6 years ago

Hi Jenny,

many apologies for the long delay in responding. I did update readr to 1.2.0 but this did not help. A little sleuthing led me to tracking the error's occurrence. If I specify the range of cells to be read, then it triggers the error. With your example, this is:

ss %>% gs_read(ws = "Sheet1", range = "A1:B2")
Accessing worksheet titled 'Sheet1'.
Downloading: 4 kB     Error: Column `a` must have a unique name

I can fix it by specifying the column names of course:

ss %>% gs_read(ws = "Sheet1", range = "A1:B2", skip = 1, col_names = c('a_1','a_2'))

but that rather defeats the purpose.

jennybc commented 6 years ago

What version of dplyr do you have?

jennybc commented 6 years ago

I think a general devtools::session_info() would be interesting here.

jmcurran commented 6 years ago

0.7.6


> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2     googlesheets_0.3.0 magrittr_1.5       glue_1.3.0         readxl_1.1.0       forcats_0.3.0      stringr_1.3.1     
 [8] dplyr_0.7.6        purrr_0.2.5        readr_1.2.0        tidyr_0.8.1        tibble_1.4.2       ggplot2_3.0.0      tidyverse_1.2.1   
[15] here_0.1           RSQLite_2.1.1     

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.4 haven_1.1.2      lattice_0.20-35  colorspace_1.3-2 yaml_2.2.0       utf8_1.1.4       blob_1.1.1       rlang_0.2.1     
 [9] pillar_1.3.0     withr_2.1.2      DBI_1.0.0        bit64_0.9-7      modelr_0.1.2     bindr_0.1.1      plyr_1.8.4       munsell_0.5.0   
[17] gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      memoise_1.1.0    curl_3.2         fansi_0.2.3      broom_0.5.0      Rcpp_0.12.18    
[25] openssl_1.0.2    scales_0.5.0     backports_1.1.2  jsonlite_1.5     bit_1.1-14       hms_0.4.2        digest_0.6.15    stringi_1.2.4   
[33] grid_3.5.1       rprojroot_1.3-2  cli_1.0.0        tools_3.5.1      lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.1  xml2_1.2.0      
[41] lubridate_1.7.4  assertthat_0.2.0 rematch_1.0.1    httr_1.3.1       rstudioapi_0.7   R6_2.2.2         nlme_3.1-137     compiler_3.5.1 
jmcurran commented 6 years ago

Sure :)


> devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_NZ.UTF-8                 
 ctype    en_NZ.UTF-8                 
 tz       Pacific/Auckland            
 date     2018-10-24                  

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version date       lib source                          
 assertthat     0.2.0   2017-04-11 [1] CRAN (R 3.5.0)                  
 backports      1.1.2   2017-12-13 [1] CRAN (R 3.5.0)                  
 bindr          0.1.1   2018-03-13 [1] CRAN (R 3.5.0)                  
 bindrcpp     * 0.2.2   2018-03-29 [1] CRAN (R 3.5.0)                  
 bit            1.1-14  2018-05-29 [1] CRAN (R 3.5.0)                  
 bit64          0.9-7   2017-05-08 [1] CRAN (R 3.5.0)                  
 blob           1.1.1   2018-03-25 [1] CRAN (R 3.5.0)                  
 broom          0.5.0   2018-07-17 [1] CRAN (R 3.5.0)                  
 callr          2.0.4   2018-05-15 [1] CRAN (R 3.5.0)                  
 cellranger     1.1.0   2016-07-27 [1] CRAN (R 3.5.0)                  
 cli            1.0.0   2017-11-05 [1] CRAN (R 3.5.0)                  
 colorspace     1.3-2   2016-12-14 [1] CRAN (R 3.5.0)                  
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.5.0)                  
 curl           3.2     2018-03-28 [1] CRAN (R 3.5.0)                  
 DBI            1.0.0   2018-05-02 [1] CRAN (R 3.5.0)                  
 desc           1.2.0   2018-05-01 [1] CRAN (R 3.5.0)                  
 devtools       2.0.0   2018-10-19 [1] CRAN (R 3.5.1)                  
 digest         0.6.15  2018-01-28 [1] CRAN (R 3.5.0)                  
 dplyr        * 0.7.6   2018-06-29 [1] CRAN (R 3.5.1)                  
 fansi          0.2.3   2018-05-06 [1] CRAN (R 3.5.0)                  
 forcats      * 0.3.0   2018-02-19 [1] CRAN (R 3.5.0)                  
 fs             1.2.6   2018-08-23 [1] CRAN (R 3.5.0)                  
 ggplot2      * 3.0.0   2018-07-03 [1] CRAN (R 3.5.0)                  
 glue         * 1.3.0   2018-07-17 [1] CRAN (R 3.5.0)                  
 googlesheets * 0.3.0   2018-06-29 [1] CRAN (R 3.5.0)                  
 gtable         0.2.0   2016-02-26 [1] CRAN (R 3.5.0)                  
 haven          1.1.2   2018-06-27 [1] CRAN (R 3.5.0)                  
 here         * 0.1     2017-05-28 [1] CRAN (R 3.5.0)                  
 hms            0.4.2   2018-03-10 [1] CRAN (R 3.5.0)                  
 httr           1.3.1   2017-08-20 [1] CRAN (R 3.5.0)                  
 jsonlite       1.5     2017-06-01 [1] CRAN (R 3.5.0)                  
 lattice        0.20-35 2017-03-25 [1] CRAN (R 3.5.1)                  
 lazyeval       0.2.1   2017-10-29 [1] CRAN (R 3.5.0)                  
 lubridate      1.7.4   2018-04-11 [1] CRAN (R 3.5.0)                  
 magrittr     * 1.5     2014-11-22 [1] CRAN (R 3.5.0)                  
 memoise        1.1.0   2017-04-21 [1] CRAN (R 3.5.0)                  
 modelr         0.1.2   2018-05-11 [1] CRAN (R 3.5.0)                  
 munsell        0.5.0   2018-06-12 [1] CRAN (R 3.5.0)                  
 nlme           3.1-137 2018-04-07 [1] CRAN (R 3.5.1)                  
 openssl        1.0.2   2018-07-30 [1] CRAN (R 3.5.0)                  
 pillar         1.3.0   2018-07-14 [1] CRAN (R 3.5.0)                  
 pkgbuild       1.0.2   2018-10-16 [1] CRAN (R 3.5.0)                  
 pkgconfig      2.0.1   2017-03-21 [1] CRAN (R 3.5.0)                  
 pkgload        1.0.1   2018-10-11 [1] CRAN (R 3.5.0)                  
 plyr           1.8.4   2016-06-08 [1] CRAN (R 3.5.0)                  
 prettyunits    1.0.2   2015-07-13 [1] CRAN (R 3.5.0)                  
 processx       3.1.0   2018-05-15 [1] CRAN (R 3.5.0)                  
 purrr        * 0.2.5   2018-05-29 [1] CRAN (R 3.5.0)                  
 R6             2.2.2   2017-06-17 [1] CRAN (R 3.5.0)                  
 Rcpp           0.12.18 2018-07-23 [1] CRAN (R 3.5.0)                  
 readr        * 1.2.0   2018-10-24 [1] Github (tidyverse/readr@69c9fd3)
 readxl       * 1.1.0   2018-04-20 [1] CRAN (R 3.5.0)                  
 rematch        1.0.1   2016-04-21 [1] CRAN (R 3.5.0)                  
 remotes        2.0.1   2018-10-19 [1] CRAN (R 3.5.0)                  
 rlang          0.2.1   2018-05-30 [1] CRAN (R 3.5.0)                  
 rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.5.0)                  
 RSQLite      * 2.1.1   2018-05-06 [1] CRAN (R 3.5.0)                  
 rstudioapi     0.7     2017-09-07 [1] CRAN (R 3.5.0)                  
 rvest          0.3.2   2016-06-17 [1] CRAN (R 3.5.0)                  
 scales         0.5.0   2017-08-24 [1] CRAN (R 3.5.0)                  
 sessioninfo    1.1.0   2018-09-25 [1] CRAN (R 3.5.0)                  
 stringi        1.2.4   2018-07-20 [1] CRAN (R 3.5.0)                  
 stringr      * 1.3.1   2018-05-10 [1] CRAN (R 3.5.0)                  
 testthat       2.0.0   2017-12-13 [1] CRAN (R 3.5.0)                  
 tibble       * 1.4.2   2018-01-22 [1] CRAN (R 3.5.0)                  
 tidyr        * 0.8.1   2018-05-18 [1] CRAN (R 3.5.0)                  
 tidyselect     0.2.4   2018-02-26 [1] CRAN (R 3.5.0)                  
 tidyverse    * 1.2.1   2017-11-14 [1] CRAN (R 3.5.0)                  
 usethis        1.4.0   2018-08-14 [1] CRAN (R 3.5.0)                  
 utf8           1.1.4   2018-05-24 [1] CRAN (R 3.5.0)                  
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.5.0)                  
 xml2           1.2.0   2018-01-24 [1] CRAN (R 3.5.0)                  
 yaml           2.2.0   2018-07-25 [1] CRAN (R 3.5.0)                  

[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
jennybc commented 6 years ago

This message actually appears to be coming from dplyr.

https://github.com/tidyverse/dplyr/blob/76f1d112b3e4445bb5aca707c744f64876a15bac/src/utils.cpp#L36

It's not coming from googlesheets.

I note that you are on dplyr v0.7.6, whereas current CRAN version is v0.7.7, which is also what I have. And yet I don't know of any specific change that would affect what we are seeing. But updating to current CRAN version of dplyr can't hurt.

jennybc commented 6 years ago

You could also post the output of traceback() here.

jmcurran commented 6 years ago

No prob - this is from my actually application so probably around 10. upwards is the relevant part


 Error: Columns `Staff`, `Weight`, `Staff`, `Weight` must have unique names 
36.
stop(cnd) 
35.
abort(paste0(...)) 
34.
stopc(pluralise_msg("Column(s) ", vars), " ", pluralise(problem, 
    vars)) 
33.
invalid_df("must have [a] unique name(s)", x, dups) 
32.
check_tibble(x) 
31.
list_to_tibble(x, validate) 
30.
as_data_frame.data.frame(.) 
29.
dplyr::as_data_frame(.) 
28.
function_list[[k]](value) 
27.
withVisible(function_list[[k]](value)) 
26.
freduce(value, `_function_list`) 
25.
`_fseq`(`_lhs`) 
24.
eval(quote(`_fseq`(`_lhs`)), env, env) 
23.
eval(quote(`_fseq`(`_lhs`)), env, env) 
22.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
21.
dat %>% as.data.frame(stringsAsFactors = FALSE) %>% dplyr::as_data_frame() 
20.
gs_reshape_feed(x, ddd, verbose) 
19.
gs_reshape_cellfeed(., literal = literal, ..., verbose = verbose) 
18.
function_list[[k]](value) 
17.
withVisible(function_list[[k]](value)) 
16.
freduce(value, `_function_list`) 
15.
`_fseq`(`_lhs`) 
14.
eval(quote(`_fseq`(`_lhs`)), env, env) 
13.
eval(quote(`_fseq`(`_lhs`)), env, env) 
12.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
11.
gs_read_cellfeed(ss, ws = ws, range = range, ..., verbose = verbose) %>% 
    gs_reshape_cellfeed(literal = literal, ..., verbose = verbose) 
10.
gs_read(., ws = "Semester_1", range = "A1:J36", col_names = TRUE) 
9.
function_list[[k]](value) 
8.
withVisible(function_list[[k]](value)) 
7.
freduce(value, `_function_list`) 
6.
`_fseq`(`_lhs`) 
5.
eval(quote(`_fseq`(`_lhs`)), env, env) 
4.
eval(quote(`_fseq`(`_lhs`)), env, env) 
3.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
2.
mysheet %>% gs_read(ws = "Semester_1", range = "A1:J36", col_names = TRUE) 
1.
readData()