tidyverse / googlesheets4

Google Spreadsheets R API (reboot of the googlesheets package)
https://googlesheets4.tidyverse.org
Other
360 stars 53 forks source link

read_sheet() converts column into list entries #240

Closed agilebean closed 3 years ago

agilebean commented 3 years ago

In my google sheet, one column has single quoted entries for numbers 01 to 09, and unquoted entries for entries above 10. This mixed datatype is converted as list entries, i.e. each entry is a list, which makes it difficult to convert back because the conversion depends on the datatype buried in the first element of a list, so I can't use unnest() The previous version google_sheets::gs_read() imported the same column correctly as character column.

Question: Is there any way to avoid the conversion into a list per entry?

Constraints:

  1. I know that using only integers in the form would prevent this, but the numbers in the form were displayed as 01 to 09 instead of 1 to 9.
  2. I don't want to specify the col_types because there are over 60 columns which are imported correctly, so only this ID column has the list entry problem.
data.presurvey[[1]]
[[1]]
[1] "01"

[[2]]
[1] "02"

[[10]]
[1] 12

[[11]]
[1] 11
jennybc commented 3 years ago

This is happening because the 01, 02, etc. cells are character and the 10, 11, etc. cells are numeric.

You've got various options:

One day, there might be a way for a user to express "I never want list-columns" (#51), but that is not this day.

agilebean commented 3 years ago

Thanks a lot for the feedback @jennybc !

And special thanks for the expression "that is not this day" for diplomatic linguistic expression that it cannot be implemented yet.

However, I would be grateful if you could give feedback to these two approaches:

  • Specify a col_type for this column (character, in this case).

Is there a way (maybe regex) to *only specify the first column (as I have 60+ columns which are all imported correctly)?

  • Convert the numbers to character and simplify the column on the R side.

I do this now by importing the id column *separately. Can you give a dplyr expression that could convert the list column to a character vector? I failed with unlist and map_chr

jennybc commented 3 years ago
library(tidyverse)

(dat <- tibble(x = list("01", "02", 10, 11)))
#> # A tibble: 4 × 1
#>   x        
#>   <list>   
#> 1 <chr [1]>
#> 2 <chr [1]>
#> 3 <dbl [1]>
#> 4 <dbl [1]>

dat %>% 
  mutate(x = map_chr(x, as.character))
#> # A tibble: 4 × 1
#>   x    
#>   <chr>
#> 1 01   
#> 2 02   
#> 3 10   
#> 4 11

Created on 2021-08-30 by the reprex package (v2.0.1.9000)

agilebean commented 3 years ago

Thanks so much @jennybc deeply grateful!!!

agilebean commented 1 year ago

@jennybc: May it be possible to reconsider the default conversion behavior of read_sheet()?

Reason: I just used read_sheet() again for some research data and realized that in any case when read_sheet() creates a list column, a user of read_sheet() would still be able to work with its content more easily if it was converted to a char vector.

Please see a comparison of the post processing of read_sheet() when

A) read_sheet() has imported mixed columns (char/num and NAs) as list columns if they were numeric columns

mutate(across(
      where(is.list),
      # list conversion to char leaves "NULL" entries > TRICKY: NULLs convert to NA
      ~ as.character(.x) %>% 
              na_if("NULL") %>% 
              as.numeric))

B) read_sheet() has imported mixed columns (char/num and NAs) as char:

mutate(across(
      num_var_labels,
      ~ as.numeric))

Solution B implies the conversion na_if("NULL").

I'm suggesting Solution B because it's intuitive for anybody to understand the conversion to numeric, but Solution A is almost impossible to come up with for tidyverse beginners. The option to specify the data type of columns is nice but impractible for datasets with 100+ columns which is often the case in social science datasets.