tidyverse / googlesheets4

Google Spreadsheets R API (reboot of the googlesheets package)
https://googlesheets4.tidyverse.org

Sheets un-even columns range_speedread incorrect #309

Closed HugoGit39 closed 10 months ago

HugoGit39 commented 10 months ago

Hi

I have a large Google Sheet with uneven column lengths.

When I use range_speedread(), it doesn't read the last columns correctly. Why?

See this example:

Google sheets:

https://docs.google.com/spreadsheets/d/1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg/

library(googlesheets4)
test <- range_speedread(as_id("1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg"), show_col_types = FALSE)

jennybc commented 10 months ago

I'm not entirely sure what you mean by the last columns not being read correctly.

But I think you're just noticing the trickiness of column type guessing when a lot of data is missing?

The docs for range_speedread() outline various gotchas of this function and point out that, ultimately, readr::read_csv() gets used.

You can read about readr's column type guessing here:

https://readr.tidyverse.org/articles/column-types.html
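
To see what readr settled on for a sparse column, it can help to inspect the result locally. The following is only a minimal sketch on a made-up CSV: the column names x0 and x1, the values, and the row count are hypothetical and not taken from the sheet. What gets guessed for x1 may depend on the installed readr version, so no particular output is claimed here.

```r
library(readr)

# Hypothetical data: x0 is fully populated, x1 holds one value near the end.
n <- 3000
x1 <- rep("", n)
x1[n - 2] <- "10.2"
path <- tempfile(fileext = ".csv")
writeLines(c("x0,x1", paste(seq_len(n) / 10, x1, sep = ",")), path)

# Guess column types from a limited number of rows (1000 is readr's default).
sparse <- read_csv(path, guess_max = 1000, show_col_types = FALSE)

spec(sparse)            # the column types readr settled on
problems(sparse)        # values that did not parse as the guessed type, if any
sum(!is.na(sparse$x1))  # how many x1 values survived parsing

unlink(path)
```

If spec() reports a type for x1 that does not fit its lone value, and problems() lists that row, then the guess was made without seeing that value.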

But one solution for this dataset is just to instruct readr to use all the rows to guess column type, instead of the first 1000.

library(googlesheets4)
gs4_deauth()

test2 <- range_speedread(
  "1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg",
  guess_max = Inf
)
#> ✔ Reading from "Test".
#> ℹ Export URL:
#>   <https://docs.google.com/spreadsheets/d/1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg/export?format=csv>
#> Rows: 4741 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (7): X0, X1, X2, X3, X4, X5, X6
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
test2
#> # A tibble: 4,741 × 7
#>       X0    X1    X2    X3    X4    X5    X6
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 0.3      NA    NA    NA    NA    NA    NA
#>  2 0.300    NA    NA    NA    NA    NA    NA
#>  3 0.295    NA    NA    NA    NA    NA    NA
#>  4 0.299    NA    NA    NA    NA    NA    NA
#>  5 0.299    NA    NA    NA    NA    NA    NA
#>  6 0.298    NA    NA    NA    NA    NA    NA
#>  7 0.32     NA    NA    NA    NA    NA    NA
#>  8 0.323    NA    NA    NA    NA    NA    NA
#>  9 0.323    NA    NA    NA    NA    NA    NA
#> 10 0.327    NA    NA    NA    NA    NA    NA
#> # ℹ 4,731 more rows
tail(test2)
#> # A tibble: 6 × 7
#>       X0    X1    X2    X3    X4      X5    X6
#>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
#> 1 42272.  NA    NA    NA   NA         NA  NA  
#> 2 43605.  NA    NA    NA   NA         NA  NA  
#> 3 43870.  NA    NA    NA   NA         NA  NA  
#> 4 44010.  10.2  10.8   7.3  7.54 9193097  42.9
#> 5 43769.  NA    NA    NA   NA         NA  NA  
#> 6 43098.  NA    NA    NA   NA         NA  NA

Created on 2024-01-15 with reprex v2.1.0.9000
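
Another option, if the column types are known in advance, is to skip guessing entirely and declare them. The sketch below is an assumption-laden illustration rather than something run against the sheet: it uses a small made-up local CSV as a stand-in for the export, and the commented-out call assumes that range_speedread() forwards col_types to readr::read_csv() in the same way it forwards guess_max.

```r
library(readr)

# Declare every column as double up front, so no type guessing is involved.
all_double <- cols(.default = col_double())

# Local stand-in for the exported CSV (hypothetical values).
csv_lines <- c("X0,X1", "0.3,", "0.295,", "44010,10.2")
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

local_read <- read_csv(path, col_types = all_double)
local_read

unlink(path)

# Assumed usage with the sheet from this issue (not run here):
# test3 <- googlesheets4::range_speedread(
#   "1t2JYOCsCvK05Layi3loXFWwCBGXjXNUDMUAi9xudiHg",
#   col_types = all_double
# )
```

Declaring types this way trades convenience for predictability: a non-numeric cell would surface as a parsing problem instead of silently changing the column type.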

HugoGit39 commented 9 months ago

Thank you!