tidyverse / readxl

Read excel files (.xls and .xlsx) into R 🖇
https://readxl.tidyverse.org

zip path is too long #719

Closed brianmsm closed 2 weeks ago

brianmsm commented 1 year ago

I am on Windows and my project sits in a deeply nested folder structure. I am trying to import a data file with readxl::read_excel(), but I get the following error:

Error in unz(zip_path, file_path, open = "rb") : 
  cannot open the connection
In addition: Warning message:
In unz(zip_path, file_path, open = "rb") : zip path is too long

I saved the same data in .sav and .dta format in the same folder, and the haven package reads both without problems. I have also enabled long paths as suggested here (https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=powershell), but it still does not work.

haven::read_sav("1. Data/Valence Depresion Domaradzka.sav")
#> # A tibble: 1,632 × 39
#>       Id sex         age VD02    VD03    VD04    VD05    VD06    VD07    VD08   
#>    <dbl> <dbl+lbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l>
#>  1     2 1 [Femal…    32 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  2     4 1 [Femal…    34 2 [I d… 1 [I a… 1 [I a… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  3    10 1 [Femal…    30 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  4    11 1 [Femal…    23 1 [I a… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  5    15 1 [Femal…    53 2 [I d… 1 [I a… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  6    16 1 [Femal…    46 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  7    17 1 [Femal…    51 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d… 2 [I d… 2 [I d…
#>  8    19 1 [Femal…    62 1 [I a… 1 [I a… 2 [I d… 1 [I a… 1 [I a… 2 [I d… 1 [I a…
#>  9    22 1 [Femal…    34 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#> 10    24 1 [Femal…    43 2 [I d… 1 [I a… 2 [I d… 1 [I a… 1 [I a… 1 [I a… 1 [I a…
#> # … with 1,622 more rows, and 29 more variables: VD09 <dbl+lbl>,
#> #   VD10 <dbl+lbl>, VD11 <dbl+lbl>, VD12 <dbl+lbl>, VD14 <dbl+lbl>,
#> #   VD15 <dbl+lbl>, VD16 <dbl+lbl>, VD17 <dbl+lbl>, VD18 <dbl+lbl>,
#> #   VD19 <dbl+lbl>, VD20 <dbl+lbl>, VD21 <dbl+lbl>, VD22 <dbl+lbl>,
#> #   VD23 <dbl+lbl>, VD24 <dbl+lbl>, VD25 <dbl+lbl>, VD26 <dbl+lbl>,
#> #   VD27 <dbl+lbl>, VD28 <dbl+lbl>, VD29 <dbl+lbl>, VD30 <dbl+lbl>,
#> #   VD31 <dbl+lbl>, VD33 <dbl+lbl>, VD34 <dbl+lbl>, VD35 <dbl+lbl>, …
haven::read_dta("1. Data/Valence depresion Domaradzka.dta")
#> # A tibble: 1,632 × 39
#>       Id sex         age VD02    VD03    VD04    VD05    VD06    VD07    VD08   
#>    <dbl> <dbl+lbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l>
#>  1     1 2 [Male]     31 1 [I a… 2 [I d… 2 [I d… 1 [I a… 1 [I a… 1 [I a… 1 [I a…
#>  2     2 1 [Femal…    32 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  3     3 2 [Male]     40 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  4     4 1 [Femal…    34 2 [I d… 1 [I a… 1 [I a… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  5     5 2 [Male]     40 2 [I d… 2 [I d… 1 [I a… 2 [I d… 1 [I a… 2 [I d… 2 [I d…
#>  6     6 2 [Male]     24 2 [I d… 1 [I a… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  7     7 2 [Male]     29 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  8     8 2 [Male]     25 1 [I a… 1 [I a… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a…
#>  9     9 2 [Male]     25 1 [I a… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d… 1 [I a…
#> 10    10 1 [Femal…    30 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#> # … with 1,622 more rows, and 29 more variables: VD09 <dbl+lbl>,
#> #   VD10 <dbl+lbl>, VD11 <dbl+lbl>, VD12 <dbl+lbl>, VD14 <dbl+lbl>,
#> #   VD15 <dbl+lbl>, VD16 <dbl+lbl>, VD17 <dbl+lbl>, VD18 <dbl+lbl>,
#> #   VD19 <dbl+lbl>, VD20 <dbl+lbl>, VD21 <dbl+lbl>, VD22 <dbl+lbl>,
#> #   VD23 <dbl+lbl>, VD24 <dbl+lbl>, VD25 <dbl+lbl>, VD26 <dbl+lbl>,
#> #   VD27 <dbl+lbl>, VD28 <dbl+lbl>, VD29 <dbl+lbl>, VD30 <dbl+lbl>,
#> #   VD31 <dbl+lbl>, VD33 <dbl+lbl>, VD34 <dbl+lbl>, VD35 <dbl+lbl>, …
readxl::read_excel("1. Data/Valence depresion Domaradzka.xlsx")
#> Warning in unz(zip_path, file_path, open = "rb"): zip path is too long
#> Error in unz(zip_path, file_path, open = "rb"): cannot open the connection

fs::path_real("1. Data/Valence depresion Domaradzka.xlsx")
#> D:/Insync/brianmsm@gmail.com/Google Drive/Cursos de Brian Peña - Compartido/Mios/Cursos en la SPP/1. Curso Virtual. Análisis de datos con R para Psicólogos/Materiales/Cuarta Edición/Sesión 01/1. Data/Valence depresion Domaradzka.xlsx

Created on 2023-02-05 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Spanish_Peru.utf8
#>  ctype    Spanish_Peru.utf8
#>  tz       America/Bogota
#>  date     2023-02-05
#>  pandoc   3.0.1 @ C:/Users/brian/AppData/Local/Pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.2)
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  forcats       1.0.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  haven         2.5.1   2022-08-22 [1] CRAN (R 4.2.2)
#>  hms           1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  readr         2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
#>  readxl        1.4.1   2022-08-17 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.36    2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.6   2022-10-18 [1] CRAN (R 4.2.2)
#>
#>  [1] C:/Users/brian/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.2/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

jennybc commented 1 year ago

readxl only uses base R facilities in the internal helper where this is coming from:

https://github.com/tidyverse/readxl/blob/main/R/xlsx-zip.R

So the answer for now is that this path truly is problematic for readxl, because there is no quick fix we can make in our own code.
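To see how close a given path is to the classic Windows limit, a rough diagnostic like the one below may help. It is only a sketch: `path_length_report()` is a hypothetical helper, not part of readxl, and the default of 260 is the traditional Windows `MAX_PATH` value from the Microsoft article linked above. Whether base R compares characters or bytes against its own limit, and whether the name of the file inside the archive also counts, are assumptions I have not verified.

```r
# Hypothetical helper (not part of readxl or base R): report the length of a
# path so it can be compared with the classic Windows limit of 260.
path_length_report <- function(path, limit = 260L) {
  # Resolve to an absolute path where possible; mustWork = FALSE avoids an
  # error if the file does not exist (the path may then stay relative).
  full <- normalizePath(path, winslash = "/", mustWork = FALSE)
  n_chars <- nchar(full, type = "chars")
  # Non-ASCII characters (e.g. accented letters) take more than one byte.
  n_bytes <- nchar(full, type = "bytes")
  data.frame(
    path = full,
    chars = n_chars,
    bytes = n_bytes,
    # Assumption: bytes are the conservative quantity to compare.
    over_limit = n_bytes >= limit,
    stringsAsFactors = FALSE
  )
}

# Usage, with the file from this issue:
# path_length_report("1. Data/Valence depresion Domaradzka.xlsx")
```

A path that is under the limit on its own can still fail if the string actually handed to `unz()` is longer than the file path alone.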

I know you say you have activated long paths, but here's someone reporting success with that method, pointing to exactly the same article: https://stackoverflow.com/a/71621579 Have you definitely restarted your computer since making the change?

It looks like openxlsx uses a 3rd party library to access the files inside the .zip archive (which is what .xlsx files actually are), so you may want to try using that package instead.
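Two possible workarounds, sketched under assumptions: read the file with openxlsx instead, or copy it to a short temporary path before handing it to readxl. `read_via_short_path()` is a hypothetical helper written for this thread. I have not confirmed that `file.copy()` itself succeeds on an over-long source path in every R version on Windows, so treat this as something to try rather than a guaranteed fix.

```r
# Hypothetical helper: copy a file to a short temporary path, then read it
# with whatever reader function is supplied.
read_via_short_path <- function(path, reader, ...) {
  ext <- tools::file_ext(path)
  # Keep the extension, since some readers use it to detect the format.
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary copy afterwards
  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("could not copy '", path, "' to a temporary location")
  }
  reader(tmp, ...)
}

# Usage (requires readxl to be installed):
# read_via_short_path("1. Data/Valence depresion Domaradzka.xlsx",
#                     readxl::read_excel)

# Alternative, reading directly with openxlsx (requires openxlsx):
# openxlsx::read.xlsx("1. Data/Valence depresion Domaradzka.xlsx")
```

The copy approach leaves the original folder structure alone, which matters when the long path is imposed by a sync tool rather than chosen by the user.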

jennybc commented 1 year ago

And another lead re: something to check on your system: https://community.rstudio.com/t/does-rstudio-use-windows-longpathsenabled-registry-setting/130033
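As a quick check from within R, the registry value described in the Microsoft article can be read directly. This is a minimal sketch: `long_paths_enabled()` is a hypothetical helper, it relies on `utils::readRegistry()`, which exists only on Windows, and a value of 1 shows only that the system setting is on, not that a given R build or front end honours it.

```r
# Hypothetical helper: is the Windows LongPathsEnabled setting switched on?
# Returns NA on non-Windows systems, where the question does not apply.
long_paths_enabled <- function() {
  if (.Platform$OS.type != "windows") {
    return(NA)
  }
  key <- "SYSTEM\\CurrentControlSet\\Control\\FileSystem"
  value <- tryCatch(
    # "HLM" selects the HKEY_LOCAL_MACHINE hive.
    utils::readRegistry(key, hive = "HLM")$LongPathsEnabled,
    error = function(e) NULL
  )
  # A missing value or a read error is treated as "not enabled".
  isTRUE(as.integer(value) == 1L)
}

# Worth reporting together when asking for help:
# long_paths_enabled()
# getRversion()
```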

jennybc commented 1 year ago

I have by no means digested all of the content in this post, but it gives me hope that perhaps the problem is going to be fixed at the source, i.e. in R itself, in the not-too-distant future:

https://blog.r-project.org/2023/03/07/path-length-limit-on-windows/

brianmsm commented 1 year ago

I'm sorry, I had not seen the responses in this thread. I made the change in gpedit.msc and also restarted, but the problem persists.

jennybc commented 1 year ago

It is possible that the next version of R will handle long paths better and solve this for us.

brianmsm commented 2 weeks ago

Hello!

This is now working without problems!