rOpenSpain / spanishoddata

Access national high-quality and open-access datasets on movement patterns derived from mobile telephone datasets
https://ropenspain.github.io/spanishoddata/

Unhelpful error message with minimal `spod_get()` example #70

Closed Robinlovelace closed 1 week ago

Robinlovelace commented 1 week ago

Reproducible example:

remotes::install_github("robinlovelace/spanishoddata")
#> Using GitHub PAT from the git credential store.
#> Skipping install of 'spanishoddata' from a github remote, the SHA1 (3f9beb3f) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(spanishoddata)
Sys.setenv(SPANISH_OD_DATA_DIR = "data")
dir.create("data", showWarnings = FALSE)
od_data <- spod_get(type = "od", dates = "2023-01-02")
#> Saving the file to: C:/Users/georl_admin/AppData/Local/Temp/RtmpOOxyiM/reprex-b4b41386673f-satin-urial/data/data_links_v2_2024-09-05.xml
#> Data version detected from dates: 2
#> Using existing data links xml: C:/Users/georl_admin/AppData/Local/Temp/RtmpOOxyiM/reprex-b4b41386673f-satin-urial/data/data_links_v2_2024-09-05.xml
#> Downloading approximately 0.16 GB of data.
#> Retrieved data for requested dates.
#> Using existing data links xml: C:/Users/georl_admin/AppData/Local/Temp/RtmpOOxyiM/reprex-b4b41386673f-satin-urial/data/data_links_v2_2024-09-05.xml
#> Error in if (nchar(dsn) < 1) stop("`dsn` must point to a source, not an empty string.", : argument is of length zero

Created on 2024-09-05 with reprex v2.1.0

e-kotov commented 1 week ago

I will check the underlying code that processes the path. We probably just have to assume that "data" in this case is in the current working directory.

Robinlovelace commented 1 week ago

So a relative/absolute path issue? Will try with an absolute path.
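For reference, a minimal base-R sketch of resolving a relative directory to an absolute path before setting the environment variable. Only `SPANISH_OD_DATA_DIR` comes from this thread; the helper name `resolve_data_dir()` is hypothetical, the temporary working directory is a stand-in, and nothing here reflects the package internals.

```r
# Hypothetical helper: resolve a possibly relative directory to an absolute path.
resolve_data_dir <- function(dir) {
  # Create the directory first so that normalizePath() can resolve it.
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  normalizePath(dir, winslash = "/", mustWork = TRUE)
}

# Work inside a temporary directory so the relative path "data" is harmless.
old_wd <- setwd(tempdir())
data_dir <- resolve_data_dir("data")
setwd(old_wd)

# The environment variable now holds an absolute path.
Sys.setenv(SPANISH_OD_DATA_DIR = data_dir)
Sys.getenv("SPANISH_OD_DATA_DIR")
```

With an absolute path in the variable, the result no longer depends on which working directory the session happens to be in.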

Robinlovelace commented 1 week ago

Still fails with absolute path for me:

remotes::install_github("robinlovelace/spanishoddata")
#> Using GitHub PAT from the git credential store.
#> Skipping install of 'spanishoddata' from a github remote, the SHA1 (3f9beb3f) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(spanishoddata)
dir <- "C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data"
dir.exists(dir)
#> [1] TRUE
Sys.setenv(SPANISH_OD_DATA_DIR = dir)
od_data <- spod_get(type = "od", dates = "2023-01-02")
#> Data version detected from dates: 2
#> Using existing data links xml: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-05.xml
#> Using existing data links xml: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-05.xml
#> Error in if (nchar(dsn) < 1) stop("`dsn` must point to a source, not an empty string.", : argument is of length zero

Created on 2024-09-05 with reprex v2.1.0

e-kotov commented 1 week ago

Ok, that seems like an sf::read_sf issue (judging by the name of the problematic argument dsn) where it sometimes doesn't like the paths to gpkg/shp files. I will have a closer look soon. Even though you are requesting flows, the zones are also downloaded in the background to get the unique zone ids and safely encode the data with these verified zone ids.
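The wording of the error can be reproduced in base R without sf, which may help explain why the message is so unhelpful. The sketch below assumes that the value reaching `dsn` was a zero-length character vector; that is an assumption consistent with the error text, not something confirmed from the sf or spanishoddata source.

```r
# Assumption: the path passed as `dsn` was a zero-length character vector.
dsn <- character(0)

# nchar() of an empty vector is integer(0), so the comparison is logical(0).
cond <- nchar(dsn) < 1
length(cond)

# if() needs a length-one condition, so it errors before stop() is reached.
msg <- tryCatch(
  if (nchar(dsn) < 1) stop("`dsn` must point to a source, not an empty string."),
  error = function(e) conditionMessage(e)
)
msg
```

The informative `stop()` message is never raised in this case: the failure happens while evaluating the condition itself, so the user only sees the generic "argument is of length zero".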

Robinlovelace commented 1 week ago

Will test on Linux. Can try to debug here...

e-kotov commented 1 week ago

[Image: screenshot of the error] Getting this error on another Linux system (submitted by a colleague)

packageVersion("sf")
‘1.0.16’

So it is the latest version as of now.

So far, I suspect that shp files from a zip archive in v1 data are not extracted to the expected path, and therefore read_sf cannot read the data. Or something like that. I will try to add safety checks to see if the files exist before reading them with sf.
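A minimal sketch of what such a safety check could look like, written in base R. The function name `assert_single_existing_file()` and its messages are hypothetical and are not the package's actual code; the idea is only to fail early with a clear message before the path is handed to a reader such as `sf::read_sf()`.

```r
# Hypothetical guard, not the package's actual code.
assert_single_existing_file <- function(path) {
  # Reject zero-length, multi-element, NA, or empty-string inputs.
  if (length(path) != 1L || is.na(path) || !nzchar(path)) {
    stop("Expected exactly one file path, got ", length(path), ".", call. = FALSE)
  }
  # Reject paths that do not point to an existing file.
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  invisible(path)
}

# An existing file passes the check silently.
ok_file <- tempfile(fileext = ".gpkg")
file.create(ok_file)
assert_single_existing_file(ok_file)

# A zero-length path now produces a readable message.
empty_msg <- tryCatch(
  assert_single_existing_file(character(0)),
  error = function(e) conditionMessage(e)
)
empty_msg
```

Checking the length first matters: `file.exists(character(0))` returns `logical(0)`, which would trigger the same "argument is of length zero" failure if used directly in `if()`.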

Robinlovelace commented 1 week ago

Very interesting... I'm sure we can fix it; I plan to try again in 'debug mode' next week.

e-kotov commented 1 week ago

testing in rocker geo 4.1 with:

options(repos = c(CRAN = "https://packagemanager.posit.co/cran/__linux__/focal/latest"))

remotes::install_github("robinlovelace/spanishoddata")

e-kotov commented 1 week ago

I cannot reproduce this. I tried downgrading curl below v5, but then it simply fails because there is no multi_download, so I'm guessing you do have curl v5 or later and the data downloads are successful. I checked the code, and we seem to use absolute paths internally for everything. So the point of failure is probably the unpacking of the zip for v1 data.

@Robinlovelace could you paste the full recursive file list from "C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/raw_data_cache" ?

This way we may be able to track at which point and why this fails.
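A small base-R sketch of producing such a recursive listing. The directory and file names below are hypothetical stand-ins created in a temporary location; on a real machine `cache_dir` would be the data directory in question.

```r
# Hypothetical stand-in for the real cache directory.
cache_dir <- file.path(tempdir(), "raw_data_cache_demo")
dir.create(file.path(cache_dir, "v2", "zones"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(cache_dir, "v2", "zones", "example.txt"))

# Recursive listing; the returned paths are relative to cache_dir.
files <- list.files(cache_dir, recursive = TRUE)
files
```

Pasting this output into the issue shows which files were actually downloaded or extracted, and which expected ones are missing.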

Robinlovelace commented 1 week ago

Failed when I tried it:

> library(spanishoddata)
> dir <- "C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data"
> dir.exists(dir)
[1] TRUE
> Sys.setenv(SPANISH_OD_DATA_DIR = dir)
> od_data <- spod_get(type = "od", dates = "2023-01-02")
Saving the file to: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-06.xml
Download status: 1 done; 0 in progress. Total size: 4.41 Mb (100%)... done!
Data version detected from dates: 2
Using existing data links xml: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-06.xml
Using existing data links xml: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-06.xml
Error in if (nchar(dsn) < 1) stop("`dsn` must point to a source, not an empty string.",  : 
  argument is of length zero
> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14 ucrt)
 os       Windows 11 x64 (build 22631)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United Kingdom.utf8
 ctype    English_United Kingdom.utf8
 tz       Europe/London
 date     2024-09-06
 pandoc   3.2.1 @ C:\\PROGRA~1\\Pandoc\\pandoc.exe

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────  
 package       * version date (UTC) lib source
 bit             4.0.5   2022-11-15 [1] CRAN (R 4.4.1)
 bit64           4.0.5   2020-08-30 [1] CRAN (R 4.4.1)
 cachem          1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
 class           7.3-22  2023-05-03 [2] CRAN (R 4.4.1)
 classInt        0.4-10  2023-09-05 [1] CRAN (R 4.4.1)
 cli             3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
 crayon          1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
 curl            5.2.1   2024-03-01 [1] CRAN (R 4.4.1)
 DBI             1.2.3   2024-06-02 [1] CRAN (R 4.4.1)
 dbplyr          2.5.0   2024-03-19 [1] CRAN (R 4.4.1)
 devtools        2.4.5   2022-10-11 [1] CRAN (R 4.4.1)
 digest          0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
 dplyr           1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
 duckdb          1.0.0-2 2024-07-19 [1] CRAN (R 4.4.1)
 e1071           1.7-14  2023-12-06 [1] CRAN (R 4.4.1)
 ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.4.1)
 fansi           1.0.6   2023-12-08 [1] CRAN (R 4.4.1)
 fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
 fs              1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
 generics        0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
 glue            1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
 hms             1.1.3   2023-03-21 [1] CRAN (R 4.4.1)
 htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.4.1)
 httpuv          1.6.15  2024-03-26 [1] CRAN (R 4.4.1)
 jsonlite        1.8.8   2023-12-04 [1] CRAN (R 4.4.1)
 KernSmooth      2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
 later           1.3.2   2023-12-06 [1] CRAN (R 4.4.1)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
 lubridate       1.9.3   2023-09-27 [1] CRAN (R 4.4.1)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.4.1)
 memuse          4.2-3   2023-01-24 [1] CRAN (R 4.4.0)
 mime            0.12    2021-09-28 [1] CRAN (R 4.4.0)
 miniUI          0.1.1.1 2018-05-18 [1] CRAN (R 4.4.1)
 parallelly      1.38.0  2024-07-27 [1] CRAN (R 4.4.1)
 pillar          1.9.0   2023-03-22 [1] CRAN (R 4.4.1)
 pkgbuild        1.4.4   2024-03-17 [1] CRAN (R 4.4.1)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
 pkgload         1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
 profvis         0.3.8   2023-05-02 [1] CRAN (R 4.4.1)
 promises        1.3.0   2024-04-05 [1] CRAN (R 4.4.1)
 proxy           0.4-27  2022-06-09 [1] CRAN (R 4.4.1)
 purrr           1.0.2   2023-08-10 [1] CRAN (R 4.4.1)
 R6              2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
 Rcpp            1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
 readr           2.1.5   2024-01-10 [1] CRAN (R 4.4.1)
 remotes         2.5.0   2024-03-17 [1] CRAN (R 4.4.1)
 rlang           1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
 sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
 sf              1.0-16  2024-03-24 [1] CRAN (R 4.4.1)
 shiny           1.8.1.1 2024-04-02 [1] CRAN (R 4.4.1)
 spanishoddata * 0.0.1   2024-09-06 [1] Github (rOpenSpain/spanishoddata@bf031ef)

e-kotov commented 1 week ago

@Robinlovelace Ok, I inserted some debug prints. Let's track this down.

Whenever you have time, please reinstall from

remotes::install_github(
"rOpenSpain/spanishoddata@70-unhelpful-error-message-with-minimal-spod_get-example-ek",
  force = TRUE)

and repeat what you did.

Robinlovelace commented 1 week ago

remotes::install_github(
"rOpenSpain/spanishoddata@70-unhelpful-error-message-with-minimal-spod_get-example-ek",
  force = TRUE)
#> Using GitHub PAT from the git credential store.
#> Downloading GitHub repo rOpenSpain/spanishoddata@70-unhelpful-error-message-with-minimal-spod_get-example-ek
#> wk   (0.9.2 -> 0.9.3) [CRAN]
#> curl (5.2.1 -> 5.2.2) [CRAN]
#> Installing 2 packages: wk, curl
#> Installing packages into 'C:/Users/georl_admin/AppData/Local/R/win-library/4.4'
#> (as 'lib' is unspecified)
#> 
#>   There is a binary version available but the source version is later:
#>    binary source needs_compilation
#> wk  0.9.2  0.9.3              TRUE
#> 
#>   Binaries will be installed
#> package 'wk' successfully unpacked and MD5 sums checked
#> package 'curl' successfully unpacked and MD5 sums checked
#> Warning: cannot remove prior installation of package 'curl'
#> Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
#> C:\Users\georl_admin\AppData\Local\R\win-library\4.4\00LOCK\curl\libs\x64\curl.dll
#> to C:\Users\georl_admin\AppData\Local\R\win-library\4.4\curl\libs\x64\curl.dll:
#> Permission denied
#> Warning: restored 'curl'
#> 
#> The downloaded binary packages are in
#>  C:\Users\georl_admin\AppData\Local\Temp\RtmpqWE2DV\downloaded_packages
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file 'C:\Users\georl_admin\AppData\Local\Temp\RtmpqWE2DV\remotesbb286e3e10c7\rOpenSpain-spanishoddata-31cc7fe/DESCRIPTION' ... OK
#> * preparing 'spanishoddata':
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building 'spanishoddata_0.0.1.tar.gz'
#> 
#> Installing package into 'C:/Users/georl_admin/AppData/Local/R/win-library/4.4'
#> (as 'lib' is unspecified)
library(spanishoddata)
dir <- "C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data"
dir.exists(dir)
#> [1] TRUE
Sys.setenv(SPANISH_OD_DATA_DIR = dir)
od_data <- spod_get(type = "od", dates = "2023-01-02")
#> Data version detected from dates: 2
#> Using existing data links xml: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-06.xml
#> Using existing data links xml: C:/Users/georl_admin/github/robinlovelace/spanishoddata_paper/data/data_links_v2_2024-09-06.xml
#> character(0)
#> [1] "file exists?"
#> named logical(0)
#> Error in if (nchar(dsn) < 1) stop("`dsn` must point to a source, not an empty string.", : argument is of length zero
list.files(dir, recursive = TRUE)
#> [1] "data_links_v2_2024-09-05.xml"                                                                                            
#> [2] "data_links_v2_2024-09-06.xml"                                                                                            
#> [3] "raw_data_cache/v2/estudios_basicos/por-distritos/viajes/ficheros-diarios/year=2023/month=1/day=2/Viajes_distritos.csv.gz"
#> [4] "raw_data_cache/v2/estudios_basicos/por-distritos/viajes/ficheros-diarios/year=2024/month=1/day=1/Viajes_distritos.csv.gz"
#> [5] "raw_data_cache/v2/estudios_basicos/por-distritos/viajes/ficheros-diarios/year=2024/month=1/day=2/Viajes_distritos.csv.gz"
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       Europe/London
#>  date     2024-09-06
#>  pandoc   3.2.1 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  bit             4.0.5   2022-11-15 [1] CRAN (R 4.4.1)
#>  bit64           4.0.5   2020-08-30 [1] CRAN (R 4.4.1)
#>  cachem          1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
#>  callr           3.7.6   2024-03-25 [1] CRAN (R 4.4.1)
#>  class           7.3-22  2023-05-03 [2] CRAN (R 4.4.1)
#>  classInt        0.4-10  2023-09-05 [1] CRAN (R 4.4.1)
#>  cli             3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  crayon          1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
#>  curl            5.2.1   2024-03-01 [1] CRAN (R 4.4.1)
#>  DBI             1.2.3   2024-06-02 [1] CRAN (R 4.4.1)
#>  dbplyr          2.5.0   2024-03-19 [1] CRAN (R 4.4.1)
#>  desc            1.4.3   2023-12-10 [1] CRAN (R 4.4.1)
#>  devtools        2.4.5   2022-10-11 [1] CRAN (R 4.4.1)
#>  digest          0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
#>  dplyr           1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
#>  duckdb          1.0.0-2 2024-07-19 [1] CRAN (R 4.4.1)
#>  e1071           1.7-14  2023-12-06 [1] CRAN (R 4.4.1)
#>  ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.4.1)
#>  evaluate        0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
#>  fansi           1.0.6   2023-12-08 [1] CRAN (R 4.4.1)
#>  fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs              1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
#>  generics        0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  glue            1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
#>  hms             1.1.3   2023-03-21 [1] CRAN (R 4.4.1)
#>  htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.4.1)
#>  httpuv          1.6.15  2024-03-26 [1] CRAN (R 4.4.1)
#>  KernSmooth      2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#>  knitr           1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>  later           1.3.2   2023-12-06 [1] CRAN (R 4.4.1)
#>  lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  lubridate       1.9.3   2023-09-27 [1] CRAN (R 4.4.1)
#>  magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  memoise         2.0.1   2021-11-26 [1] CRAN (R 4.4.1)
#>  memuse          4.2-3   2023-01-24 [1] CRAN (R 4.4.0)
#>  mime            0.12    2021-09-28 [1] CRAN (R 4.4.0)
#>  miniUI          0.1.1.1 2018-05-18 [1] CRAN (R 4.4.1)
#>  parallelly      1.38.0  2024-07-27 [1] CRAN (R 4.4.1)
#>  pillar          1.9.0   2023-03-22 [1] CRAN (R 4.4.1)
#>  pkgbuild        1.4.4   2024-03-17 [1] CRAN (R 4.4.1)
#>  pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  pkgload         1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
#>  processx        3.8.4   2024-03-16 [1] CRAN (R 4.4.1)
#>  profvis         0.3.8   2023-05-02 [1] CRAN (R 4.4.1)
#>  promises        1.3.0   2024-04-05 [1] CRAN (R 4.4.1)
#>  proxy           0.4-27  2022-06-09 [1] CRAN (R 4.4.1)
#>  ps              1.7.6   2024-01-18 [1] CRAN (R 4.4.1)
#>  purrr           1.0.2   2023-08-10 [1] CRAN (R 4.4.1)
#>  R.cache         0.16.0  2022-07-21 [1] CRAN (R 4.4.1)
#>  R.methodsS3     1.8.2   2022-06-13 [1] CRAN (R 4.4.0)
#>  R.oo            1.26.0  2024-01-24 [1] CRAN (R 4.4.0)
#>  R.utils         2.12.3  2023-11-18 [1] CRAN (R 4.4.1)
#>  R6              2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
#>  Rcpp            1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
#>  readr           2.1.5   2024-01-10 [1] CRAN (R 4.4.1)
#>  remotes         2.5.0   2024-03-17 [1] CRAN (R 4.4.1)
#>  reprex          2.1.0   2024-01-11 [1] CRAN (R 4.4.1)
#>  rlang           1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown       2.28    2024-08-17 [1] CRAN (R 4.4.1)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
#>  sf              1.0-16  2024-03-24 [1] CRAN (R 4.4.1)
#>  shiny           1.8.1.1 2024-04-02 [1] CRAN (R 4.4.1)
#>  spanishoddata * 0.0.1   2024-09-06 [1] Github (rOpenSpain/spanishoddata@31cc7fe)
#>  stringi         1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
#>  stringr         1.5.1   2023-11-14 [1] CRAN (R 4.4.1)
#>  styler          1.10.3  2024-04-07 [1] CRAN (R 4.4.1)
#>  tibble          3.2.1   2023-03-20 [1] CRAN (R 4.4.1)
#>  tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.4.1)
#>  timechange      0.3.0   2024-01-18 [1] CRAN (R 4.4.1)
#>  tzdb            0.4.0   2023-05-12 [1] CRAN (R 4.4.1)
#>  units           0.8-5   2023-11-28 [1] CRAN (R 4.4.1)
#>  urlchecker      1.0.1   2021-11-30 [1] CRAN (R 4.4.1)
#>  usethis         2.2.3   2024-02-19 [1] CRAN (R 4.4.1)
#>  utf8            1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
#>  vroom           1.6.5   2023-12-05 [1] CRAN (R 4.4.1)
#>  withr           3.0.1   2024-07-31 [1] CRAN (R 4.4.1)
#>  xfun            0.45    2024-06-16 [1] CRAN (R 4.4.1)
#>  xml2            1.3.6   2023-12-04 [1] CRAN (R 4.4.1)
#>  xtable          1.8-4   2019-04-21 [1] CRAN (R 4.4.1)
#>  yaml            2.3.8   2023-12-11 [1] CRAN (R 4.4.0)
#> 
#>  [1] C:/Users/georl_admin/AppData/Local/R/win-library/4.4
#>  [2] C:/Program Files/R/R-4.4.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-09-06 with reprex v2.1.0

e-kotov commented 1 week ago

Ok, at least I now have a machine where I can reproduce this. Will get back to you.

e-kotov commented 1 week ago

Ok, now it's definitely fixed. I tested it twice on a Windows machine with a clean setup. I will merge this fix now because it is quite critical: I want anyone checking out the package to not run into this problem. But please do re-test on your machine.