Closed by bttomio 3 years ago
Hi Bruno,
Thank you for raising this issue. And sorry for the late answer.
The error occurs because the function read.px() from the R package {pxR}, a dependency of {BFS}, fails to read the specific PX file you selected in your code. As the function bfs_get_dataset() works well with other files, the problem may come from the internal structure of the PX file you selected.
But surprisingly your code works fine on my Mac. This is strange. It seems to be a bug specific to Windows. I will investigate more and hopefully come back with a solution.
Best, Félix
Thank you for your reply, Félix!
My guess is that it's related to the slash. This "C:\Users\Bruno\AppData\Local\Temp\Rtmp6zrwjd/bfs_data_13967917_en.px" should be "C:/Users/Bruno/AppData/Local/Temp/Rtmp6zrwjd/bfs_data_13967917_en.px" to work correctly in Windows.
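A side note on the path guess above: R on Windows generally accepts both separators, and a mixed path like this one typically appears because tempdir() returns backslashes while file.path() always joins with "/". Below is a minimal sketch for rewriting such a path with forward slashes, for display or comparison. The helper name to_forward_slashes is hypothetical and not part of {BFS}.

```r
# Hypothetical helper: rewrite Windows backslashes as forward slashes.
to_forward_slashes <- function(path) {
  # fixed = TRUE treats the pattern as a literal single backslash
  gsub("\\", "/", path, fixed = TRUE)
}

mixed <- "C:\\Users\\Bruno\\AppData\\Local\\Temp\\Rtmp6zrwjd/bfs_data_13967917_en.px"
clean <- to_forward_slashes(mixed)
print(clean)
```

On Windows itself, normalizePath(path, winslash = "/") does a similar job for paths that exist.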
Okay so I finally took some time to dig further into this bug.
This issue comes from the fileEncoding argument of the scan() function used inside the pxR::read.px() function.
I fixed the issue by forcing the encoding to be "latin1" and pushed the new package version to GitHub. Please let me know if this code now works for you:
devtools::install_github("lgnbhl/BFS")
library(BFS)
library(magrittr)
meta_en_ind <- bfs_get_metadata("en") %>%
  bfs_search("production")
print(meta_en_ind)
df_ind <- bfs_get_dataset(url_px = meta_en_ind$url_px[1], language = "en")
Once again, a big thanks for sharing this bug with me!
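To illustrate the kind of encoding mismatch discussed above, here is a minimal, self-contained sketch. It is not the actual change made in {BFS} or {pxR}. It assumes a file whose bytes are Latin-1, writes a tiny one to a temporary path, and converts the lines explicitly with iconv() instead of relying on the session locale.

```r
# Write a tiny Latin-1 encoded file: "ü" is the single byte 0xFC in Latin-1.
tmp <- tempfile(fileext = ".px")
latin1_bytes <- c(charToRaw("VALUES=\"Z"), as.raw(0xFC), charToRaw("rich\";\n"))
writeBin(latin1_bytes, tmp)

# Read the lines without any re-encoding, then convert explicitly.
raw_lines <- readLines(tmp)
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")

print(validUTF8(raw_lines))   # the untouched bytes are not valid UTF-8
print(validUTF8(utf8_lines))  # after conversion they are
unlink(tmp)
```

Converting explicitly keeps the result independent of the platform's default encoding, which differs between Windows, macOS and Linux.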
Hi Félix! Thanks a lot for your reply.
I'm still getting an error with the last line of the code. Could you please check it out?
devtools::install_github("lgnbhl/BFS")
#> Skipping install of 'BFS' from a github remote, the SHA1 (42f9be53) has not changed since last install.
#> Use `force = TRUE` to force installation
library(BFS)
library(magrittr)
meta_en_ind <- bfs_get_metadata("en") %>%
bfs_search("production")
print(meta_en_ind)
#> # A tibble: 3 x 6
#> title observation_peri~ published source url_bfs url_px
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Secondary Se~ 1.10.2011-30.6.2~ 20.08.2020 Federal~ https://www.b~ https://ww~
#> 2 Secondary Se~ 1.1.1999-30.6.20~ 20.08.2020 Federal~ https://www.b~ https://ww~
#> 3 Secondary Se~ 1999-2019 25.05.2020 Federal~ https://www.b~ https://ww~
df_ind <- bfs_get_dataset(url_px = meta_en_ind$url_px[1], language = "en")
#> Warning in scan(filename, what = "character", sep = "\n",
#> quiet = TRUE, : invalid input found on input connection 'C:
#> \Users\Bruno\AppData\Local\Temp\RtmpOsARHl/bfs_data_13967917_en.px'
#> Error in pxR::read.px(file.path(tempfile_path), encoding = "latin1", na.strings = c("\".\"", : The input file is malformed: data and varnames length differ
Created on 2020-10-01 by the reprex package (v0.3.0)
Hi Bruno,
The bug has been fixed in the latest CRAN version of BFS (now 0.3.0). The R code above is now working on my Windows machine.
The fix was kindly shared by Fachstelle Statistik Kanton Zug, i.e. @statzg.
Please let me know if it works for you so I can close this issue :).
Best, Félix
Hi Félix,
Thanks a lot for your reply. Glad that you could find a solution with the help of @statzg. It's working now on Windows. However, the data is in German. I've also tried to run the code on Linux (Ubuntu), where it does not work at all. Here is some feedback:
After typing df_ind <- bfs_get_dataset(url_px = meta_en_ind$url_px[1], language = "en"), I'm getting this error message: Failed to translate name.
If I repeat the command, it works. Nonetheless, it's not considering the language option. As you can see, it's in German:
> df_ind
# A tibble: 32,472 x 6
month branch variable indices_changes adjustment value
<fct> <fct> <fct> <fct> <fct> <dbl>
1 2010M10 B-E Industrie Produktion Indizes Unbereinigt 101.
2 2010M11 B-E Industrie Produktion Indizes Unbereinigt 108.
3 2010M12 B-E Industrie Produktion Indizes Unbereinigt 105.
4 2011M01 B-E Industrie Produktion Indizes Unbereinigt 89.5
5 2011M02 B-E Industrie Produktion Indizes Unbereinigt 95.8
6 2011M03 B-E Industrie Produktion Indizes Unbereinigt 101.
7 2011M04 B-E Industrie Produktion Indizes Unbereinigt 90.3
8 2011M05 B-E Industrie Produktion Indizes Unbereinigt 104.
9 2011M06 B-E Industrie Produktion Indizes Unbereinigt 94.3
10 2011M07 B-E Industrie Produktion Indizes Unbereinigt 93.2
# ... with 32,462 more rows
Therefore, it's working, but not as expected.
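Since the call succeeded on the second attempt, the first failure looks transient; its cause is not identified in this thread. As a possible stopgap, the sketch below retries a function a few times before giving up. The helper with_retry and the stand-in function flaky are hypothetical and not part of {BFS}.

```r
# Hypothetical helper: call `f` until it succeeds or `attempts` is exhausted.
with_retry <- function(f, attempts = 3, wait = 1) {
  stopifnot(attempts >= 1)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (i < attempts) Sys.sleep(wait)  # pause before the next attempt
  }
  stop(last_error)  # re-signal the last error once all attempts failed
}

# Stand-in for a call that fails once and then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 2) stop("Failed to translate name")
  "ok"
}
res <- with_retry(flaky, wait = 0)
print(res)

# With the package loaded, the same wrapper could be applied as:
# df_ind <- with_retry(function() {
#   bfs_get_dataset(url_px = meta_en_ind$url_px[1], language = "en")
# })
```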
Here is the information for this session on Windows:
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_2.0.1 BFS_0.3.0
loaded via a namespace (and not attached):
[1] ggrepel_0.9.1 Rcpp_1.0.6 lubridate_1.7.10 lattice_0.20-41 prettyunits_1.1.1 ps_1.6.0 zoo_1.8-9
[8] assertthat_0.2.1 rprojroot_2.0.2 utf8_1.2.1 R6_2.5.0 plyr_1.8.6 pxR_0.42.4 backports_1.2.1
[15] httr_1.4.2 ggplot2_3.3.3 pillar_1.5.1 rlang_0.4.10 progress_1.2.2 curl_4.3 rstudioapi_0.13
[22] callr_3.5.1 pins_0.4.5 desc_1.3.0 devtools_2.3.2 selectr_0.4-2 stringr_1.4.0 munsell_0.5.0
[29] anytime_0.3.9 compiler_4.0.3 janitor_2.1.0 pkgconfig_2.0.3 pkgbuild_1.2.0 tidyselect_1.1.0 tibble_3.1.0
[36] fansi_0.4.2 crayon_1.4.1 dplyr_1.0.5 withr_2.4.1 rappdirs_0.3.3 grid_4.0.3 jsonlite_1.7.2
[43] gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1 scales_1.1.1 cli_2.3.1 stringi_1.5.3 cachem_1.0.4
[50] reshape2_1.4.4 fs_1.5.0 remotes_2.2.0 testthat_3.0.2 snakecase_0.11.0 xml2_1.3.2 filelock_1.0.2
[57] ellipsis_0.3.1 xts_0.12.1 generics_0.1.0 vctrs_0.3.6 cowplot_1.1.1 tidyRSS_2.0.3 tools_4.0.3
[64] RJSONIO_1.3-1.4 glue_1.4.2 purrr_0.3.4 hms_1.0.0 yaml_2.2.1 processx_3.5.0 pkgload_1.2.0
[71] fastmap_1.1.0 colorspace_2.0-0 sessioninfo_1.1.1 rvest_1.0.0 memoise_2.0.0 usethis_2.0.1
On Linux, this is the error message after df_ind <- bfs_get_dataset(url_px = meta_en_ind$url_px[1], language = "en"):
trying URL 'https://www.bfs.admin.ch/bfsstatic/dam/assets/16044446/master'
downloaded 574 KB
Error in gsub("\"......\"", "\"....\"", x, fixed = TRUE) :
input string 16 is invalid in this locale
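This kind of message usually means that a string contains bytes that are not valid in the session's encoding, for example Latin-1 bytes in a UTF-8 locale. The sketch below builds such a string and converts it to UTF-8 before running the same substitution. The Latin-1 source encoding is an assumption, and this is not necessarily how the package was later fixed.

```r
# Build a Latin-1 string byte by byte; 0xFC is "ü" in Latin-1.
x <- rawToChar(c(charToRaw("\"......\" Z"), as.raw(0xFC), charToRaw("rich")))

# Convert to UTF-8 first so the string is valid in a UTF-8 locale.
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# The same substitution as in the error message now runs on valid input.
cleaned <- gsub("\"......\"", "\"....\"", x_utf8, fixed = TRUE)
print(cleaned)
```

An alternative is to pass useBytes = TRUE to gsub(), which matches byte by byte and leaves the encoding question for later.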
Here is the session info for this case:
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BFS_0.3.0 magrittr_2.0.1
loaded via a namespace (and not attached):
[1] tidyRSS_2.0.3 httr_1.4.2 pkgload_1.2.0 jsonlite_1.7.2 viridisLite_0.3.0 assertthat_0.2.1 selectr_0.4-2
[8] yaml_2.2.1 remotes_2.2.0 progress_1.2.2 ggrepel_0.9.1 sessioninfo_1.1.1 pillar_1.5.1 backports_1.2.1
[15] lattice_0.20-41 glue_1.4.2 digest_0.6.27 rvest_1.0.0 snakecase_0.11.0 colorspace_2.0-0 plyr_1.8.6
[22] cowplot_1.1.1 htmltools_0.5.1.1 pkgconfig_2.0.3 devtools_2.3.2 purrr_0.3.4 scales_1.1.1 webshot_0.5.2
[29] processx_3.5.0 svglite_2.0.0 tibble_3.1.0 generics_0.1.0 ggplot2_3.3.3 usethis_2.0.1 ellipsis_0.3.1
[36] cachem_1.0.4 withr_2.4.1 janitor_2.1.0 cli_2.3.1 RJSONIO_1.3-1.4 crayon_1.4.1 memoise_2.0.0
[43] evaluate_0.14 ps_1.6.0 fs_1.5.0 fansi_0.4.2 anytime_0.3.9 xts_0.12.1 xml2_1.3.2
[50] pkgbuild_1.2.0 pins_0.4.5 tools_4.0.4 prettyunits_1.1.1 hms_1.0.0 lifecycle_1.0.0 stringr_1.4.0
[57] munsell_0.5.0 pxR_0.42.4 callr_3.5.1 kableExtra_1.3.4 compiler_4.0.4 systemfonts_1.0.1 rlang_0.4.10
[64] grid_4.0.4 rstudioapi_0.13 rappdirs_0.3.3 rmarkdown_2.7 testthat_3.0.2 gtable_0.3.0 DBI_1.1.1
[71] curl_4.3 reshape2_1.4.4 R6_2.5.0 gridExtra_2.3 zoo_1.8-9 lubridate_1.7.10 knitr_1.31
[78] dplyr_1.0.5 fastmap_1.1.0 utf8_1.2.1 filelock_1.0.2 rprojroot_2.0.2 desc_1.3.0 stringi_1.5.3
[85] Rcpp_1.0.6 vctrs_0.3.6 tidyselect_1.1.0 xfun_0.22
Thanks a lot for your package, and sorry for bringing up another issue. Please let me know if I can help you somehow.
Best,
Bruno
Hi @bttomio,
I just merged the fix proposed by @zambujo. Could you please let me know if the package is now working on Ubuntu?
Please note that the function bfs_get_dataset() is now only providing data in the German language.
Best, Félix
Hi Félix!
Thanks a lot for this update. It's cool to see your package progressing. I'm looking forward to the next updates, notably the ability to extract data in English or French.
With the current version, the error message is gone on Ubuntu.
Best, Bruno
PS: here is my session information just for the record:
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_2.0.1 BFS_0.3.0
loaded via a namespace (and not attached):
[1] tidyRSS_2.0.3 httr_1.4.2 pkgload_1.2.1 jsonlite_1.7.2 viridisLite_0.4.0 assertthat_0.2.1 selectr_0.4-2 yaml_2.2.1
[9] remotes_2.3.0 progress_1.2.2 ggrepel_0.9.1 sessioninfo_1.1.1 pillar_1.6.0 backports_1.2.1 lattice_0.20-44 glue_1.4.2
[17] digest_0.6.27 rvest_1.0.0 snakecase_0.11.0 colorspace_2.0-1 cowplot_1.1.1 htmltools_0.5.1.1 plyr_1.8.6 pkgconfig_2.0.3
[25] devtools_2.4.1 purrr_0.3.4 scales_1.1.1 webshot_0.5.2 processx_3.5.2 svglite_2.0.0 tibble_3.1.1 generics_0.1.0
[33] ggplot2_3.3.3 usethis_2.0.1 ellipsis_0.3.2 cachem_1.0.4 withr_2.4.2 janitor_2.1.0 cli_2.5.0 RJSONIO_1.3-1.4
[41] crayon_1.4.1 memoise_2.0.0 evaluate_0.14 ps_1.6.0 fs_1.5.0 fansi_0.4.2 anytime_0.3.9 xts_0.12.1
[49] xml2_1.3.2 pkgbuild_1.2.0 pins_0.4.5 tools_4.0.5 prettyunits_1.1.1 hms_1.0.0 lifecycle_1.0.0 stringr_1.4.0
[57] munsell_0.5.0 pxR_0.42.4 callr_3.7.0 kableExtra_1.3.4 compiler_4.0.5 systemfonts_1.0.1 rlang_0.4.11 grid_4.0.5
[65] rstudioapi_0.13 rappdirs_0.3.3 rmarkdown_2.8 testthat_3.0.2 gtable_0.3.0 DBI_1.1.1 curl_4.3.1 reshape2_1.4.4
[73] R6_2.5.0 zoo_1.8-9 lubridate_1.7.10 knitr_1.33 dplyr_1.0.6 fastmap_1.1.0 utf8_1.2.1 filelock_1.0.2
[81] rprojroot_2.0.2 desc_1.3.0 stringi_1.5.3 Rcpp_1.0.6 vctrs_0.3.8 tidyselect_1.1.1 xfun_0.22
Hi,
Thanks a lot for your package. It's really useful!
I'm having trouble downloading data. Here is my code:
This is the error message:
Could you please help me out with this issue?
Many thanks,
Bruno
Session info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_1.5 BFS_0.2.5 OECD_0.2.4 IMFData_0.2.0 anytime_0.3.8 openxlsx_4.1.5 forcats_0.5.0
[8] stringr_1.4.0 purrr_0.3.4 readr_1.3.1 tidyr_1.1.1 ggplot2_3.3.2 tidyverse_1.3.0 tibble_3.0.3
[15] quantmod_0.4.17 TTR_0.24.0 Quandl_2.10.0 xts_0.12-0 zoo_1.8-8 lubridate_1.7.9 dplyr_1.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lattice_0.20-41 prettyunits_1.1.1 assertthat_0.2.1 utf8_1.1.4 pxR_0.42.4
[7] R6_2.4.1 cellranger_1.1.0 plyr_1.8.6 backports_1.1.9 reprex_0.3.0 rsdmx_0.5-14
[13] httr_1.4.2 pillar_1.4.6 progress_1.2.2 rlang_0.4.7 curl_4.3 readxl_1.3.1
[19] rstudioapi_0.11 blob_1.2.1 pins_0.4.3 selectr_0.4-2 RCurl_1.98-1.2 munsell_0.5.0
[25] broom_0.7.0 compiler_4.0.2 modelr_0.1.8 janitor_2.0.1 pkgconfig_2.0.3 tidyselect_1.1.0
[31] XML_3.99-0.5 fansi_0.4.1 crayon_1.3.4 dbplyr_1.4.4 withr_2.2.0 rappdirs_0.3.1
[37] bitops_1.0-6 grid_4.0.2 jsonlite_1.7.0 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0
[43] scales_1.1.1 zip_2.1.0 cli_2.0.2 stringi_1.4.6 reshape2_1.4.4 fs_1.5.0
[49] snakecase_0.11.0 xml2_1.3.2 filelock_1.0.2 ellipsis_0.3.1 generics_0.0.2 vctrs_0.3.2
[55] RJSONIO_1.3-1.4 tools_4.0.2 glue_1.4.1 hms_0.5.3 yaml_2.2.1 colorspace_1.4-1
[61] rvest_0.3.6 haven_2.3.1