Open elliotbeck opened 1 year ago
Dear Elliot,
Thank you very much for using my package and reporting this issue!
The "BFS" package is using under the hood the R package {pxweb} to query the BFS API. After a quick look I haven't found any option to increase the batch size or add a delay. I will investigate more. Feel free to share in this issue any discovery or suggestion from your side.
Another solution is to reduce the size of the dataset using the query argument of the function bfs_get_data(). I discovered another bug in my code when using the query argument and just pushed the fix to GitHub, so the following R code works only with the dev version of the BFS package on GitHub. I will push this fix to CRAN soon.
Please let me know if this works for you.
# Install dev version
devtools::install_github("lgnbhl/BFS")
library(BFS)
# choose a BFS number and language
number_bfs <- "px-x-1003020000_103"
language <- "en"
# create the BFS api url
pxweb_api_url <- paste0("https://www.pxweb.bfs.admin.ch/api/v1/",
                        language, "/", number_bfs, "/", number_bfs, ".px")
# Get BFS table metadata using {pxweb}
px_meta <- pxweb::pxweb_get(pxweb_api_url)
# list variables items
str(px_meta$variables)
# Manually create BFS query dimensions
# Use `code` and `values` elements in `px_meta$variables`
# Use "*" to select all
dims <- list("Jahr" = c("2020", "2021"),
"Monat" = c("YYYY"),
"Indikator" = c("*"))
# Query BFS data with specific dimensions
BFS::bfs_get_data(
number_bfs = number_bfs,
language = language,
query = dims
)
# A tibble: 4 × 4
  Year  Month             Indicator    Hotel…¹
  <chr> <chr>             <chr>          <dbl>
1 2020  Total of the year Arrivals      1.07e7
2 2020  Total of the year Overnight s…  2.37e7
3 2021  Total of the year Arrivals      1.37e7
4 2021  Total of the year Overnight s…  2.96e7
# … with abbreviated variable name
# ¹ `Hotel sector: arrivals and overnight stays of open establishments`
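As a stopgap until the package itself supports a delay, repeated queries can be spaced out manually with Sys.sleep(). A minimal sketch, where fetch_chunk() is a hypothetical stand-in for a real call such as BFS::bfs_get_data(number_bfs, language, query = dims):

```r
# Space out repeated API calls with Sys.sleep() between them.
# fetch_chunk() is a dummy stand-in for a real bfs_get_data() call.
fetch_chunk <- function(year) {
  data.frame(Jahr = year, value = nchar(year)) # dummy payload
}

years <- c("2020", "2021")
results <- lapply(years, function(y) {
  Sys.sleep(0.1) # pause before each request (use something like 10 against the real API)
  fetch_chunk(y)
})
combined <- do.call(rbind, results)
combined
```

The same pattern works with any query function; only the pause length needs tuning to the API's rate limit.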
Best, Felix
Short question @lgnbhl, I get this message too, but I guess the BFS has deliberately put limits on the API for security reasons. Is there another clever way to work around the API limits, apart from using VPN switchers and other networking magic?
@philipp-baumann no, I am not aware of any other solution to work around the API limits.
Hi @elliotbeck and @philipp-baumann
I ran the R code shared in this issue again and it now works just fine for me.
Does the following R code still throw an error for you?
BFS::bfs_get_data(number_bfs = "px-x-1003020000_103", language = "de")
Maybe something has changed in the BFS API or in the {pxweb} R package since this issue was submitted...
By the way, the new version of the BFS package (for now only available on GitHub) provides a new function to download any file locally by BFS number (or asset number). For a large PX file, this speeds up the R code a lot.
devtools::install_github("lgnbhl/BFS")
BFS::bfs_download_asset(
  number_bfs = "px-x-1003020000_103", # number_asset also possible
  destfile = "px-x-1003020000_103.px"
)
library(pxR) # install.packages("pxR")
large_dataset <- pxR::read.px(filename = "px-x-1003020000_103.px") |>
  as.data.frame()
## # A tibble: 539,448 × 6
##    Indikator    Herkunftsland         Tourismusregion Monat       Jahr  value
##    <fct>        <fct>                 <fct>           <fct>       <fct> <dbl>
##  1 Ankünfte     Herkunftsland - Total Schweiz         Jahrestotal 2005  13802796
##  2 Logiernächte Herkunftsland - Total Schweiz         Jahrestotal 2005  32943736
##  3 Ankünfte     Schweiz               Schweiz         Jahrestotal 2005  6573945
##  4 Logiernächte Schweiz               Schweiz         Jahrestotal 2005  14622420
##  5 Ankünfte     Baltische Staaten     Schweiz         Jahrestotal 2005  13115
##  6 Logiernächte Baltische Staaten     Schweiz         Jahrestotal 2005  32871
##  7 Ankünfte     Deutschland           Schweiz         Jahrestotal 2005  2007203
##  8 Logiernächte Deutschland           Schweiz         Jahrestotal 2005  5563695
##  9 Ankünfte     Frankreich            Schweiz         Jahrestotal 2005  542502
## 10 Logiernächte Frankreich            Schweiz         Jahrestotal 2005  1225619
## # ℹ 539,438 more rows
Please note that reading a PX file using pxR::read.px() gives access only to the German version.
Thanks! I'll give it a test tomorrow and let you know. Cheers
I still get the Too Many Requests (RFC 6585) (HTTP 429) error. Best, Elliot
With a Swiss IP I get "px-x-1003020000_103.px" without error, both with the batched approach and the new BFS::bfs_download_asset(). Is there maybe an API limit per time window? I am using the latest CRAN version of {pxweb}.
r$> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.1.3 (2022-03-10)
os Ubuntu 22.04.2 LTS
system x86_64, linux-gnu
ui X11
language
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Zurich
date 2023-07-27
pandoc 2.9.2.1 @ /usr/bin/pandoc
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
anytime 0.3.9 2020-08-27 [1] CRAN (R 4.1.3)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
BFS 0.5.1.999 2023-07-27 [1] Github (lgnbhl/BFS@a583276)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.1.3)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.2)
blob 1.2.3 2022-04-10 [1] CRAN (R 4.1.3)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.1.3)
callr 3.7.3 2022-11-02 [1] CRAN (R 4.1.3)
checkmate 2.2.0 2023-04-27 [1] CRAN (R 4.1.3)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.1.3)
crancache 0.0.0.9001 2022-01-20 [1] Github (r-lib/crancache@7ea4e47)
cranlike 1.0.2 2018-11-26 [1] CRAN (R 4.1.2)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.1.3)
V curl 5.0.0 2023-06-07 [1] CRAN (R 4.1.3) (on disk 5.0.1)
DBI 1.1.3 2022-06-18 [1] RSPM (R 4.1.0)
debugme 1.1.0 2017-10-22 [1] CRAN (R 4.1.2)
desc 1.4.2 2022-09-08 [1] CRAN (R 4.1.3)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.1.3)
dplyr 1.1.2 2023-04-20 [1] CRAN (R 4.1.3)
fansi 1.0.4 2023-01-22 [1] CRAN (R 4.1.3)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.1.3)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.1.3)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
httr 1.4.6 2023-05-08 [1] CRAN (R 4.1.3)
httr2 0.2.3 2023-05-08 [1] CRAN (R 4.1.3)
janitor 2.2.0 2023-02-02 [1] CRAN (R 4.1.3)
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.1.3)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.1.3)
lubridate 1.9.2 2023-02-10 [1] CRAN (R 4.1.3)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.1.3)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.2)
parsedate 1.2.1 2021-04-20 [1] CRAN (R 4.1.2)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.1.3)
pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.1.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.2)
plyr * 1.8.7 2022-03-24 [1] CRAN (R 4.1.3)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.2)
processx 3.8.2 2023-06-30 [1] CRAN (R 4.1.3)
ps 1.7.5 2023-04-18 [1] CRAN (R 4.1.3)
purrr 1.0.1 2023-01-10 [1] CRAN (R 4.1.3)
pxR * 0.42.7 2022-11-23 [1] CRAN (R 4.1.3)
pxweb 0.16.2 2022-10-31 [1] CRAN (R 4.1.3)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.2)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.1.2)
Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.1.3)
rematch2 2.1.2 2020-05-01 [1] CRAN (R 4.1.2)
remotes * 2.4.2 2021-11-30 [1] CRAN (R 4.1.3)
reshape2 * 1.4.4 2020-04-09 [1] CRAN (R 4.1.3)
RJSONIO * 1.3-1.6 2021-09-16 [1] CRAN (R 4.1.2)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.1.3)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.1.3)
RSQLite 2.2.14 2022-05-07 [1] CRAN (R 4.1.3)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.1.3)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.1.3)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.1.3)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.1.3)
tidyRSS 2.0.7 2023-03-05 [1] CRAN (R 4.1.3)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.1.3)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.1.3)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.1.3)
vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.1.3)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.3)
xml2 1.3.5 2023-07-06 [1] CRAN (R 4.1.3)
[1] /home/philipp/R/x86_64-pc-linux-gnu-library/4.1
[2] /opt/R/4.1.3/lib/R/library
V ── Loaded and on-disk version mismatch.
──────────────────────────────────────────────────────────────────────────────
@philipp-baumann yes, there is a limit of 10 calls per time window: https://www.pxweb.bfs.admin.ch/api/v1/de/?config.
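For reference, the PXWeb ?config endpoint returns the rate-limit settings as JSON. A minimal sketch of inspecting them; the JSON string below is an illustrative sample (the field values are assumptions), not a live response from the BFS server:

```r
library(jsonlite)

# Illustrative sample of a PXWeb ?config response (values are assumptions;
# fetch the real one from https://www.pxweb.bfs.admin.ch/api/v1/de/?config)
config_json <- '{"maxValues": 5000, "maxCalls": 10, "timeWindow": 10}'
config <- jsonlite::fromJSON(config_json)

# maxCalls requests are allowed per timeWindow seconds
sprintf("Limit: %d calls per %d seconds", config$maxCalls, config$timeWindow)
```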
thanks @lgnbhl for pointing to that config.
I ran BFS::bfs_get_data(number_bfs = "px-x-1003020000_103") earlier today and got the error message again. But now the function works again. I am not sure how to explain the change; it could be the BFS API server...
The error is not caused by a new version of the {pxweb} R package (currently 0.16.2) as they have not pushed a new version since 2022-10-31.
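Given this intermittent behavior, one generic mitigation (a common pattern, not a BFS package feature) is to retry a failing call with a growing delay. A minimal sketch, where fetch() is a hypothetical stand-in that fails twice with an HTTP-429-style error and then succeeds:

```r
# Generic retry-with-exponential-backoff sketch (not part of the BFS package).
# fetch() simulates an API call that errors twice, then succeeds.
attempts <- 0
fetch <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("Too Many Requests (RFC 6585) (HTTP 429)")
  "data"
}

with_retry <- function(fun, max_tries = 5, base_delay = 0.01) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    Sys.sleep(base_delay * 2^(i - 1)) # back off: 1x, 2x, 4x, ... (use seconds in practice)
  }
  stop("still failing after ", max_tries, " tries")
}

with_retry(fetch) # succeeds on the third attempt
```

Against the real API a base_delay of several seconds would be more appropriate, given the 10-calls-per-time-window limit.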
I have updated the documentation to reflect our discussion: https://github.com/lgnbhl/BFS#too-many-requests-error-message
Best, Felix
Please find below an R script showing a programmatic solution to query a large BFS dataset. The code creates a list of smaller queries and joins the results using purrr::pmap_dfr().
To avoid an error caused by the BFS API limits, I added a new argument "delay" to bfs_get_data(), which calls Sys.sleep(). The code below adds a 10-second delay before each query.
Be sure to have at least v0.5.6 of the BFS package installed.
#devtools::install_github("lgnbhl/BFS") # for BFS v.0.5.6
library(BFS)
library(purrr)
# should at least use version 0.5.6
packageVersion("BFS") >= "0.5.6"
# choose a BFS number and language
number_bfs <- "px-x-1003020000_103"
language <- "en"
# get metadata
meta <- bfs_get_metadata(number_bfs = number_bfs, language = language)
# create dimension object
dims <- meta$values
names(dims) <- meta$code
# split the 1st dimension "Jahr" into chunks of 1 element
# NOTE: depending on the data, another dimension may be better, e.g. dims[[2]]
dims1 <- dims[[1]]
dim_split <- split(dims1, cut(seq_along(dims1), length(dims1), labels = FALSE))
names(dim_split) <- rep(names(dims)[1], length(dim_split))
# create query list
query_list <- vector(mode = "list", length = length(dim_split))
for (i in seq_along(dim_split)) {
  query_list[[i]] <- c(dim_split[i], dims[-1])
}
names(query_list) <- rep("query", length(query_list))
# list of arguments for loop
args_list <- list(
number_bfs = rep(number_bfs, length(query_list)),
language = rep(language, length(query_list)),
delay = rep(10, length(query_list)), # 10 seconds delay before query
query = query_list
)
# loop with smaller queries using bfs_get_data()
df <- purrr::pmap_dfr(.l = args_list, .f = bfs_get_data, .progress = TRUE)
df
## # A tibble: 539,448 × 6
##    Year  Month             `Tourist region` Visitors' country of resi…¹ Indicator
##    <chr> <chr>             <chr>            <chr>                       <chr>
##  1 2005  Total of the year Switzerland      Visitors' country of resid… Arrivals
##  2 2005  Total of the year Switzerland      Visitors' country of resid… Overnigh…
##  3 2005  Total of the year Switzerland      Switzerland                 Arrivals
##  4 2005  Total of the year Switzerland      Switzerland                 Overnigh…
##  5 2005  Total of the year Switzerland      Baltic States               Arrivals
##  6 2005  Total of the year Switzerland      Baltic States               Overnigh…
##  7 2005  Total of the year Switzerland      Germany                     Arrivals
##  8 2005  Total of the year Switzerland      Germany                     Overnigh…
##  9 2005  Total of the year Switzerland      France                      Arrivals
## 10 2005  Total of the year Switzerland      France                      Overnigh…
## # ℹ 539,438 more rows
## # ℹ abbreviated name: ¹ `Visitors' country of residence`
## # ℹ 1 more variable:
## #   `Hotel sector: arrivals and overnight stays of open establishments` <dbl>
@philipp-baumann @elliotbeck feel free to let me know if this solution works for you :)
Dear Félix
Thanks a lot for the nice package you provide! I came across the following issue when downloading a rather long query:
Maybe this could be resolved by increasing the batch size or adding a delay?
Best, Elliot