Closed by lbertela 1 month ago
Hello,
Thanks for your message. I just pushed a fix to this GitHub repository. You can install the fix with the following:
remotes::install_github("lgnbhl/BFS")
Now the functions should work again :).
These functions broke because the official BFS website changed its RSS feed structure (which I was scraping). With the new fix I am getting the catalogs from the BFS API. This should be more stable and gives access to more catalog metadata. In particular, the functions now return the BFS number directly, which will simplify the general workflow (see the updated README). I am thinking about adding more catalog metadata.
The new version will be pushed to CRAN soon.
Best, Félix
Hello, Thank you so much for the quick fix! It's great to hear that the API integration will offer more stability. I’m really excited about the expanded access to metadata; the usability of this package keeps getting better! Best, Ludovic
Hello again,
When calling either bfs_get_catalog_data() or bfs_get_catalog_tables() with language "de", for example, both return a data.frame limited to 350 lines. Additionally, in both data.frames the "number_asset" is now unique for each line.
Would it be possible to extend the search to all available tables or data, rather than only 350? Having access to the correct number_asset in bfs_get_catalog_tables() would help greatly when downloading the data with bfs_download_asset().
Thanks again!
Ludovic
Hello Ludovic,
Thanks a lot for catching this bug regarding the "number_asset" variable! I just pushed a quick fix, now in BFS version 0.5.10. As usual you can access it with:
remotes::install_github("lgnbhl/BFS")
Regarding the limit of 350 lines, I guess it is an API limit. I will see if it is possible to bypass it in a new patch of BFS.
I will push this hot fix on CRAN soon.
Please let me know if you are interested in any other new features for the BFS R package (feel free to create a new GitHub issue for them if they are not related to this bug).
Best regards, Félix
An idea to get the full data catalog could be to loop over a given argument, for example prodima (possibly dates could work too), using purrr::pmap_dfr():
# themes_names <- c("Statistical basis and overviews 00", "Population 01", "Territory and environment 02", "Work and income 03", "National economy 04", "Prices 05", "Industry and services 06", "Agriculture and forestry 07", "Energy 08", "Construction and housing 09", "Tourism 10", "Mobility and transport 11", "Money, banks and insurance 12", "Social security 13", "Health 14", "Education and science 15", "Culture, media, information society, sports 16", "Politics 17", "General Government and finance 18", "Crime and criminal justice 19", "Economic and social situation of the population 20", "Sustainable development, regional and international disparities 21")
themes_prodima <- c(900001, 900010, 900035, 900051, 900075, 900084, 900092, 900104, 900127, 900140, 900160, 900169, 900191, 900198, 900210, 900212, 900214, 900226, 900239, 900257, 900269, 900276)
library(BFS)
library(purrr)
catalog_all <- purrr::pmap_dfr(
  .l = list(language = "de", prodima = themes_prodima),
  .f = bfs_get_catalog_data
)
# A tibble: 760 × 9
title language publication_date number_asset order_nr url_px language_available
<chr> <chr> <date> <chr> <chr> <chr> <list>
1 Heiraten und Heiratshä… de 2024-09-26 32506838 px-x-01… https… <chr [4]>
2 Lebendgeburten nach Mo… de 2024-09-26 32506840 px-x-01… https… <chr [4]>
3 Scheidungen und Scheid… de 2024-09-26 32506841 px-x-01… https… <chr [4]>
4 Todesfälle nach Monat … de 2024-09-26 32506839 px-x-01… https… <chr [4]>
5 Männliche Vornamen der… de 2024-08-23 32187356 px-x-01… https… <chr [4]>
6 Weibliche Vornamen der… de 2024-08-23 32187357 px-x-01… https… <chr [4]>
7 Auswanderung der ständ… de 2024-08-22 32208056 px-x-01… https… <chr [4]>
8 Auswanderung der ständ… de 2024-08-22 32208055 px-x-01… https… <chr [4]>
9 Auswanderung der ständ… de 2024-08-22 32208061 px-x-01… https… <chr [4]>
10 Auswanderung der ständ… de 2024-08-22 32208057 px-x-01… https… <chr [4]>
# ℹ 750 more rows
# ℹ 2 more variables: url_structure_json <chr>, damId <int>
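The loop above could be made a little more defensive. The following is only a minimal sketch, not something that exists in the BFS package: `fetch_catalog_by_theme()` and its `fetch` argument are hypothetical names, and it assumes purrr >= 1.0.0 for `list_rbind()`. It skips any prodima code whose request fails instead of aborting the whole loop, and it drops rows with a repeated `number_asset`, on the unconfirmed assumption that a table could be listed under more than one theme.

```r
library(purrr)

# Hypothetical helper (not part of BFS): call `fetch` once per prodima code,
# skip codes whose request errors, and row-bind whatever came back.
fetch_catalog_by_theme <- function(prodima_codes, fetch, ...) {
  # possibly() turns an error into NULL so one failing theme does not stop the loop
  safe_fetch <- purrr::possibly(fetch, otherwise = NULL)
  pages <- purrr::map(prodima_codes, function(code) safe_fetch(prodima = code, ...))
  pages <- purrr::compact(pages)        # drop the NULL results of failed calls
  catalog <- purrr::list_rbind(pages)   # one data frame for all themes
  # Assumption: the same asset may appear under several themes, so keep the first copy
  if ("number_asset" %in% names(catalog)) {
    catalog <- catalog[!duplicated(catalog$number_asset), , drop = FALSE]
  }
  catalog
}

# Intended use with the objects from the comment above (needs network access):
# catalog_all <- fetch_catalog_by_theme(themes_prodima, bfs_get_catalog_data, language = "de")
```

Passing the fetch function as an argument keeps the helper usable with bfs_get_catalog_tables() as well, and makes it easy to try out with a small stand-in function before calling the API.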
Hello Ludovic,
I have added a new argument named return_raw to give access to all the metadata in a raw / unstructured form when calling bfs_get_catalog_data() and bfs_get_catalog_tables(). I have updated the README to explain how to use it with an example.
This new feature is now in BFS version 0.5.11 on GitHub (soon on CRAN). As usual you can access it with: remotes::install_github("lgnbhl/BFS")
You can also access all the metadata of the full data catalog like this:
themes_prodima <- c(900001, 900010, 900035, 900051, 900075, 900084, 900092, 900104, 900127, 900140, 900160, 900169, 900191, 900198, 900210, 900212, 900214, 900226, 900239, 900257, 900269, 900276)
library(BFS)
library(purrr)
purrr::pmap_dfr(
  .l = list(language = "de", prodima = themes_prodima, return_raw = TRUE), # added "return_raw" here
  .f = bfs_get_catalog_data
)
# A tibble: 760 × 5
ids$uuid $contentId $gnp $damId bfs$embargo description$titles$m…¹ shop$orderNr links
<chr> <int> <chr> <int> <chr> <chr> <chr> <lis>
1 ef70eb19-9384-… 325772 2024… 3.25e7 2024-09-26… Heiraten und Heiratsh… px-x-010202… <df>
2 32069ba3-1cb4-… 189095 2024… 3.25e7 2024-09-26… Lebendgeburten nach M… px-x-010202… <df>
3 5a8b2ea1-e23b-… 325776 2024… 3.25e7 2024-09-26… Scheidungen und Schei… px-x-010202… <df>
4 66f3d4f6-edfc-… 189065 2024… 3.25e7 2024-09-26… Todesfälle nach Monat… px-x-010202… <df>
5 51dfa1cf-2199-… 13807205 2024… 3.22e7 2024-08-23… Männliche Vornamen de… px-x-010405… <df>
6 b65c9036-b000-… 13807212 2024… 3.22e7 2024-08-23… Weibliche Vornamen de… px-x-010405… <df>
7 38a86458-22d5-… 189124 2024… 3.22e7 2024-08-22… Auswanderung der stän… px-x-010302… <df>
8 6426823f-cb31-… 189120 2024… 3.22e7 2024-08-22… Auswanderung der stän… px-x-010302… <df>
9 7f9d861c-81aa-… 189087 2024… 3.22e7 2024-08-22… Auswanderung der stän… px-x-010302… <df>
10 20fec7fa-cbe5-… 325764 2024… 3.22e7 2024-08-22… Auswanderung der stän… px-x-010302… <df>
# ℹ 750 more rows
# ℹ abbreviated name: ¹description$titles$main
# ℹ 14 more variables: ids$languageCopyId <int>, bfs$lifecycle <df[,4]>, $lifecycleGroup <chr>,
# $provisional <lgl>, $articleModel <df[,4]>, $articleModelGroup <df[,4]>,
# $lastUpdatedVersion <chr>, description$titles$sub <chr>,
# description$categorization <df[,13]>, $bibliography <df[,2]>, $shortSummary <df[,2]>,
# $language <chr>, $abstractShort <chr>, shop$stock <lgl>
If this feature fixes this GitHub issue, feel free to close it.
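The raw output printed above contains nested data frame columns (ids, bfs, description, shop). If flat column names are easier to work with, one option might be jsonlite::flatten(). The sketch below uses a small made-up data frame that only mimics that nesting; it has not been checked against the real return_raw output, so treat it as an illustration rather than a tested recipe for BFS.

```r
library(jsonlite)

# Made-up stand-in for the raw catalog: one plain column plus one nested data frame column
raw <- data.frame(title = c("Table A", "Table B"))
raw$ids <- data.frame(uuid = c("u-1", "u-2"), damId = c(10L, 20L))

# flatten() expands nested data frame columns into top-level columns,
# naming them by joining parent and child names with a dot
flat <- jsonlite::flatten(raw)
names(flat)
```

List columns such as links are not data frame columns, so flatten() leaves them as they are.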
Hello,
The functions bfs_get_catalog(), bfs_get_catalog_data() and bfs_get_catalog_tables() do not work anymore. They return an empty data.frame. Issues were also found in the CRAN checks: https://cran.r-project.org/web/checks/check_results_BFS.html
Thank you in advance :)