Search and download data from the Swiss Federal Statistical Office
The BFS
package allows to search and download public data from the
<a href="https://www.bfs.admin.ch/bfs/en/home/statistics/catalogue.html"
target="_blank">Swiss Federal Statistical Office (BFS stands for
Bundesamt für Statistik in German) APIs in a dynamic and reproducible
way.
install.packages("BFS")
You can also install the development version from Github.
devtools::install_github("lgnbhl/BFS")
library(BFS)
Before downloading a BFS dataset, you need to get its related BFS number
(FSO number) in the official data
catalog.
You can search in the catalog directly from R using the
bfs_get_catalog_data()
function in any language (“de”, “fr”, “it” or
“en”):
bfs_get_catalog_data(language = "en", extended_search = "student")
## # A tibble: 4 × 6
## title language number_bfs number_asset publication_date url_px
## <chr> <chr> <chr> <chr> <date> <chr>
## 1 University of applie… en px-x-1502… 31306033 2024-03-28 https…
## 2 University of applie… en px-x-1502… 31306029 2024-03-28 https…
## 3 University students … en px-x-1502… 31305852 2024-03-28 https…
## 4 University students … en px-x-1502… 31305854 2024-03-28 https…
You can search in the data catalog using the following arguments:
language
: The language of a BFS catalog, i.e. “de”, “fr”, “it” or
“en”.title
: to search in title, subtitle and supertitle.extended_search
: extended search in (sub/super)title, orderNr,
summary, shortSummary, shortTextGNP.spatial_division
: choose between “Switzerland”, “Cantons”,
“Districts”, “Communes”, “Other spatial divisions” or “International”.prodima
: by specific BFS themes using a unique prodima number.inquiry
: by inquiry number.institution
: by institution.publishing_year_start
: by publishing year start.publishing_year_end
: by publishing year end.order_nr
: by BFS Number (FSO number).limit
: limit of query results (API limit seems to be 350)article_model_group
: article model grouparticle_model
: article modelNote that English (“en”) and Italian (“it”) data catalogs offer a
limited list of datasets. For the full list please get the French (“fr”)
or German (“de”) data catalogs (see language_available
column).
To return all the catalog metadata in the raw (uncleaned) structure, you
can add return_raw = TRUE
:
catalog_raw <- bfs_get_catalog_data(
language = "en",
extended_search = "student",
return_raw = TRUE
)
catalog_raw
## # A tibble: 4 × 5
## ids$uuid $contentId bfs$embargo description$titles$m…¹ shop$orderNr links
## <chr> <int> <chr> <chr> <chr> <lis>
## 1 9cb3291f-425… 2301224 2024-03-28… University of applied… px-x-150204… <df>
## 2 8f65763b-907… 2301215 2024-03-28… University of applied… px-x-150204… <df>
## 3 4fd81856-d35… 2301207 2024-03-28… University students b… px-x-150204… <df>
## 4 7d7f1b9e-0c6… 2301195 2024-03-28… University students b… px-x-150204… <df>
## # ℹ abbreviated name: ¹description$titles$main
## # ℹ 14 more variables: ids$gnp <chr>, $damId <int>, $languageCopyId <int>,
## # bfs$lifecycle <df[,4]>, $lifecycleGroup <chr>, $provisional <lgl>,
## # $articleModel <df[,4]>, $articleModelGroup <df[,4]>,
## # description$categorization <df[,13]>, $bibliography <df[,1]>,
## # $shortSummary <df[,2]>, $language <chr>, $abstractShort <chr>,
## # shop$stock <lgl>
The data catalog in a raw structure returns a data.frame containing
nested data.frames in some columns. Here an example to get the
description
nested data.frame as a tibble:
library(dplyr)
as_tibble(catalog_raw$description)
## # A tibble: 4 × 6
## titles$main categorization$colle…¹ bibliography$period shortSummary$html
## <chr> <list> <chr> <chr>
## 1 University of ap… <df [2 × 4]> 1997-2023 This dataset pre…
## 2 University of ap… <df [2 × 4]> 1997-2023 This dataset pre…
## 3 University stude… <df [2 × 4]> 1990-2023 This dataset pre…
## 4 University stude… <df [2 × 4]> 1980-2023 This dataset pre…
## # ℹ abbreviated name: ¹categorization$collection
## # ℹ 15 more variables: categorization$prodima <list>, $inquiry <list>,
## # $spatialdivision <list>, $classification <list>, $institution <list>,
## # $publisher <list>, $tags <list>, $dataSource <list>, $copyrights <list>,
## # $termsOfUse <list>, $serie <list>, $periodicity <list>,
## # shortSummary$raw <chr>, language <chr>, abstractShort <chr>
As the API limit is 350 results, you can get the full data catalog by
looping on specific parameters. For example, you can loop over all
prodima
numbers (equivalent to BFS themes):
# themes_names <- c("Statistical basis and overviews 00", "Population 01", "Territory and environment 02", "Work and income 03", "National economy 04", "Prices 05", "Industry and services 06", "Agriculture and forestry 07", "Energy 08", "Construction and housing 09", "Tourism 10", "Mobility and transport 11", "Money, banks and insurance 12", "Social security 13", "Health 14", "Education and science 15", "Culture, media, information society, sports 16", "Politics 17", "General Government and finance 18", "Crime and criminal justice 19", "Economic and social situation of the population 20", "Sustainable development, regional and international disparities 21")
themes_prodima <- c(900001, 900010, 900035, 900051, 900075, 900084, 900092, 900104, 900127, 900140, 900160, 900169, 900191, 900198, 900210, 900212, 900214, 900226, 900239, 900257, 900269, 900276)
library(purrr)
catalog_all <- purrr::pmap_dfr(
.l = list(language = "de", prodima = themes_prodima),
.f = bfs_get_catalog_data,
)
catalog_all
## # A tibble: 760 × 8
## title language number_bfs number_asset publication_date url_px
## <chr> <chr> <chr> <chr> <date> <chr>
## 1 Heiraten und Heirat… de px-x-0102… 32506838 2024-09-26 https…
## 2 Lebendgeburten nach… de px-x-0102… 32506840 2024-09-26 https…
## 3 Scheidungen und Sch… de px-x-0102… 32506841 2024-09-26 https…
## 4 Todesfälle nach Mon… de px-x-0102… 32506839 2024-09-26 https…
## 5 Männliche Vornamen … de px-x-0104… 32187356 2024-08-23 https…
## 6 Weibliche Vornamen … de px-x-0104… 32187357 2024-08-23 https…
## 7 Auswanderung der st… de px-x-0103… 32208056 2024-08-22 https…
## 8 Auswanderung der st… de px-x-0103… 32208055 2024-08-22 https…
## 9 Auswanderung der st… de px-x-0103… 32208061 2024-08-22 https…
## 10 Auswanderung der st… de px-x-0103… 32208057 2024-08-22 https…
## # ℹ 750 more rows
## # ℹ 2 more variables: url_structure_json <chr>, damId <int>
# to not overload the server, please save the data frame locally
# readr::write_csv(catalog_all, "catalog_all.csv")
# catalog_all <- readr::read_csv("catalog_all.csv")
Please use this loop moderately to not overload BFS server unnecessarily (just run it when needed and save the result locally).
The function bfs_get_data()
allows you to download any dataset from
the BFS
catalog
(equivalent to selecting “data” in the “Article Type” dropdown of the
BFS website) using its BFS number (FSO number).
Using the number_bfs
argument (FSO number), you can get BFS data in a
given language (“en”, “de”, “fr” or “it”) from the official PXWeb API of
the Swiss Federal Statistical Office.
#catalog_student$number_bfs[1] # px-x-1502040100_131
bfs_get_data(number_bfs = "px-x-1502040100_131", language = "en")
## # A tibble: 18,480 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 1980/81 Education science Male First university degr… 545
## 2 1980/81 Education science Male Bachelor 0
## 3 1980/81 Education science Male Master 0
## 4 1980/81 Education science Male Doctorate 93
## 5 1980/81 Education science Male Further education, ad… 13
## 6 1980/81 Education science Female First university degr… 946
## 7 1980/81 Education science Female Bachelor 0
## 8 1980/81 Education science Female Master 0
## 9 1980/81 Education science Female Doctorate 70
## 10 1980/81 Education science Female Further education, ad… 52
## # ℹ 18,470 more rows
When running the bfs_get_data()
function you may get the following
error message (issue #7).
Error in pxweb_advanced_get(url = url, query = query, verbose = verbose) :
Too Many Requests (RFC 6585) (HTTP 429).
This could happen because you ran too many times a bfs_get_*()
function (API config is
here). A solution is
to wait a few seconds before running the next bfs_get_*()
function.
You can add a delay in your R code using the delay
argument.
bfs_get_data(
number_bfs = "px-x-1502040100_131",
language = "en",
delay = 10
)
If the error message remains, it could be because you are querying a
very large BFS dataset. Two workarounds exist: a) download the BFS file
using bfs_download_asset()
to read it locally or b) query only
specific elements of the data to reduce the API call (see next section).
Here an example using the bfs_download_asset()
function:
BFS::bfs_download_asset(
number_bfs = "px-x-1502040100_131", #number_asset also possible
destfile = "px-x-1502040100_131.px"
)
library(pxR) # install.packages("pxR")
large_dataset <- pxR::read.px(filename = "px-x-1502040100_131.px") |>
as.data.frame()
Note that reading a PX file using pxR::read.px()
gives access only to
the German version.
First you want to get the metadata of your dataset, i.e. the variables
(code
and text
) and dimensions (values
and valueTexts
). For
example:
metadata <- bfs_get_metadata(number_bfs = "px-x-1502040100_131", language = "en")
# tidy metadata
library(dplyr)
library(tidyr) # for unnest_longer
metadata_tidy <- metadata |>
unnest_longer(c(values, valueTexts))
metadata_tidy
## # A tibble: 92 × 7
## code text values valueTexts time elimination
## <chr> <chr> <chr> <chr> <lgl> <lgl>
## 1 Jahr Year 0 1980/81 TRUE NA
## 2 Jahr Year 1 1981/82 TRUE NA
## 3 Jahr Year 2 1982/83 TRUE NA
## 4 Jahr Year 3 1983/84 TRUE NA
## 5 Jahr Year 4 1984/85 TRUE NA
## 6 Jahr Year 5 1985/86 TRUE NA
## 7 Jahr Year 6 1986/87 TRUE NA
## 8 Jahr Year 7 1987/88 TRUE NA
## 9 Jahr Year 8 1988/89 TRUE NA
## 10 Jahr Year 9 1989/90 TRUE NA
## # ℹ 82 more rows
## # ℹ 1 more variable: title <chr>
Then you can filter the dimensions you want to query using the text
and valueTexts
variables and build the query dimension object with the
code
and values
variables.
# select dimensions
dim1 <- metadata_tidy |>
filter(text == "Year" & valueTexts %in% c("2020/21", "2021/22"))
dim2 <- metadata_tidy |>
filter(text == "Level of study" & valueTexts %in% c("Master", "Doctorate"))
dim3 <- metadata_tidy |>
filter(text == "ISCED Field" & valueTexts %in% c("Education science"))
dim4 <- metadata_tidy |>
filter(text == "Sex") # all valueTexts dimensions
# build dimensions list object
dimensions <- list(
dim1$values,
dim2$values,
dim3$values,
dim4$values
)
names(dimensions) <- c(
unique(dim1$code),
unique(dim2$code),
unique(dim3$code),
unique(dim4$code)
)
dimensions
## $Jahr
## [1] "40" "41"
##
## $Studienstufe
## [1] "2" "3"
##
## $`ISCED Fach`
## [1] "0"
##
## $Geschlecht
## [1] "0" "1"
Finally you can query BFS data with specific dimensions.
BFS::bfs_get_data(
number_bfs = "px-x-1502040100_131",
language = "en",
query = dimensions
)
## # A tibble: 8 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 2020/21 Education science Male Master 151
## 2 2020/21 Education science Male Doctorate 121
## 3 2020/21 Education science Female Master 555
## 4 2020/21 Education science Female Doctorate 306
## 5 2021/22 Education science Male Master 143
## 6 2021/22 Education science Male Doctorate 115
## 7 2021/22 Education science Female Master 599
## 8 2021/22 Education science Female Doctorate 318
A lot of datasets are not accessible through the official PXWeb API.
They are listed in the data
catalog
as “tables” in the “Article Type” dropdown of the BFS website. You can
search for specific tables using bfs_get_catalog_tables()
.
catalog_tables_en_students <- bfs_get_catalog_tables(language = "en", extended_search = "students")
catalog_tables_en_students
## # A tibble: 5 × 5
## title language number_asset publication_date order_nr
## <chr> <chr> <chr> <date> <chr>
## 1 Students at universities and … en 31826381 2024-05-01 ts-x-15…
## 2 Students at universities of a… en 31826380 2024-05-01 ts-x-15…
## 3 Students at universities and … en 31185431 2024-03-28 su-e-15…
## 4 Students at universities of a… en 31185438 2024-03-28 su-e-15…
## 5 Students at universities of t… en 31185427 2024-03-28 su-e-15…
Most of the BFS tables are Excel or CSV files. You can download an table
with bfs_download_asset()
using the number asset
.
library(dplyr)
tables_asset_number_students <- catalog_tables_en_students |>
dplyr::filter(title == "Students at universities and institutes of technology: Basistables") |>
dplyr::pull(number_asset)
file_path <- BFS::bfs_download_asset(
number_asset = tables_asset_number_students,
destfile = "su-e-15.02.04.01.xlsx"
)
To return all the catalog metadata in the raw (uncleaned) structure, you
can add return_raw = TRUE
:
catalog_tables_raw <- bfs_get_catalog_tables(
language = "en",
extended_search = "student",
return_raw = TRUE
)
catalog_tables_raw
## # A tibble: 6 × 5
## ids$uuid $contentId bfs$embargo description$titles$m…¹ shop$orderNr links
## <chr> <int> <chr> <chr> <chr> <lis>
## 1 7a604831-d27… 20044168 2024-05-01… Students at universit… ts-x-15.02.… <df>
## 2 ac4e3021-db4… 20044200 2024-05-01… Students at universit… ts-x-15.02.… <df>
## 3 5e328530-77f… 528179 2024-03-28… Students at universit… su-e-15.02.… <df>
## 4 6e27402b-8dc… 528173 2024-03-28… Students at universit… su-e-15.02.… <df>
## 5 1e86c267-5f9… 528176 2024-03-28… Students at universit… su-e-15.02.… <df>
## 6 ab482495-fc9… 14876281 2023-11-02… Student mobility with… su-e-15.02.… <df>
## # ℹ abbreviated name: ¹description$titles$main
## # ℹ 14 more variables: ids$gnp <chr>, $damId <int>, $languageCopyId <int>,
## # bfs$lifecycle <df[,4]>, $lifecycleGroup <chr>, $provisional <lgl>,
## # $articleModel <df[,4]>, $articleModelGroup <df[,4]>,
## # description$categorization <df[,13]>, $bibliography <df[,1]>,
## # $shortSummary <df[,2]>, $language <chr>, $abstractShort <chr>,
## # shop$stock <lgl>
The data catalog in a raw structure returns a data.frame containing
nested data.frames in some columns. Here an example to get the
description
nested data.frame as a tibble:
library(dplyr)
as_tibble(catalog_tables_raw$description)
## # A tibble: 6 × 6
## titles$main categorization$colle…¹ bibliography$period shortSummary$html
## <chr> <list> <chr> <chr>
## 1 Students at univ… <df [3 × 4]> 1980-2023 <p>Descriptions …
## 2 Students at univ… <df [3 × 4]> 2000-2023 <p>Descriptions …
## 3 Students at univ… <df [3 × 4]> 1990-2023 <NA>
## 4 Students at univ… <df [3 × 4]> 1997-2023 <NA>
## 5 Students at univ… <df [2 × 4]> 2005-2023 <NA>
## 6 Student mobility… <df [2 × 4]> 2021 <NA>
## # ℹ abbreviated name: ¹categorization$collection
## # ℹ 15 more variables: categorization$prodima <list>, $inquiry <list>,
## # $spatialdivision <list>, $classification <list>, $institution <list>,
## # $publisher <list>, $tags <list>, $dataSource <list>, $copyrights <list>,
## # $termsOfUse <list>, $serie <list>, $periodicity <list>,
## # shortSummary$raw <chr>, language <chr>, abstractShort <chr>
Display geo-information catalog of the Swiss Official STAC API using
bfs_get_catalog_geodata()
.
catalog_geodata <- bfs_get_catalog_geodata(include_metadata = TRUE)
catalog_geodata
## # A tibble: 281 × 12
## collection_id type href title description created updated crs license
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 ch.are.agglomera… API http… Citi… "The list … 2021-1… 2023-0… http… propri…
## 2 ch.are.alpenkonv… API http… Alpi… "The perim… 2021-1… 2022-0… http… propri…
## 3 ch.are.belastung… API http… Load… "Passenger… 2021-1… 2022-0… http… propri…
## 4 ch.are.belastung… API http… Load… "Passenger… 2021-1… 2022-0… http… propri…
## 5 ch.are.belastung… API http… Load… "Vehicles … 2021-1… 2022-0… http… propri…
## 6 ch.are.belastung… API http… Load… "Vehicles … 2021-1… 2022-0… http… propri…
## 7 ch.are.erreichba… API http… Acce… "Accessibi… 2021-1… 2022-0… http… propri…
## 8 ch.are.erreichba… API http… Acce… "Accessibi… 2021-1… 2022-0… http… propri…
## 9 ch.are.gemeindet… API http… Typo… "The typol… 2021-1… 2022-0… http… propri…
## 10 ch.are.gueteklas… API http… Publ… "The publi… 2021-1… 2023-0… http… propri…
## # ℹ 271 more rows
## # ℹ 3 more variables: provider_name <chr>, bbox <list>, inverval <list>
For example you can get information about the dataset “Generalised borders G1 and area with urban character”.
library(dplyr)
geodata_g1 <- catalog_geodata |>
filter(title == "Generalised borders G1 and area with urban character")
geodata_g1
## # A tibble: 1 × 12
## collection_id type href title description created updated crs license
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 ch.bfs.generalisi… API http… Gene… Administra… 2022-0… 2023-0… http… propri…
## # ℹ 3 more variables: provider_name <chr>, bbox <list>, inverval <list>
Download dataset by collection id with bfs_download_geodata()
and
unzip file if needed.
# Access Generalised borders G1 and area with urban character
borders_g1_path <- bfs_download_geodata(
collection_id = "ch.bfs.generalisierte-grenzen_agglomerationen_g1",
output_dir = tempdir() # temporary directory
)
# you may need to unzip the file
unzip(borders_g1_path[4], exdir = "borders_G1")
By default, the files are downloaded in a temporary directory. You can
specify the folder where saving the files using the output_dir
argument.
Some layers are accessible using WMS (Web Map Service):
library(leaflet)
leaflet() %>%
setView(lng = 8, lat = 46.8, zoom = 8) %>%
addWMSTiles(
baseUrl = "https://wms.geo.admin.ch/?",
layers = "ch.bfs.generalisierte-grenzen_agglomerationen_g2",
options = WMSTileOptions(format = "image/png", transparent = TRUE),
attribution = "Generalised borders G1 © 2024 BFS")
You can get cartographic base
maps
from the ThemaKart project using bfs_get_base_maps()
. The list of
available geometries in the official
documentation.
The default arguments of bfs_get_base_maps()
can be change to access
specific files:
# default arguments
bfs_get_base_maps(
geom = NULL,
category = "gf", # "gf" for total area (i.e. "Gesamtflaeche")
type = "Poly",
date = NULL,
most_recent = TRUE, #get most recent file by default
format = "shp",
asset_number = "24025646" #change to get older ThemaKart data
)
A typical base maps ThemaKart file looks like this:
# list of geometry names: https://www.bfs.admin.ch/asset/en/24025645
switzerland_sf <- bfs_get_base_maps(geom = "suis")
communes_sf <- bfs_get_base_maps(geom = "polg", date = "20230101")
districts_sf <- bfs_get_base_maps(geom = "bezk")
cantons_sf <- bfs_get_base_maps(geom = "kant")
cantons_capitals_sf <- bfs_get_base_maps(geom = "stkt", type = "Pnts", category = "kk")
lakes_sf <- bfs_get_base_maps(geom = "seen", category = "11")
library(ggplot2)
ggplot() +
geom_sf(data = communes_sf, fill = "snow", color = "grey45") +
geom_sf(data = lakes_sf, fill = "lightblue2", color = "black") +
geom_sf(data = districts_sf, fill = "transparent", color = "grey65") +
geom_sf(data = cantons_sf, fill = "transparent", color = "black") +
geom_sf(data = cantons_capitals_sf, shape = 18, size = 3) +
theme_minimal() +
theme(axis.text = element_blank()) +
labs(caption = "Source: ThemaKart, © BFS")
You can create an interactive map easily with the mapview R package.
library(mapview)
BFS::bfs_get_base_maps(geom = "bezk") |>
mapview(zcol = "name", legend = FALSE)
The package also contains the official Swiss official commune registers for different administrative levels:
register_gde
register_gde_other
register_bzn
register_kt
register_kt_seeanteile
register_dic
# commune register data
BFS::register_gde
## # A tibble: 2,136 × 8
## GDEKT GDEBZNR GDENR GDENAME GDENAMK GDEBZNA GDEKTNA GDEMUTDAT
## <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 ZH 101 1 Aeugst am Albis Aeugst am A… Bezirk… Zürich 1976-11-…
## 2 ZH 101 2 Affoltern am Albis Affoltern a… Bezirk… Zürich 1848-09-…
## 3 ZH 101 3 Bonstetten Bonstetten Bezirk… Zürich 1848-09-…
## 4 ZH 101 4 Hausen am Albis Hausen am A… Bezirk… Zürich 1911-01-…
## 5 ZH 101 5 Hedingen Hedingen Bezirk… Zürich 1848-09-…
## 6 ZH 101 6 Kappel am Albis Kappel am A… Bezirk… Zürich 1911-01-…
## 7 ZH 101 7 Knonau Knonau Bezirk… Zürich 1848-09-…
## 8 ZH 101 8 Maschwanden Maschwanden Bezirk… Zürich 1848-09-…
## 9 ZH 101 9 Mettmenstetten Mettmenstet… Bezirk… Zürich 1848-09-…
## 10 ZH 101 10 Obfelden Obfelden Bezirk… Zürich 1848-09-…
## # ℹ 2,126 more rows
You can use registers to ease geodata analysis.
library(dplyr)
library(sf)
communes_sf <- bfs_get_base_maps(geom = "polg", date = "20230101")
communes_ge <- communes_sf |>
inner_join(BFS::register_gde |>
filter(GDEKTNA == "Genève"),
by = c("id" = "GDENR"))
bbox_ge <- sf::st_bbox(communes_ge)
lake_leman <- bfs_get_base_maps(geom = "seen", category = "11") |>
filter(name == "Lac Léman")
communes_ge |>
ggplot() +
geom_sf(data = lake_leman, fill = "lightblue2", color = "grey65") +
geom_sf(fill = "snow", color = "grey65") +
geom_sf_text(aes(label = name), size = 3, check_overlap = T) +
# bounding box
coord_sf(
xlim = c(bbox_ge$xmin, bbox_ge$xmax),
ylim = c(bbox_ge$ymin, bbox_ge$ymax)
) +
theme_minimal() +
theme(axis.text = element_blank()) +
labs(title = "Communes du canton de Genève",
x = NULL, y = NULL,
caption = "Source: ThemaKart, © BFS")
Under the hood, this package is using the <a href="https://ropengov.github.io/pxweb/index.html" target="_blank">pxweb package to query the Swiss Federal Statistical Office PXWEB API. PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way. To query the Geo Admin STAC API, this package is using the rstac package. STAC is a specification of files and web services used to describe geospatial information assets.
You can clean the column names of the datasets automatically using
janitor::clean_names()
by adding the argument clean_names = TRUE
in
the bfs_get_data()
function.
This package is in no way officially related to or endorsed by the Swiss Federal Statistical Office (BFS).
Any contribution is strongly appreciated. Feel free to report a bug, ask any question or make a pull request for any remaining issue.