Tools used for collecting SFS (Svensk Författningssamling) from Riksdagens öppna data.
Binary to run for collecting SFS. Uses `webcrawler` and `opendata-spider`. Takes roughly 1 hour to fetch all SFS data.
Lives in `opendata-spider`. Uses `swegov-opendata`. Contains the concrete spider for collecting SFS.
This spider spawns URLs that search for documents of type SFS in 20-year spans, using the `data.riksdagen.se/dokumentlista` path. The resulting lists are scraped for `dok_id` (to scrape the individual documents) and `nasta_sida` (to scrape the next page of the dokumentlista).
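The 20-year span generation could look roughly like this. This is a sketch: the query parameters (`doktyp`, `from`, `tom`, `utformat`) are assumptions based on the public data.riksdagen.se API, not necessarily what the spider actually sends.

```rust
// Sketch: generate one dokumentlista search URL per 20-year span.
// The query parameters below are illustrative assumptions.
fn dokumentlista_urls(first_year: u32, last_year: u32) -> Vec<String> {
    let mut urls = Vec::new();
    let mut start = first_year;
    while start <= last_year {
        let end = (start + 19).min(last_year); // inclusive 20-year span
        urls.push(format!(
            "https://data.riksdagen.se/dokumentlista/?doktyp=sfs&from={start}-01-01&tom={end}-12-31&utformat=json"
        ));
        start = end + 1;
    }
    urls
}

fn main() {
    for url in dokumentlista_urls(1880, 2023) {
        println!("{url}");
    }
}
```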
All fetched pages are stored to disk in JSON format, except for the pages with HTML fragments, which are stored as-is. The documents are grouped by year.
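As a sketch, the per-year grouping could map each page to a path like `<output>/<year>/<dok_id>.json` (or `.html` for the as-is fragments). The layout and the helper below are illustrative assumptions, not the crate's actual code.

```rust
use std::path::PathBuf;

// Sketch: group stored documents by year on disk.
// `output/<year>/<dok_id>.json` is an assumed layout for illustration;
// pages containing html fragments are stored as-is, here with an .html extension.
fn storage_path(output_dir: &str, year: u32, dok_id: &str, is_html_fragment: bool) -> PathBuf {
    let ext = if is_html_fragment { "html" } else { "json" };
    PathBuf::from(output_dir)
        .join(year.to_string())
        .join(format!("{dok_id}.{ext}"))
}

fn main() {
    println!("{}", storage_path("data", 1994, "sfs-1994-1512", false).display());
}
```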
This spider handles the following inconsistencies in the API: `data.riksdagen.se/dokument/<dok_id>` is supposed to return the document with the given `dok_id`, but sometimes only the `html` field of the document is returned; when that happens, `data.riksdagen.se/dokumentstatus/<dok_id>` is needed to fetch the full document.
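A sketch of that fallback, with a hypothetical `looks_like_html_fragment` heuristic standing in for whatever check the spider actually performs:

```rust
// Hypothetical heuristic: a full response is JSON (starts with '{') or an XML
// <dokument> element; anything else is treated as a bare html fragment.
fn looks_like_html_fragment(body: &str) -> bool {
    let trimmed = body.trim_start();
    !trimmed.starts_with('{') && !trimmed.starts_with("<dokument")
}

// If the dokument endpoint only returned an html fragment, fall back to the
// dokumentstatus endpoint for the full document; otherwise no follow-up is needed.
fn next_url(dok_id: &str, first_response: &str) -> Option<String> {
    if looks_like_html_fragment(first_response) {
        Some(format!("https://data.riksdagen.se/dokumentstatus/{dok_id}"))
    } else {
        None
    }
}

fn main() {
    if let Some(url) = next_url("sfs-1994-1512", "<div>fragment</div>") {
        println!("falling back to {url}");
    }
}
```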
Uses `swegov-opendata`.
Build corpus files for processing with sparv.
Data model for the documents and document lists from Riksdagens öppna data, with serde serialization and deserialization.
Lives in `webcrawler`.
Generic web crawler that defines an interface for spiders.
The spiders work in 2 steps: scraping URLs and processing the scraped items.
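Assuming the two steps are scraping and processing, the spider interface could be sketched like this. The trait and method names are illustrative, not the actual `webcrawler` API.

```rust
// Sketch of a two-step spider interface: `scrape` fetches one URL and yields
// items plus follow-up URLs; `process` handles each scraped item (e.g. stores
// it to disk). Names are illustrative assumptions.
trait Spider {
    type Item;

    /// Step 1: scrape one URL, returning scraped items and follow-up URLs.
    fn scrape(&self, url: &str) -> (Vec<Self::Item>, Vec<String>);

    /// Step 2: process a single scraped item.
    fn process(&self, item: Self::Item);
}

// A toy spider implementation for demonstration.
struct DemoSpider;

impl Spider for DemoSpider {
    type Item = String;

    fn scrape(&self, url: &str) -> (Vec<String>, Vec<String>) {
        (vec![format!("item from {url}")], Vec::new())
    }

    fn process(&self, item: String) {
        println!("processing: {item}");
    }
}

fn main() {
    let spider = DemoSpider;
    let (items, _next_urls) = spider.scrape("https://example.invalid/start");
    for item in items {
        spider.process(item);
    }
}
```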
The MSRV (Minimum Supported Rust Version) is fixed for a given minor (1.x) version. However, it can be increased when bumping minor versions; i.e., going from 1.0 to 1.1 allows us to increase the MSRV. Users unable to upgrade their Rust version can use an older minor version instead. Below is a list of swegov-opendata-rs versions and their MSRV:
Note however that swegov-opendata-rs also has dependencies, which might have different MSRV policies. We try to stick to the above policy when updating dependencies, but this is not always possible.