Open sandervh14 opened 6 months ago
I've added a command to download the referenced documents. It requires that you first produce a 'plenaries.json'. So, assuming you've already downloaded the plenary HTML files, run:
td-plenaries-json
td-download-referenced-documents
There are some references (e.g. MOT nr 483 in plenary 298) that we currently don't parse. Let's keep this ticket open until we figure out what those references point to.
For each document reference we extracted from the plenary reports, scrape its metadata, including the URLs of the documents, so that we can fetch them.
Example metadata: https://www.dekamer.be/kvvcr/showpage.cfm?section=/flwb&language=nl&cfm=/site/wwwcfm/flwb/flwbn.cfm?legislat=55&dossierID=3495
This page was found by entering document reference 3495 in the search bar on the page that provides the full overview of documents: https://www.dekamer.be/kvvcr/showpage.cfm?section=/flwb&language=nl&cfm=ListDocument.cfm.
Looking at the first URL above, though, we should be able to scrape the metadata and documents simply by filling in the legislature (legislat) and the document reference (dossierID) in that URL. The second URL is not needed for scraping.
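A minimal sketch of how the metadata URL could be built from a legislature and a document reference. The function name `build_dossier_url` and the constant `BASE_URL` are hypothetical and not part of the existing code; the URL layout is copied from the example above and is assumed to hold for other dossiers as well.

```python
# Hypothetical helper: only the URL pattern comes from the example above.
BASE_URL = (
    "https://www.dekamer.be/kvvcr/showpage.cfm"
    "?section=/flwb&language=nl"
    "&cfm=/site/wwwcfm/flwb/flwbn.cfm"
)


def build_dossier_url(legislature: int, dossier_id: int) -> str:
    """Return the metadata page URL for one document reference.

    The nested query string (a second '?' inside the 'cfm' value) is kept
    exactly as it appears in the example URL, so it is not URL-encoded.
    """
    return f"{BASE_URL}?legislat={legislature}&dossierID={dossier_id}"


if __name__ == "__main__":
    # Reproduces the example URL for legislature 55, document 3495.
    print(build_dossier_url(55, 3495))
```

Whether every document reference in 'plenaries.json' resolves to a valid page this way still needs to be checked against the site, for instance for the references we don't parse yet.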