transparentdemocracy / voting-data

Voting behavior data extracted from plenary reports of the Belgian federal government.

Draft: Issue #11: Serializing plenaries to JSON #13

Closed: sandervh14 closed this pull request 2 months ago

sandervh14 commented 2 months ago

A PR to provide draft JSON dumps of the plenaries, which could be used in a prototype of the front-end.

I'd still like to discuss with a few people what the ideal object relations in the domain model should be. I would actually like to drop the old Motion attributes and move to a more object-oriented data model built around the Vote class. I kept them for now because we are still keeping the PDF extractor alive.

This is a draft; I still need to finish the unit tests.

I chose to no longer store PDFs in the input folder, only the HTML voting reports. We can consider that our back-up copy and can refer to the PDFs on the dekamer.be server.

karel1980 commented 2 months ago

JSON is normally encoded as UTF-8, so there's no need to use \uXXXX escapes for accented letters. E.g. in plenary 100, Cornet C\u00e9cile could just be Cornet Cécile.
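A minimal sketch of the point above, assuming the dumps are produced with Python's standard `json` module (the PR text does not confirm this). By default `json.dumps` escapes non-ASCII characters; passing `ensure_ascii=False` keeps them readable. The `politician` record is a hypothetical example, not the repository's actual data model.

```python
import json

# Hypothetical record, shaped only for illustration.
politician = {"name": "Cornet Cécile"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(politician)

# With ensure_ascii=False the accented letter is written as-is.
readable = json.dumps(politician, ensure_ascii=False)

print(escaped)   # {"name": "Cornet C\u00e9cile"}
print(readable)  # {"name": "Cornet Cécile"}

# Both forms decode to the same value, so consumers are unaffected.
assert json.loads(escaped) == json.loads(readable)
```

When writing to a file with `ensure_ascii=False`, opening it with `encoding="utf-8"` avoids depending on the platform's default encoding.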

karel1980 commented 2 months ago

I personally wouldn't include all the generated files in the repository: they are build artifacts and could blow up the size of the repository. But that's a choice, and it's up to you. Of course, having a few samples as test fixtures is perfectly fine.

karel1980 commented 2 months ago

Maybe add 'plenaries' to the directory structure (so data/output/plenaries/json)? It could get crowded otherwise.
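A rough sketch of how the suggested layout could be built with `pathlib`. The helper name `plenary_json_path` and the `<id>.json` file naming are assumptions made for illustration only; they are not taken from the repository.

```python
from pathlib import Path

# Assumed output root, following the directory suggested above.
OUTPUT_ROOT = Path("data") / "output"


def plenary_json_path(plenary_id: int, root: Path = OUTPUT_ROOT) -> Path:
    """Return the path of one plenary's JSON dump (hypothetical helper)."""
    return root / "plenaries" / "json" / f"{plenary_id}.json"


path = plenary_json_path(100)
print(path.as_posix())  # data/output/plenaries/json/100.json
```

Keeping the entity name as its own directory level leaves room for sibling folders for other entities or other output formats later on.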