Closed sandervh14 closed 2 months ago
JSON is UTF-8 by default (RFC 8259 requires it for interchange), so there's no need to use \uXXXX escapes for accented letters.
E.g. in plenary 100, Cornet C\u00e9cile could just be Cornet Cécile.
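A minimal sketch of how that could look, assuming the dumps are written with Python's standard json module (the thread doesn't say which serializer is used). The record below is a made-up example, not the actual schema of the dumps.

```python
import json

# Hypothetical record; the real dump schema is not shown in this thread.
record = {"name": "Cornet Cécile"}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(record)

# ensure_ascii=False keeps accented letters as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"name": "Cornet C\u00e9cile"}
print(readable)  # {"name": "Cornet Cécile"}

# Both strings decode to the same data, so this only changes readability.
assert json.loads(escaped) == json.loads(readable) == record
```

When writing to disk, the file would also need to be opened with encoding="utf-8" so the output doesn't depend on the platform default.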
I personally wouldn't include all the generated files in the repository. Those are build artifacts, and they could blow up the size of the repository. But that's a choice, up to you. Of course, having a few samples as test fixtures is perfectly fine.
Maybe add 'plenaries' to the directory structure (so data/output/plenaries/json)? It could get crowded otherwise.
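For illustration, a small sketch of building paths under that layout. The helper name plenary_json_path and the plenary_NNN.json file naming are hypothetical; only the data/output/plenaries/json directory comes from the suggestion above.

```python
from pathlib import Path


def plenary_json_path(base_dir: Path, plenary_number: int) -> Path:
    """Return the JSON dump path for one plenary (file naming is an assumption)."""
    return base_dir / "plenaries" / "json" / f"plenary_{plenary_number:03d}.json"


path = plenary_json_path(Path("data/output"), 100)
print(path.as_posix())  # data/output/plenaries/json/plenary_100.json
```

Keeping the type of output in the path leaves room for other outputs next to plenaries later on.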
A PR to provide draft JSON dumps of the plenaries, which could be used in a prototype of the front-end.
I'd still like to discuss with some people what the ideal relations between the domain model objects could be. I would actually like to ditch the old Motion attributes and move to a more "object oriented" data model using the Vote class. But I kept them for now because we are still keeping the PDF extractor alive.
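To make the discussion concrete, here is one possible shape such a model could take. Everything in this sketch is an assumption for illustration: the field names, the VoteType enum, and the voters helper are hypothetical and do not reflect the existing Motion or Vote classes in the code base.

```python
from dataclasses import dataclass, field
from enum import Enum


class VoteType(Enum):
    YES = "yes"
    NO = "no"
    ABSTENTION = "abstention"


@dataclass(frozen=True)
class Politician:
    name: str


@dataclass(frozen=True)
class Vote:
    # One politician's vote; the motion it belongs to holds it in its list.
    politician: Politician
    vote_type: VoteType


@dataclass
class Motion:
    title: str
    votes: list[Vote] = field(default_factory=list)

    def voters(self, vote_type: VoteType) -> list[str]:
        """Derive the names per vote type from the Vote objects."""
        return [v.politician.name for v in self.votes if v.vote_type is vote_type]


# Placeholder data only.
motion = Motion("Example motion")
motion.votes.append(Vote(Politician("Voter A"), VoteType.YES))
motion.votes.append(Vote(Politician("Voter B"), VoteType.NO))
print(motion.voters(VoteType.YES))  # ['Voter A']
```

The idea in this sketch is that per-motion summaries are derived from the Vote objects rather than stored as separate attributes, so there is a single source of truth.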
This is a draft, I still need to finish up the unit tests.
I chose to no longer store PDFs in the input folder, only the HTML voting reports. We can consider those our backup copy and refer to the PDFs on the dekamer.be server.