Closed sandervh14 closed 2 months ago
The plenaries have both pdf and html. Html might be easier to convert to text without pdf typesetting artefacts.
True, good spot! I've noticed it too in the past week, see https://github.com/transparentdemocracy/voting-data/issues/1. I didn't know it when I started building the extraction.
I was thinking of continuing to build the back-end and front-end for a voting test prototype and only then coming back to make sure more and better votes would get extracted.
But we can discuss that, or contributions could allow fixing both at the same time.🙂 Depends on which priorities people see. What do you think?
I see a lot of advantages in working with plain text:
Overall, I would recommend converting to plain text asap in a separate command so you can make things nice and fast. I've made some progress on my HTML-based implementation to get the votes. I'd like to keep working on it, but depending on priorities I can switch to different tasks.
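To make the HTML-to-text idea concrete, here is a minimal sketch using only Python's standard library. It is not the implementation from this repository: the names `TextExtractor` and `html_to_text` are hypothetical, and the actual structure of the plenary HTML is not assumed beyond ordinary block tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents.

    Hypothetical helper for illustration; not part of voting-data.
    """

    SKIP = {"script", "style"}
    # Tags that should start a new line in the plain-text output.
    BLOCK = {"p", "br", "div", "tr", "li", "h1", "h2", "h3", "table"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    raw_lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in raw_lines)
    return "\n".join(line for line in lines if line)
```

Running something like this once per plenary in a separate command and writing the result to a `.txt` file would let the vote extraction work on plain text only, without re-parsing the source documents each time.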
Perfectly fine! Thanks for the work!
@karel1980 took care of this when he submitted PR #9, which is merged now.
For example: