Open sandervh14 opened 1 month ago
I've changed title parsing so newlines aren't lost.
I also updated the frontend to preserve newlines in titles (`white-space: pre-wrap`).
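For reference, a minimal sketch of what "keeping newlines" during title parsing could look like. This is not the actual code in the repo; `clean_title` is a hypothetical helper. It assumes the raw title arrives as a string with stray spaces and tabs, and that only whitespace inside each line should be collapsed:

```python
import re


def clean_title(raw: str) -> str:
    """Collapse spaces/tabs within each line, but keep the line breaks.

    Hypothetical helper for illustration only.
    """
    cleaned_lines = (
        # Collapse runs of spaces, tabs and non-breaking spaces into one space.
        re.sub(r"[ \t\u00a0]+", " ", line).strip()
        for line in raw.splitlines()
    )
    # Drop lines that end up empty, join the rest with a single newline.
    return "\n".join(line for line in cleaned_lines if line)
```

On the frontend side, `white-space: pre-wrap` then renders those `\n` characters as real line breaks.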
@sandervh14 can you provide examples for bullets 2 & 3 so we know what to focus on exactly?
Screenshot for newlines in titles (this is a newline I added myself for testing)
Nice. 🔥
I updated my issue description above (https://github.com/transparentdemocracy/voting-data/issues/50#issue-2321798181).
New ones:
[ ] Quite a few recent plenaries show "Er waren geen stemmingen" ("There were no votes"). Is this correct?
[ ] The motion on 28/3/324 with number 479 (unsupported number of motions so far) has no document reference and therefore also no summary.
I've checked plenaries 308, 307, 306, 305, 303 and 301 which didn't have any votes according to the website. Of these, only 301 actually had votes. I'll add a test case for this.
Extraction for 301 fails because the plenary report doesn't open with a level 1 header (h1 or CSS class), which the extraction algorithm expects.
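One possible way to make the extraction tolerant of this, sketched under assumptions: the report is taken to be already parsed into a flat list of elements, and everything before the first level 1 header goes into an implicit untitled section instead of causing a failure. `Element`, `Section`, `split_sections` and the class names in `LEVEL1_CLASSES` are hypothetical and not taken from the current code:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed CSS class names for level 1 headers; the real ones may differ.
LEVEL1_CLASSES = frozenset({"Titre1NL", "Titre1FR"})


@dataclass
class Element:
    tag: str
    text: str
    css_classes: frozenset = frozenset()


@dataclass
class Section:
    # None means: content that appeared before the first level 1 header.
    title: Optional[str]
    body: List[str] = field(default_factory=list)


def is_level1(el: Element) -> bool:
    """A level 1 header is either an h1 tag or an element with a known class."""
    return el.tag == "h1" or bool(LEVEL1_CLASSES & el.css_classes)


def split_sections(elements: List[Element]) -> List[Section]:
    """Group elements into sections without requiring a leading header."""
    sections: List[Section] = []
    for el in elements:
        if is_level1(el):
            sections.append(Section(title=el.text))
            continue
        if not sections:
            # No header seen yet: open an untitled section rather than fail.
            sections.append(Section(title=None))
        sections[-1].body.append(el.text)
    return sections
```

With this shape, a plenary like 301 would yield a first section with `title=None` that still contains its votes, and the test case could assert on exactly that.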
I just tested our website locally. I think these are important changes we need to make to improve the website experience:
Don't remove newlines between the lines of a motion title.
Description below the motion in the frontend: newlines are gone. For example, we know that items like 15.04, 15.05, etc. always start on a new line. Keeping the original text formatting would be great, making the text more readable instead of one big lump of text. I don't know whether this means the frontend is using description_nl instead of the description_tags_nl that were made at some point, or whether there's another problem.
Improve extraction of the Dutch vs French description texts that are displayed below the motions on the website. The original text behind these descriptions contains indications of which tag is French and which is Dutch, although we didn't find it to be consistent all the time. Either we continue down that path and do best-effort, or we move to separate French / Dutch texts using the summarization that Karel is working on.
The bottom of the website says "data freely available with the MIT license", but we don't have a license set yet on our data repo. I'm worried about the "modification" permission on the data that the MIT license grants. We don't want to open a gigantic door to a convenient source for creating a misinformation website that copies ours with modified data.
Reduce errors thrown in the backend on start-up, like the following (and many more):
This is why so few plenaries are currently listed on the website, and therefore also fewer motions than we have already extracted to plenaries.json. To be investigated: why the backend is so unhappy about the plenaries.json we deliver.
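To narrow this down before touching the backend, a small check on our side could list every entry that looks incomplete, rather than stopping at the first error. The sketch below assumes plenaries.json is a JSON array of objects. The field names in `REQUIRED_FIELDS` are placeholders, not the real schema, and `find_invalid_plenaries` is a hypothetical helper:

```python
# Placeholder field names; replace with whatever the backend actually requires.
REQUIRED_FIELDS = ("id", "date")


def find_invalid_plenaries(plenaries, required=REQUIRED_FIELDS):
    """Return (index, problem) pairs for entries that look incomplete."""
    problems = []
    for index, plenary in enumerate(plenaries):
        if not isinstance(plenary, dict):
            problems.append((index, "not a JSON object"))
            continue
        # A field counts as missing when absent, null, or an empty string.
        missing = [name for name in required if plenary.get(name) in (None, "")]
        if missing:
            problems.append((index, "missing " + ", ".join(missing)))
    return problems
```

Feeding it the result of `json.load` on plenaries.json and printing the pairs would show whether the errors cluster around a few plenaries or a single missing field.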