welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Map "normal" parliament periods to real start and end dates #342

Closed BobBorges closed 11 months ago

BobBorges commented 1 year ago

Several of our metadata files carry incomplete information. For example, we often do not have the start and end date of an MP's mandate, only the year in which they were mandated. We should create a metadata file that maps normal mandate periods to actual dates, so that the dates can be inferred more reliably when necessary.

Perhaps something like this as a starting point.

fredrik1984 commented 12 months ago

I have thought about this over the weekend, and I have a proposal for how we can reach at least 90 % coverage of the number of MPs per protocol.

Problem: A start/end date in the bio books sometimes only refers to a year (e.g. 1964), which means the official start/end of that riksdag year. If a specific start/end date is stated (e.g. 1964-10-23), that means that the MP started/ended in the middle of a riksdag year. The problem is that we don't know how accurate the start/end dates in Wikidata are, that is, whether an MP started/ended according to the riksdag year or in the middle of the year.

During the weekend, I went through the first bio book of the bicameral riksdag, looked up every MP that had a specific start/end date, and checked whether this date was correct on Wikidata. If it was not, I added the specific date. As suspected, quite a few MPs were missing these specific start/end dates. It “only” took 3–4 hours to go through the whole first bio book. A bonus of this work was that I also found a couple of MPs that were missing i-ort, which I also added.

Proposal 1: I suggest that we (me, Lotta, and Mattias) do this for the rest of the bio books for the bicameral riksdag (there is no need to do it for the unicameral riksdag since we know that we have good coverage there). This will most likely improve the coverage of MPs/protocol a lot.

Proposal 2: After the manual work in Proposal 1, we will know which start/end dates only refer to a year (i.e. the official start/end of a riksdag year). We can then automatically replace every year-only start with the official start date of that riksdag year, and every year-only end with its official end date. To do so, we can use this list, after making a few adjustments that Lotta identified: https://sv.wikipedia.org/wiki/Lista_%C3%B6ver_svenska_riksdagar.
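
A minimal sketch of how Proposal 2 could work, assuming the bio-book values are stored as strings that are either a bare year ("1964") or an ISO date ("1964-10-23"). The table `RIKSDAG_YEARS`, the function `resolve_mandate_date`, and the dates in the table are hypothetical placeholders for illustration; the real official dates would come from the adjusted list.

```python
from datetime import date
from typing import Optional

# Hypothetical lookup table: riksdag year -> (official start, official end).
# The dates are PLACEHOLDERS for illustration, not verified values.
RIKSDAG_YEARS = {
    "1964": (date(1964, 1, 10), date(1964, 12, 15)),
    "1965": (date(1965, 1, 11), date(1965, 12, 14)),
}


def resolve_mandate_date(value: str, kind: str) -> Optional[date]:
    """Turn a bio-book start/end value into a full date.

    A bare year is read as the official start/end of that riksdag year.
    A full ISO date is read as a mid-year start/end and kept as it is.
    Returns None when the year is not in the lookup table.
    """
    if kind not in ("start", "end"):
        raise ValueError("kind must be 'start' or 'end'")
    value = value.strip()
    if len(value) == 4 and value.isdigit():
        period = RIKSDAG_YEARS.get(value)
        if period is None:
            return None  # unknown riksdag year: leave the gap visible
        official_start, official_end = period
        return official_start if kind == "start" else official_end
    # Anything else is expected to be a specific date such as 1964-10-23.
    return date.fromisoformat(value)
```

With this shape, `resolve_mandate_date("1964", "start")` picks the official start from the table, while `resolve_mandate_date("1964-10-23", "end")` keeps the specific date untouched. Returning `None` for an unknown year, instead of guessing, keeps missing riksdag years easy to spot.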

These two steps should hopefully generate a higher coverage than 90 % over the whole period.

What do you say about this @MansMeg @BobBorges @ninpnin?

MansMeg commented 12 months ago

I think this is a very good idea. Two comments: 1) This is work (checking bio books and updating Wikidata) that anyone can do, so I think it is something @liamtabib could do. I want to save your expertise for things that are more difficult. That said, I think it is a good thing to do (if you currently don't have anything else to do). 2) I thought Bob updated all iort on Wikidata? If he did, this means that these iorts are probably also missing from the unit test we have. Hence I would create a new issue with these missing iorts, since they should be part of the unit test as well.

These are my thoughts.

MansMeg commented 12 months ago

If we take this path and do this, I think we should open a separate issue.

fredrik1984 commented 12 months ago

Yes, I think this should be a separate issue. Could you create one @MansMeg?

We can see how much Liam has on his plate at the moment. Otherwise, I thought I could take 30 min each evening or so to continue with bio book 2 in the coming week. Then I think Lotta and Mattias could do book 3–5.

I was also a little bit surprised to find i-orts missing. @BobBorges do you have an idea why this is so? I guess we still have the MPs in our MP database, right?

BobBorges commented 12 months ago

@fredrik1984 please post, or send me, the wiki IDs of those people with missing iort. There shouldn't be any missing iort. Without digging in, two possibilities come to mind:

Regarding the "standard" start-end dates, I'll open a PR tomorrow with a metadata file -- it's clear that the list on wikipedia isn't perfect, so we can scrutinize it until we're happy.

Re getting to 90%. I also have a bit of work to do with that code -- @MansMeg Fredrik knows about this already, but there's a bug in the way I handled dates, so for instance in the case of that graph I showed of 1970, we have some 220 MPs in our metadata for that period. I'd suggest: give me a day or two to work that out before we decide to spend hours of manual labor. It might be good to do anyway, but maybe less urgently than we think.

MansMeg commented 12 months ago

I think it is a good task to do. I will create an issue for it later today or tomorrow. Then we can wait until Friday so @BobBorges can take a pass on it (if you are back) to see how urgent it is.

I agree with Bob, it should not happen that we miss iort from the books. So please file this as a separate issue that we can follow up on. I don't think it is urgent to fix, but we should add it to the backlog.

MansMeg commented 12 months ago

Now added the first task of manual checking in the biobooks as a separate issue.

fredrik1984 commented 12 months ago

I didn’t save the links to the MPs that were missing i-ort. When I continue with bio book 2, I will do that.

The reasons that Bob posted seem reasonable, although I am not the best person to judge here. Just a thought: we have issues with duplicate MP Wikidata entries, like #276 #302 #316.

Could this also be a reason, i.e. that I stumbled upon duplicate MP Wikidata entries that lacked i-ort? It seems a bit strange, but you, @Bob and @Måns, know this better.

MansMeg commented 12 months ago

That might indeed be a reason. Next time you see anyone missing an iort, just open an issue.

BobBorges commented 11 months ago

The missing iorter were apparently not missing after all; they just do not always appear in the AKA cell at the top of Wikidata pages.

BobBorges commented 11 months ago

Open PR #356 -- closing issue