welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Setup unit test for MPs #338

Closed MansMeg closed 4 months ago

MansMeg commented 1 year ago

We discussed how to set up a unit test for MP quality that is good enough. We settled on testing the following.

For every date available in the corpus (protocol dates), we check that the database contains at least 90% of the true number of MPs for that date.

This should be implemented as a unit test, after which we can focus on the dates where it fails.
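A minimal sketch of what such a test could look like, assuming the MP counts per protocol date are available as plain dictionaries. The helper `coverage_failures`, the dictionary layout, and the toy numbers are all hypothetical and not taken from the corpus code:

```python
import unittest


def coverage_failures(observed, expected, min_ratio=0.9):
    """Return the dates where the observed MP count is below
    min_ratio times the expected ("true") MP count.

    Both arguments map a protocol date (here an ISO string) to a count.
    This layout is an assumption made for illustration.
    """
    failures = {}
    for day, n_expected in expected.items():
        n_observed = observed.get(day, 0)  # a missing date counts as zero MPs
        if n_observed < min_ratio * n_expected:
            failures[day] = (n_observed, n_expected)
    return failures


class TestMPCoverage(unittest.TestCase):
    def test_at_least_90_percent_of_mps(self):
        # Toy numbers only; real counts would come from the MP database
        # and from a trusted source for the true chamber size.
        expected = {"1900-01-15": 100, "1950-01-10": 100}
        observed = {"1900-01-15": 95, "1950-01-10": 97}
        failures = coverage_failures(observed, expected)
        self.assertEqual(failures, {}, f"Dates below 90% coverage: {failures}")
```

Saved as a test module, this runs with `python -m unittest`. Reporting the failing dates in the assertion message makes it easier to see where to focus next.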

BobBorges commented 11 months ago

Do you want a unit test in place well before we can make it pass? Code for this already exists in script form...

BobBorges commented 11 months ago

An update on the MP frequency issue: I revised the code to handle dates in different formats better. In essence:

The results are closer to what we want: 98.03% within the 10% tolerance. In this case, unlike the last time around, those parliament days that fail this test fail because there are too many MPs. I'm not sure yet how to evaluate which of these strategies is closer to the truth of our coverage, but if we trust the manual work that has been done with the bio books and Wikidata, I suppose (and hope) this most recent iteration is better than the last.
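To make the "within the 10% tolerance" figure concrete, here is a rough sketch of a two-sided check that also separates days with too few MPs from days with too many. It assumes the tolerance applies in both directions, which is my reading of the comment above; the function names, the dictionary layout, and the numbers are hypothetical:

```python
def classify_dates(observed, expected, tolerance=0.10):
    """Sort protocol dates into 'ok', 'too_few' and 'too_many'.

    observed and expected map a date string to an MP count; this
    layout is an assumption, not the format used by the actual script.
    """
    result = {"ok": [], "too_few": [], "too_many": []}
    for day in sorted(expected):
        ratio = observed.get(day, 0) / expected[day]
        if ratio < 1 - tolerance:
            result["too_few"].append(day)
        elif ratio > 1 + tolerance:
            result["too_many"].append(day)
        else:
            result["ok"].append(day)
    return result


def share_within_tolerance(result):
    """Fraction of dates that fall inside the tolerance band."""
    total = sum(len(days) for days in result.values())
    return len(result["ok"]) / total if total else 0.0


# Toy data for illustration only.
expected = {"d1": 100, "d2": 100, "d3": 100, "d4": 100}
observed = {"d1": 100, "d2": 120, "d3": 80, "d4": 105}
result = classify_dates(observed, expected)
print(result["too_many"], result["too_few"], share_within_tolerance(result))
```

Keeping the two failure directions apart shows at a glance whether a failing day is an overage or a gap in coverage.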

Failing parliaments (in decreasing severity) are

These earlier years tend to be the ones where we have less specific info on MPs' mandate periods. @fredrik1984 @Lottabrorson, does either of you know off the top of your head whether there was a lot of turnover between a and b or lagtima and urtima? That might explain some of the overages.

I think we should try this again when we settle on a list of 'normal' start and end dates for parliament. #356

image

fredrik1984 commented 11 months ago

Ok, this looks like an improvement! And I am sure the graph will look even better after we have done the manual work with the bio books and Wikidata! Great work @BobBorges!

MansMeg commented 11 months ago

Yes. I guess we can hold off on this until @fredrik1984 and the others are done with their pass through the bio books?

fredrik1984 commented 11 months ago

Yes, let's wait for that. I will probably be done with bio book 2 later this week. It might take another week or so for Mattias (bio book 3–4) and @Lottabrorsson (bio book 5). Going through bio book 1–2, I have added several specific dates, so this work will most likely improve the MPs/protocol graph.

As we go through the bio books, we also look up MPs that have more than one party affiliation and check whether these parties have been added on Wikidata (#359). I must say that Sälgö has done a good job of adding parties to MPs in the 19th century! Doing this and fixing the list of MPs with no parties (#349) will improve the MP database a lot!

MansMeg commented 11 months ago

I think #355 is relevant here as well.