welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Check specialized start/end dates for MPs #355

Closed MansMeg closed 7 months ago

MansMeg commented 12 months ago

Problem: A start/end date in the bio books sometimes only refers to a year (e.g. 1964), which means the official start/end of that riksdag year. If a specific start/end date is stated (e.g. 1964-10-23), that means that the MP started/ended in the middle of a riksdag year. The problem is that we don't know how accurate the start/end dates in Wikidata are, that is, whether an MP started/ended according to the riksdag year or in the middle of the year.
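To make the distinction concrete, here is a minimal sketch of how a date string could be classified as year-only or day-specific. The function name `date_precision` and the two format patterns are assumptions for illustration; they are not taken from the corpus code.

```python
import re
from datetime import date

# Hypothetical helper, not part of the riksdagen-corpus code base.
YEAR_ONLY = re.compile(r"\d{4}")
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def date_precision(value: str) -> str:
    """Return 'year' for a value like '1964' and 'day' for '1964-10-23'."""
    value = value.strip()
    if YEAR_ONLY.fullmatch(value):
        # Year only: the MP started/ended with the official riksdag year.
        return "year"
    if FULL_DATE.fullmatch(value):
        # Raises ValueError if the string is not a real calendar date.
        date.fromisoformat(value)
        return "day"
    raise ValueError(f"unrecognised date format: {value!r}")
```

With something like this, every date in the metadata could be tagged by precision before comparing it with the bio books.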

During the weekend, I went through the first bio book of the bicameral riksdag, looked up every MP that had a specific start/end date, and checked whether this date was correct on Wikidata. If it was not, I added the specific date. As suspected, quite a few MPs were missing these specific start/end dates. It “only” took 3–4 hours to go through the whole first bio book. A bonus of this work was that I also found a couple of MPs that were missing i-ort, which I added.

Solution: We should review the last four volumes of the bio books and check the specific dates.

If we do this, it would be great to also list these people in a separate file for quality control/unit testing. I suggest a file as follows:

wiki_id,specific_date,date_type
Qxxxx,1899-03-12,start_date
Qyyyy,1932-09-11,end_date

Then we can check that these have not been changed at Wikidata, since we go through the manual part anyway.

More discussion can be found in issue #342.

We should add the files as a unit test.
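A rough sketch of what such a check could look like, assuming a comma-separated file with the columns `wiki_id`, `specific_date`, and `date_type`. The function names, and the `current` mapping that stands in for whatever is fetched from Wikidata, are hypothetical. `Qxxxx` and `Qyyyy` are the placeholders from the example above, and the dates in `current` are made up for the demonstration.

```python
import csv
import io
from datetime import date

VALID_TYPES = ("start_date", "end_date")


def load_checked_dates(text):
    """Parse rows of wiki_id,specific_date,date_type into tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["date_type"] not in VALID_TYPES:
            raise ValueError(f"unknown date_type: {row['date_type']!r}")
        rows.append(
            (row["wiki_id"], date.fromisoformat(row["specific_date"]), row["date_type"])
        )
    return rows


def find_changed(checked, current):
    """Return the manually checked rows whose date is no longer present.

    `current` maps (wiki_id, date_type) to the set of dates found now.
    """
    return [
        (wiki_id, day, kind)
        for wiki_id, day, kind in checked
        if day not in current.get((wiki_id, kind), set())
    ]


# Placeholder data for illustration only.
SAMPLE = (
    "wiki_id,specific_date,date_type\n"
    "Qxxxx,1899-03-12,start_date\n"
    "Qyyyy,1932-09-11,end_date\n"
)
current = {
    ("Qxxxx", "start_date"): {date(1899, 3, 12)},
    ("Qyyyy", "end_date"): {date(1932, 1, 1)},  # pretend this was changed
}
changed = find_changed(load_checked_dates(SAMPLE), current)
```

A unit test would then assert that `changed` is empty for the real file. Sets are used on the Wikidata side because an MP can have more than one start or end date.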

fredrik1984 commented 12 months ago

I will create a file with those columns and add MPs that lack specific dates to that file.

I will also contact Lotta and Mattias and ask them to work on bio books 3–5. I will also ask Lotta to curate the list of start/end dates of riksdag years from 1867 to today.

MansMeg commented 12 months ago

The opposite. Only add those with specific dates (since those are much rarer).

fredrik1984 commented 12 months ago

Ah, so I should add ALL MPs that have specific dates on Wikidata, including those that I add in this work? I will see how much extra work this will take, though. Maybe it is easy.

MansMeg commented 12 months ago

It's just that we want to keep the stuff we manually check as a unit test, because we are sure it is correct. We can discuss the details on Friday.

BobBorges commented 12 months ago

> I will also ask Lotta to curate the list of start/end dates of riksdag years from 1867 to today.

@fredrik1984 I've made such a list from the Wikipedia page and plan to open a PR with it tomorrow. Perhaps it will be less work to critique that list and propose fixes than to start from scratch.

MansMeg commented 12 months ago

Excellent!

fredrik1984 commented 12 months ago

@MansMeg – ok, so just MP-specific start/end dates that are currently missing on Wikidata. That was what I thought.

@BobBorges – yes, that sounds very reasonable! Let me know when that is done and then Lotta can start on that.

fredrik1984 commented 10 months ago

Here is a Google Sheet with all MPs from the bio books that have at least one specific start/end date that was missing in Wikidata but has now been corrected. Currently, Fredrik and Mattias have added info from bio books 1–4. When @Lottabrorsson has done the last bio book @BobBorges can do an update.

https://docs.google.com/spreadsheets/d/1lm2WR-wc-FWsQV0tbjYfWIng6LjOnvE2xi5R8zwny_U/edit#gid=0

fredrik1984 commented 9 months ago

@BobBorges – is this issue ready to be closed?

MansMeg commented 9 months ago

I think we should add a unit test with the files from you, Lotta and Mattias as a final thing before we close this.

BobBorges commented 9 months ago

yes, not ready to close