welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Iort unittest #313

Closed BobBorges closed 1 year ago

BobBorges commented 1 year ago

Here's a failing unittest for the i-orter: there are 535 (of 5359) missing in the metadata.
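
A rough sketch of the kind of check such a test can perform. The column names (`person_id`, `iort`), the function name, and the sample rows are hypothetical and are not taken from the actual test or CSV files:

```python
def find_missing_iorter(known_rows, metadata_rows):
    """Return the known (person_id, iort) rows that are absent from the metadata."""
    in_metadata = {(row["person_id"], row["iort"]) for row in metadata_rows}
    return [
        row for row in known_rows
        if (row["person_id"], row["iort"]) not in in_metadata
    ]


# Hypothetical sample data standing in for known_iorter.csv and the metadata.
known = [
    {"person_id": "person_a", "iort": "Stockholm"},
    {"person_id": "person_b", "iort": "Uppsala"},
]
metadata = [
    {"person_id": "person_a", "iort": "Stockholm"},
]

missing = find_missing_iorter(known, metadata)
print(f"{len(missing)} (of {len(known)}) missing in the metadata")
```

The test would then assert that the list of missing rows is empty.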

BobBorges commented 1 year ago

TODO

--> test will pass

MansMeg commented 1 year ago

It seems like the tests include party unit tests: "The following people are missing from the corpus metadata (party_affiliation.csv)". Is this also included here?

BobBorges commented 1 year ago

No, it's not included. I mean, I wrote all the unit tests at the same time, but the assertion is commented out. Following the minimal PR theory, it will be fixed in a separate pull request.

MansMeg commented 1 year ago

Ok. Great! But it looks like both this test and a test of duplicates are run on GitHub Actions?

BobBorges commented 1 year ago

They run, but there's an assert(condition) at the end of each test that determines whether it passes or fails. The assertion is commented out for the party affiliation test, so it can't cause the test to fail.
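
A minimal sketch of that pattern, assuming Python's `unittest`. The class name, test names, and data are hypothetical and not the actual tests in the repo:

```python
import unittest


class MetadataTest(unittest.TestCase):
    def test_iort(self):
        missing = []  # hypothetical: nothing missing, so the assertion holds
        self.assertEqual(len(missing), 0, missing)

    def test_party_affiliation(self):
        # The check still runs and reports what it finds ...
        missing = ["hypothetical person"]
        print(f"{len(missing)} people missing from party_affiliation.csv")
        # ... but with the assertion commented out it cannot fail the test.
        # self.assertEqual(len(missing), 0, missing)


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MetadataTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests are collected and executed, which is why they show up in the CI log, but only `test_iort` can actually fail the run.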

BobBorges commented 1 year ago

Here are a couple issues that I already see in the known_iorter.csv.

If there are others you see in the review let me know and I will fix them all at once.

MansMeg commented 1 year ago

I started to look into this, but it is a lot to check. There is no point in checking all of them (I skimmed through many of them). Could you list (in a CSV):

Then I can check those more in detail.

BobBorges commented 1 year ago

duplicate_iorter.csv weird_iorter.csv

iort missing in the database

There are no NAs in the iort column, if that's what you mean. I already looked up the MPs missing an iort in the bio books.

MansMeg commented 1 year ago

I have now checked the first two. I cannot find any problems. By missing iort I mean those that are in Emil's file but are missing in Wikidata (i.e. those we will add to Wikidata).

BobBorges commented 1 year ago

By missing iort I mean those that are in Emil's file but are missing in Wikidata (i.e. those we will add to Wikidata).

It's here (though I haven't updated the file since our meeting, so there are a couple of cases of "ort1 o ort2", but these will be addressed in the next local run of the unit test):

missing_location_specifier.csv
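
One possible way to flag and split the "ort1 o ort2" cases, as a sketch only. It assumes the separator is a standalone lowercase "o" surrounded by whitespace; the function names are hypothetical and this is not necessarily how the unit test handles them:

```python
import re

# A standalone lowercase "o" between two place names, e.g. "ort1 o ort2".
COMBINED = re.compile(r"\s+o\s+")


def has_combined_iort(value):
    """True if the value looks like two location specifiers joined by 'o'."""
    return COMBINED.search(value.strip()) is not None


def split_combined_iort(value):
    """Split 'ort1 o ort2' into ['ort1', 'ort2']; single values come back as one item."""
    return [part.strip() for part in COMBINED.split(value.strip()) if part.strip()]


print(split_combined_iort("ort1 o ort2"))
```

Because the pattern requires whitespace on both sides of the "o", a place name that merely starts or ends with that letter is left intact.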

MansMeg commented 1 year ago

Alright!

I found the following error:

UPDATE: Alfred seem to be different persons.

BobBorges commented 1 year ago

First two are fixed -- so I'll start updating iorter to wikidata when I can.

MansMeg commented 1 year ago

Great! Looking forward!

BobBorges commented 1 year ago

How strict do we want to be about adding a source to the iort on Wikidata? I assume we want to include a source for every bit of info added to Wikidata.

fredrik1984 commented 1 year ago

How strict do we want to be about adding a source to the iort on Wikidata? I assume we want to include a source for every bit of info added to Wikidata.

Hm. I am not entirely sure about this, but I think we don't need to be that strict about that at the moment. It might be something for later though. Or what do you say @MansMeg?

BobBorges commented 1 year ago

So, if it's a matter of just uploading the iort, I can do it in a matter of minutes. But the sources can be a small issue for some: those who have multiple entries in the bio books and those who have roles in both the bicameral and unicameral periods will need to be looked up once again. I can do this, it will just take a bit longer.

MansMeg commented 1 year ago

I think we want to add a source. Otherwise, the information might be deleted from Wikidata further down the line.

I guess the simplest is to refer to the biobook and the pages with the register that we used. Or how much extra work would it be?

BobBorges commented 1 year ago

the information might be deleted from Wikidata

So, sources. OK, it's not that much extra work.

BobBorges commented 1 year ago

I have the i-ort test passing locally, but after requerying metadata the member_of_parliament test fails.


The guy was a substitute member (ersättare) for Älvsborg according to the page given in the reference.


If I delete Västra Götaland as an electoral district, will that solve the problem?

MansMeg commented 1 year ago

I would check what is correct in the biobooks and correct it on Wikidata. Then requery and it should work?

BobBorges commented 1 year ago

@ninpnin Last review before merge?

ninpnin commented 1 year ago

LGTM 👍