welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Correct overlapping dates in wikidata #231

Open MansMeg opened 1 year ago

MansMeg commented 1 year ago

We have identified some speakers in the parliament that have overlapping dates in wikidata.

The easiest is probably to fix this manually at wikidata as follows:

  1. Look up the individual at wikidata and in the tvåkammarbook.
  2. Check the correct date and then add it in Wikidata, with a reference to the book.

@TomasSkotare, could you list the Wikidata IDs that we know have overlapping parties? It should be roughly 110?

TomasSkotare commented 1 year ago

This is a (perhaps partial) list of the overlap for 0.5.1: metadata_party_affiliations_overlap.xlsx

Overlap should ignore cases where the start and end dates are the same. Several, perhaps most, of these are due to poor precision in the end dates. A missing value is assumed to mean "forever", i.e. a missing start date is treated as "from the beginning" and a missing end date as "until today", whenever the script is run. A date given only as a year is treated as Jan 1 of that year, for both start and end dates, which is perhaps incorrect (and can cause errors).

Both the original date and the assumed date are in the Excel sheet, so hopefully such errors can be found.

Note that more overlap can be found, as overlaps within the same party are ignored for now (even though they are odd to have).
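The rules above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the sheet: the function names, the `(party, start, end)` tuple layout, the use of 1867 as "the beginning", and the example data are all assumptions made for the sketch.

```python
from datetime import date
from itertools import combinations

# Assumption: "from the beginning" is taken to be the first year of the corpus.
CORPUS_START = date(1867, 1, 1)


def normalize(value, default):
    """Turn 'YYYY' or 'YYYY-MM-DD' into a date; a missing value gets the default."""
    if not value:
        return default
    if len(value) == 4:
        # A bare year becomes Jan 1 of that year, for start and end dates alike.
        return date(int(value), 1, 1)
    return date.fromisoformat(value)


def find_overlaps(affiliations, today=None):
    """Return (party_a, party_b, overlap_start, overlap_end) for each overlapping pair.

    `affiliations` is a list of (party, start, end) tuples (hypothetical layout).
    """
    today = today or date.today()
    spans = [
        (party, normalize(start, CORPUS_START), normalize(end, today))
        for party, start, end in affiliations
    ]
    overlaps = []
    for (p1, s1, e1), (p2, s2, e2) in combinations(spans, 2):
        if p1 == p2:
            continue  # overlaps within the same party are ignored for now
        lo, hi = max(s1, s2), min(e1, e2)
        if lo < hi:  # strict: an end date equal to the next start date is not an overlap
            overlaps.append((p1, p2, lo, hi))
    return overlaps


# Made-up example: one open-ended record overlapping a bounded one.
example = [("NYD", "1991", "1994"), ("M", None, None)]
print(find_overlaps(example, today=date(2023, 1, 1)))
```

Reporting the overlapping interval itself (`lo`, `hi`) rather than just a flag is one way to show the overlapping duration more clearly.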

I have some improvements in mind to find further (actual, problematic) overlap, and perhaps show the overlapping duration more clearly.

MansMeg commented 1 year ago

I looked up Q5717497. I'm not sure how to handle that in the best possible way. The data at Wikipedia/Wikidata is actually reasonable. We don't know when he formally became part of the Moderate Party, but for us it is sufficient to know that he was NYD while in parliament.

TomasSkotare commented 1 year ago

The unreasonable part would be the lack of start/end dates; in cases where the party is the same this wouldn't be an issue. While it is possible to let cases where we have better precision take precedence, that is not great either...

The best option is to avoid overlap, which would require some manual input (i.e. verifying the actual dates).

MansMeg commented 1 year ago

Yes. But in the example above I don't think we can edit it in a reasonable way in Wikidata. I think that example is reasonable and should probably be solved with logic in a function.

@TomasSkotare Do you have a Python function to assign party affiliation based on the data? We should probably add it to the Python lib and solve (some of) the issues there.

We could discuss the API for that function. I guess we want something like:

party_affiliation(person_id, date, corpus_path)

that returns the party affiliation for that person at that date.
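A rough sketch of what such a function could look like, to make the discussion concrete. Everything about the storage is an assumption here: the file name `party_affiliation.csv`, its columns (`person_id`, `party`, `start`, `end`), and the tie-breaking rule (prefer the record with more explicitly known bounds) are hypothetical and not taken from the corpus or from pyriksdagen.

```python
import csv
from datetime import date
from pathlib import Path


def _parse(value):
    """Parse 'YYYY' or 'YYYY-MM-DD'; an empty value means the bound is unknown."""
    if not value:
        return None
    if len(value) == 4:
        return date(int(value), 1, 1)
    return date.fromisoformat(value)


def party_affiliation(person_id, on_date, corpus_path):
    """Return the party of `person_id` on `on_date`, or None if unknown or ambiguous."""
    path = Path(corpus_path) / "party_affiliation.csv"  # hypothetical file name
    candidates = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # hypothetical columns: person_id, party, start, end
            if row["person_id"] != person_id:
                continue
            start, end = _parse(row["start"]), _parse(row["end"])
            # Half-open interval [start, end); an unknown bound never excludes a date.
            if start is not None and on_date < start:
                continue
            if end is not None and on_date >= end:
                continue
            known_bounds = (start is not None) + (end is not None)
            candidates.append((known_bounds, row["party"]))
    if not candidates:
        return None
    # Assumed rule: records with more explicit bounds take precedence over open-ended ones.
    best = max(known for known, _ in candidates)
    parties = {party for known, party in candidates if known == best}
    return parties.pop() if len(parties) == 1 else None
```

With a rule like this, a person who has one dated affiliation and one undated affiliation would resolve to the dated party inside its interval and to the undated one outside it. Whether letting better precision take precedence is acceptable is exactly the open question in this thread.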

TomasSkotare commented 1 year ago

Yes, we do have a Python function to assign affiliation, with some rules. It already works roughly as you describe, and it works well in cases where the data is clear. The issues only pop up when we have ill-defined limits or overlap.

Regarding this, I updated and refreshed the data for 0.6.0, so it found a couple more and also shows which parties overlap and when:

metadata_party_affiliations_overlap_with_dates.xlsx

The ones with dates from 1867 to 2023 didn't have any defined limits and are assumed to be "forever".

MansMeg commented 1 year ago

Great! Could you try to add this to the pyriksdagen package?

fredrik1984 commented 2 months ago

@BobBorges are we done with this issue?

BobBorges commented 2 months ago

I have not made a test for this specifically, but... probably.

I'll make a test for this.