welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Add currently missing MPs to wikidata #157

Closed MansMeg closed 2 years ago

MansMeg commented 2 years ago

Some MPs are currently missing from Wikidata, and we need them there to reduce the number of unknown speakers in the parliamentary records. As a first step, we will add all the persons we need for our corpus right now. For more info, see: https://github.com/welfare-state-analytics/riksdagen-corpus/issues/149

Does this sound correct, @rbbby and @ninpnin?

MansMeg commented 2 years ago

Going through this list of the most common unknowns, it is clear that some problems need to be fixed on our side, because the people in question definitely exist in the MP database. For example: statsminister Göran Persson, Finansminister Magdalena Andersson, statsrådet Sträng, and so on.

https://github.com/welfare-state-analytics/riksdagen-corpus/blob/dev/input/most_common_unknowns.csv

These also make up quite a lot of the introductions (for obvious reasons).

MansMeg commented 2 years ago

Also, "talmannen" (the speaker) accounts for a lot of the unknowns. This should be quite easy to fix as well.

MansMeg commented 2 years ago

After our discussions today, @rbbby will go through the first 300 names in the list and identify the source of the errors.

Maybe add a column here describing what needs to be fixed to resolve each line in the file? https://github.com/welfare-state-analytics/riksdagen-corpus/blob/dev/input/most_common_unknowns.csv

Then we can all help fix the sources of the errors.

rbbby commented 2 years ago

I think there may be a bug with the ministers; I will look into it. If I recall correctly, the list of talmän (speakers) is incomplete on Wikidata (definitely the third vice speakers; I will look into the others). I will also start doing detective work on why some persons are not matched, but I need to develop some kind of tool for filtering our databases first, as that is very difficult right now.
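A minimal sketch of what such a filtering tool could look like, assuming the person records can be loaded as a list of dicts. The field names (`name`, `start`, `end`) and the function `filter_persons` are hypothetical and do not reflect the actual layout of our databases:

```python
def filter_persons(persons, name_contains=None, active_in=None):
    """Return the records that match every filter that was given.

    persons: list of dicts with hypothetical keys "name", "start", "end"
    name_contains: case-insensitive substring to look for in the name
    active_in: a year that must fall within [start, end]
    """
    matches = []
    for person in persons:
        if name_contains is not None:
            if name_contains.lower() not in person["name"].lower():
                continue
        if active_in is not None:
            if not (person["start"] <= active_in <= person["end"]):
                continue
        matches.append(person)
    return matches


# Placeholder records, not real MPs.
example_persons = [
    {"name": "Anna Andersson", "start": 1990, "end": 1998},
    {"name": "Bo Andersson", "start": 2002, "end": 2010},
    {"name": "Carl Berg", "start": 1990, "end": 2006},
]

for person in filter_persons(example_persons, name_contains="andersson", active_in=1995):
    print(person["name"])
```

Combining a name filter with a year filter would make it quicker to check whether an unmatched introduction has any plausible candidate in the database at the time of the protocol.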

MansMeg commented 2 years ago

Great! Yes. This also indicates that we need a test suite for the mapping. We should be warned if we overwrite known mappings with unknown ones.
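A rough sketch of such a check, assuming the mapping can be represented as a dict from an introduction id to a person id, with the string `"unknown"` marking unmatched speakers. The function `find_regressions`, the ids, and the `"unknown"` marker are all hypothetical and only illustrate the idea:

```python
UNKNOWN = "unknown"  # assumed marker for an unmatched speaker


def find_regressions(old_mapping, new_mapping, unknown=UNKNOWN):
    """Return the ids that were known in old_mapping but are unknown
    (or missing entirely) in new_mapping, in sorted order."""
    regressions = []
    for intro_id, old_person in old_mapping.items():
        if old_person == unknown:
            continue  # was already unknown, so nothing can be lost
        new_person = new_mapping.get(intro_id, unknown)
        if new_person == unknown:
            regressions.append(intro_id)
    return sorted(regressions)


# Placeholder ids, not real corpus or Wikidata identifiers.
old = {"intro-1": "person-A", "intro-2": "unknown", "intro-3": "person-B"}
new = {"intro-1": "person-A", "intro-2": "person-C", "intro-3": "unknown"}

lost = find_regressions(old, new)
if lost:
    print("Known mappings overwritten with unknown:", lost)
```

A test could then simply assert that `find_regressions` returns an empty list when comparing the previous mapping with the newly generated one.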

MansMeg commented 2 years ago

This has now been summarized in issue #163 instead.