Closed MansMeg closed 2 years ago
Going through this list of the most common unknowns, it is clear that there are problems we need to fix: several of the unmatched speakers definitely exist in the MP database. For example, statsminister (Prime Minister) Göran Persson, Finansminister (Finance Minister) Magdalena Andersson, statsrådet (Minister) Sträng, etc.
https://github.com/welfare-state-analytics/riksdagen-corpus/blob/dev/input/most_common_unknowns.csv
These also make up quite a lot of the introductions (for obvious reasons).
Also, "talmannen" (the Speaker) accounts for a lot of the unknowns. This should also be fairly easy to fix.
After our discussions today, @rbbby will go through the first 300 names in the list and identify the source of the errors.
Maybe add a column to this file describing what needs to be fixed to resolve each line? https://github.com/welfare-state-analytics/riksdagen-corpus/blob/dev/input/most_common_unknowns.csv
Then we can all help fix the sources of the errors?
I think there may be a bug with the ministers; I will look into it. If I recall correctly, the list of speakers (talmän) is incomplete on Wikidata (definitely the third vice speakers; I will look into the others). I will also start doing detective work on why some persons are not matched, but I need to develop some kind of tool for filtering our databases first, as that is very difficult right now.
Great! Yes. This also indicates that we need a test suite for the mapping. We should be warned if we overwrite known mappings with unknown ones.
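A minimal sketch of what such a check could look like, assuming the mapping can be loaded as a plain dict from introduction ID to speaker ID, with the string `"unknown"` for unmatched speakers. The function name, the dict layout, and the IDs below are hypothetical and not taken from the repository:

```python
def find_regressions(old, new, unknown="unknown"):
    """Return {intro_id: old_speaker} for entries that were known in `old`
    but are unknown or missing in `new`."""
    regressions = {}
    for intro_id, old_speaker in old.items():
        if old_speaker == unknown:
            # Was already unknown, so nothing can be lost here.
            continue
        # A missing entry counts as unknown.
        if new.get(intro_id, unknown) == unknown:
            regressions[intro_id] = old_speaker
    return regressions


# Hypothetical example data: placeholder IDs, not real corpus entries.
old_mapping = {"intro-1": "person-a", "intro-2": "unknown", "intro-3": "person-c"}
new_mapping = {"intro-1": "person-a", "intro-2": "person-b", "intro-3": "unknown"}

for intro_id, speaker in find_regressions(old_mapping, new_mapping).items():
    print(f"WARNING: {intro_id} was mapped to {speaker} but is now unknown")
```

Something like this could run in a test that compares the mapping before and after a change and fails (or just warns) when the returned dict is non-empty.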
This has now been summarized in issue #163 instead.
There are currently MPs missing from Wikidata that we need in order to reduce the number of unknown speakers in the parliament. As a first step, we will add all the persons we need for our corpus right now. For more info, see: https://github.com/welfare-state-analytics/riksdagen-corpus/issues/149
Does this sound correct @rbbby and @ninpnin ?