MansMeg closed 2 years ago
Currently, there are many instances where multiple people are matched to a single intro. For instance, "Herr Johansson i Älvsjö" might match every Johansson, so the speaker remains undetermined.
We need to address this, among other issues in the MP metadata connection. The aim is 90% accuracy or more, which will be validated by drawing a random sample of pages and annotating it.
The annotation classifies the intros into three categories:
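A minimal sketch of how the validation could be set up, assuming pages are identified by some ID and that the hand annotation yields a count of correctly tagged intros out of all annotated intros. The function names, the seed, and the use of a Wilson score interval are illustrative assumptions, not something already in the repository; how "unknown" intros count towards accuracy is left to whoever supplies the counts.

```python
import math
import random


def draw_sample(page_ids, k, seed=0):
    """Draw k pages uniformly at random without replacement (reproducible via seed)."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(page_ids), k))


def accuracy_with_interval(n_correct, n_total, z=1.96):
    """Return (accuracy, lower, upper) using a Wilson score interval."""
    if n_total <= 0:
        raise ValueError("the annotated sample is empty")
    p = n_correct / n_total
    denom = 1 + z**2 / n_total
    centre = (p + z**2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z**2 / (4 * n_total**2)) / denom
    return p, centre - half, centre + half


# Hypothetical usage: 50 pages out of 1000, then 45 of 50 annotated intros correct.
pages = draw_sample(range(1000), k=50, seed=1)
acc, low, high = accuracy_with_interval(45, 50)
```

The interval is a reminder that a small sample showing exactly 90% does not by itself establish that the corpus-wide accuracy is at 90% or more.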
Correctly tagged:
<note type="speaker">
Måns Magnusson:
</note>
<u who="mans_magnusson_1234">
[...]
Incorrectly tagged:
<note type="speaker">
Per Andersson, som yttrade:
</note>
<u who="sven_andersson_1234">
[...]
Unknown:
<note type="speaker">
Finlands president Niinistö, som sade:
</note>
<u who="unknown">
[...]
Intros in the last category are relatively easy to find computationally; the first two need to be annotated by hand.
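As a rough illustration of the computational part, the sketch below collects the speaker intro that precedes each `<u who="unknown">` element. It assumes the layout shown in the examples above (a `<note type="speaker">` followed by a `<u>`), uses only the standard library, and runs on a tiny hand-made snippet; `find_unknown_intros` and `SAMPLE` are hypothetical names, and the real files may need additional handling.

```python
import xml.etree.ElementTree as ET

# Hand-made snippet mirroring the examples above; not taken from the corpus files.
SAMPLE = """<div>
<note type="speaker">Måns Magnusson:</note>
<u who="mans_magnusson_1234">...</u>
<note type="speaker">Finlands president Niinistö, som sade:</note>
<u who="unknown">...</u>
</div>"""


def find_unknown_intros(xml_text):
    """Return the intro text preceding every <u who="unknown"> element."""
    root = ET.fromstring(xml_text)
    unknown = []
    last_intro = None
    for elem in root.iter():  # document order
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace prefix, if any
        if tag == "note" and elem.get("type") == "speaker":
            # Collapse whitespace so multi-line intros become one line.
            last_intro = " ".join("".join(elem.itertext()).split())
        elif tag == "u":
            if elem.get("who") == "unknown" and last_intro is not None:
                unknown.append(last_intro)
            last_intro = None  # count each intro at most once
    return unknown


unknown_intros = find_unknown_intros(SAMPLE)
```

Correctly and incorrectly tagged intros both carry a concrete `who` value, which is why they cannot be told apart this way and still need manual annotation.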
After discussion: @ninpnin will try to fix the obvious error sources found in the subsample by @rbbby. When that is done, a new subsample will be drawn.
I wrote down some observations on the sample: https://github.com/welfare-state-analytics/riksdagen-corpus/blob/mp/input/curation/mapping_sample_0.md
Errors detected in the first sample seem to fall into the following categories, in order of least work per improvement
We need to fix at least 1. and 2. to reach 90% accuracy. The work on ministers is already ongoing in #71, but we need to work on expanding the metadata as well.
Subset of #80
There are currently many unknown mappings between names used in parliament and actual MPs. We should list the names that are missing a mapping and go through them in increasing order.
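One possible way to produce such a list is to tally the intros that ended up as unknown and sort them by how often they occur. This is only a sketch: `rank_unmatched` is a hypothetical helper, the normalisation (whitespace collapsing and dropping a trailing colon) is an assumption, and the sort direction is a parameter because the ordering criterion is not pinned down here.

```python
from collections import Counter


def rank_unmatched(intros, ascending=True):
    """Tally unmatched intro strings and return (intro, count) pairs sorted by count."""
    counts = Counter(" ".join(s.split()).rstrip(":").strip() for s in intros)
    if ascending:
        return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical input: intros whose speaker could not be mapped to an MP.
unmatched = [
    "Herr Johansson i Älvsjö:",
    "Finlands president Niinistö, som sade:",
    "Herr Johansson  i Älvsjö:",
]
ranking = rank_unmatched(unmatched)
```

Grouping on the raw intro string is deliberately crude; extracting the actual name from the intro would need rules that are out of scope for this sketch.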
@ninpnin any thoughts?