welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Improve mapping between intros and MP metadata #41

Closed MansMeg closed 2 years ago

MansMeg commented 3 years ago

There are currently a lot of unknown mappings between the names that appear in the parliamentary records and actual MPs. We should list those unmatched names and go through them in increasing order.

@ninpnin any thoughts?

ninpnin commented 3 years ago

Currently, there are a lot of instances where multiple people are matched per intro. For instance, "Herr Johansson i Älvsjö" might match all Johanssons, and thus the speaker is left undetermined.
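A minimal sketch of how the place specifier could be used to narrow down candidates, assuming the metadata stores a surname and an "i <place>" specifier per MP. The `MP` class, the `match_intro` function, the regular expression, and the IDs are all hypothetical and are not the corpus' actual code or data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MP:
    mp_id: str    # hypothetical identifier format
    surname: str
    place: str    # the "i <place>" specifier from the metadata (assumed field)


# Assumed intro shape: "<title> <surname> [i <place>][:]"
INTRO_RE = re.compile(
    r"^(?:Herr|Fru|Fröken)\s+(?P<surname>[^\s:]+)"
    r"(?:\s+i\s+(?P<place>.+?))?\s*:?$"
)


def match_intro(intro, mps):
    """Return the ID of the single matching MP, or "unknown" if ambiguous."""
    m = INTRO_RE.match(intro.strip())
    if m is None:
        return "unknown"
    surname = m.group("surname").lower()
    place = (m.group("place") or "").lower()

    # First filter on surname, then on place if the intro provides one.
    candidates = [mp for mp in mps if mp.surname.lower() == surname]
    if place:
        candidates = [mp for mp in candidates if mp.place.lower() == place]

    # Only accept an unambiguous match; otherwise leave the speaker unknown.
    return candidates[0].mp_id if len(candidates) == 1 else "unknown"


# Made-up metadata rows for illustration only.
MPS = [
    MP("johansson_alvsjo_0001", "Johansson", "Älvsjö"),
    MP("johansson_lund_0002", "Johansson", "Lund"),
]
```

With these made-up rows, `match_intro("Herr Johansson i Älvsjö", MPS)` resolves to a single MP, while `match_intro("Herr Johansson", MPS)` stays `"unknown"` because two candidates remain.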

We need to address this, among other issues in the MP metadata connection. The aim is 90% accuracy or more, which will be validated by drawing a random sample of pages.
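The validation step could look roughly like this: draw a reproducible random sample of pages, annotate the intros on them by hand, and compute the share that is correct. This is only a sketch; the function names, the page ID format, and the choice to count every intro that is not "correct" against accuracy are assumptions.

```python
import random


def draw_page_sample(page_ids, n, seed=0):
    """Draw up to n pages uniformly at random, reproducibly for a given seed."""
    page_ids = list(page_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(page_ids, min(n, len(page_ids))))


def accuracy(labels):
    """Share of hand-annotated intros labelled "correct".

    Assumption: anything not labelled "correct" counts as an error.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("no annotated intros")
    return sum(label == "correct" for label in labels) / len(labels)


# Hypothetical page identifiers.
pages = [f"page_{i:04d}" for i in range(1000)]
sample = draw_page_sample(pages, 50, seed=1)
```

Fixing the seed means the same sample can be redrawn later, which helps when comparing accuracy before and after a change to the mapping.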

MansMeg commented 3 years ago
ninpnin commented 3 years ago

The annotation classifies the intros into three categories:

Correctly tagged:

<note type="speaker">
Måns Magnusson:
</note>
<u who="mans_magnusson_1234">
[...]

Incorrectly tagged:

<note type="speaker">
Per Andersson, som yttrade:
</note>
<u who="sven_andersson_1234">
[...]

Unknown:

<note type="speaker">
Finlands president Niinistö, som sade:
</note>
<u who="unknown">
[...]

The last category is relatively easy to find computationally; the first two need to be annotated by hand.
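As an illustration of why the unknown cases are easy to find, here is a sketch that collects the intro text preceding every `<u who="unknown">` element using only the standard library. The wrapping `<div>`, the closed `<u>` elements, and the function names are assumptions made so the example is well-formed and self-contained; real protocol files may declare an XML namespace, which is why tags are compared by local name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a "{namespace}" prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def unknown_intros(xml_text):
    """Return the speaker intros that are followed by an unknown speaker."""
    root = ET.fromstring(xml_text)
    intros = []
    last_intro = None
    for elem in root.iter():  # document order
        name = local_name(elem.tag)
        if name == "note" and elem.get("type") == "speaker":
            # Normalise whitespace and remember the most recent intro.
            last_intro = " ".join((elem.text or "").split())
        elif name == "u":
            if elem.get("who") == "unknown" and last_intro is not None:
                intros.append(last_intro)
            last_intro = None  # each intro is attributed to one utterance only
    return intros


SAMPLE = """<div>
<note type="speaker">
Måns Magnusson:
</note>
<u who="mans_magnusson_1234">[...]</u>
<note type="speaker">
Finlands president Niinistö, som sade:
</note>
<u who="unknown">[...]</u>
</div>"""

found = unknown_intros(SAMPLE)
```

Counting how often each intro in `found` occurs across the corpus would give a worklist of unmatched names.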

MansMeg commented 3 years ago

After discussion: @ninpnin will try to fix the obvious error sources found in the subsample by @rbbby. When that is done, a new subsample will be drawn.

ninpnin commented 3 years ago

I wrote down some observations on the sample: https://github.com/welfare-state-analytics/riksdagen-corpus/blob/mp/input/curation/mapping_sample_0.md

ninpnin commented 3 years ago

Errors detected in the first sample seem to fall into the following categories, ordered from least to most work per improvement:

  1. 30% Problems detecting ministers
  2. 30% Errors with "X i Y" type intros, mostly stemming from incomplete metadata
  3. 40% Miscellaneous problems in the mapping process

We need to fix at least 1. and 2. to reach 90% accuracy. The work on ministers is already ongoing in #71, but we need to work on expanding the metadata as well.
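A rough back-of-the-envelope check of that goal, as a sketch: it assumes the percentages above are shares of all errors in the sample, that fixing a category removes all of its errors, and that no new errors are introduced. The baseline accuracy is not stated in this thread, so the 0.75 below is purely a hypothetical input.

```python
def accuracy_after_fixes(baseline, fixed_share):
    """Accuracy once a given share of the current errors has been removed."""
    return baseline + (1.0 - baseline) * fixed_share


def required_baseline(target, fixed_share):
    """Lowest baseline accuracy from which the target is reachable.

    Solves baseline + (1 - baseline) * fixed_share >= target for baseline.
    """
    return (target - fixed_share) / (1.0 - fixed_share)


# Categories 1 and 2 together account for 30% + 30% of the errors.
share_1_and_2 = 0.30 + 0.30

# Hypothetical baseline accuracy, for illustration only.
projected = accuracy_after_fixes(0.75, share_1_and_2)
needed = required_baseline(0.90, share_1_and_2)
```

Under these assumptions, fixing categories 1 and 2 alone reaches the 90% target only if the current accuracy is already around 75%; below that, some of the miscellaneous problems would also need attention.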

ninpnin commented 2 years ago

Subset of #80