welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Talman is unknown #218

Closed: MansMeg closed this issue 1 year ago

MansMeg commented 1 year ago

It seems that when the introduction says "talman" (Speaker), the person is commonly set to unknown. This is not urgent to fix now, but it should go on the backlog of things to fix long-term.

fredrik1984 commented 1 year ago

This Wikipedia page lists all ordinary speakers (talmän) and all deputy speakers (vice talmän) in Sweden since the Riksdag of the Estates, including the additional deputy speakers for the period 1971 onward:

https://sv.wikipedia.org/wiki/Sveriges_riksdags_talman

ninpnin commented 1 year ago

https://github.com/welfare-state-analytics/riksdagen-corpus/blob/main/corpus/metadata/speaker.csv This metadata seems comprehensive.

@MansMeg can you link to specific cases so we can investigate further?
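A minimal sketch of how metadata like this could resolve a "talman" introduction to a person by date, assuming one row per term of office. The column names (`person_id`, `role`, `start`, `end`), the role values, and the sample rows are hypothetical placeholders, not the actual schema or contents of speaker.csv.

```python
import csv
import io
from datetime import date

# Hypothetical sample shaped like a speaker metadata table; the real
# speaker.csv may use different columns, role names and ids.
SAMPLE_CSV = """person_id,role,start,end
p-001,speaker,1971-01-01,1975-12-31
p-002,speaker,1976-01-01,1978-12-31
p-003,deputy_speaker,1971-01-01,1978-12-31
"""

def load_terms(text):
    """Parse rows into (person_id, role, start_date, end_date) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["person_id"], row["role"],
         date.fromisoformat(row["start"]), date.fromisoformat(row["end"]))
        for row in reader
    ]

def speakers_on(terms, day, role="speaker"):
    """Return the ids of everyone holding `role` on `day` (inclusive bounds)."""
    return [pid for pid, r, start, end in terms
            if r == role and start <= day <= end]

terms = load_terms(SAMPLE_CSV)
print(speakers_on(terms, date(1977, 3, 15)))  # ['p-002']
```

A date lookup only identifies who held the office on that day; it cannot tell whether the ordinary speaker or a deputy was actually presiding in a given session.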

MansMeg commented 1 year ago

I found it in @TomasSkotare's file in #202. Roughly 4,000 instances of "Herr TALMANNEN" ("Mr. Speaker") are classified as unknown. So the question is how to handle them.

fredrik1984 commented 1 year ago

Until we know whether it is the ordinary speaker or a deputy speaker, maybe we should annotate these as "Speaker of the house". That would give us a better sense of how many "real" unknowns we have. Or maybe @ninpnin has a better idea?
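A minimal sketch of the placeholder idea, assuming introductions are available as (text, assigned person) pairs. The regex, the label constants, the `relabel` helper and the sample rows are hypothetical illustrations, not code or data from the corpus; only the "Speaker of the house" label comes from the suggestion above.

```python
import re
from collections import Counter

# Placeholder label suggested in the thread for unresolved Speaker intros.
SPEAKER_PLACEHOLDER = "Speaker of the house"
UNKNOWN = "unknown"

# Matches introductions such as "Herr TALMANNEN:" in any letter case;
# other spelling variants in the corpus are not covered (assumption).
TALMAN_PATTERN = re.compile(r"\bherr\s+talmannen\b", re.IGNORECASE)

def relabel(intros):
    """Replace `unknown` with the placeholder when the intro names the Speaker."""
    out = []
    for text, who in intros:
        if who == UNKNOWN and TALMAN_PATTERN.search(text):
            who = SPEAKER_PLACEHOLDER
        out.append((text, who))
    return out

# Hypothetical (introduction text, assigned person) pairs.
intros = [
    ("Herr TALMANNEN:", "unknown"),
    ("Herr talmannen!", "unknown"),
    ("Herr ANDERSSON i Malmö:", "unknown"),
    ("Herr TALMANNEN:", "p-002"),
]

counts = Counter(who for _, who in relabel(intros))
print(counts[SPEAKER_PLACEHOLDER], counts[UNKNOWN])  # 2 1
```

Whatever is still labelled `unknown` after this pass is the count of "real" unknowns; already resolved introductions are left untouched.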

MansMeg commented 1 year ago

Is this now done, so it can be closed? Or should we first add the issues we identified with segmentation quality and with multiple persons in introductions?

ninpnin commented 1 year ago

This one is done. Let's make sure the adjacent issues are tracked.

MansMeg commented 1 year ago

Are they tracked?