fredrik1984 opened this issue 1 year ago
From what I can see (@ninpnin, correct me if I'm wrong here), intro detection relies primarily on herr/fru/fröken/talman + name + ":". The spelling of these is consistent in the 1874 SAOL, except for talman, which is not listed. We may want to look out for a double l (tallman), by analogy with similar words (there are 61 instances of tallman up to 1891, but at first glance none of them appear to be intros).
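As an illustration of the pattern described above, here is a minimal Python sketch, assuming a single regular expression applied to the start of each line. `INTRO_RE` and `looks_like_intro` are hypothetical names, not taken from the corpus code; `[Tt]all?man` accepts both talman and the double-l variant.

```python
import re

# Illustrative pattern (not the corpus's actual code): an honorific, then a
# capitalized name, then a colon. "[Tt]all?man" matches "talman" and "tallman".
INTRO_RE = re.compile(
    r"^(?P<title>[Hh]err|[Ff]ru|[Ff]röken|[Tt]all?man)\s+"
    r"(?P<name>[A-ZÅÄÖ][\w .\-]*?)\s*:"
)


def looks_like_intro(line: str) -> bool:
    """Return True if the line starts like a speaker introduction."""
    return INTRO_RE.match(line.strip()) is not None
```

Requiring the name to start with a capital letter keeps forms of address such as "Herr talman!" from being read as intros, even when a colon appears later in the line.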
The intro mapping prioritizes finding Herr/fru/fröken + name. It also tries to find names by capitalization, which should still work even if the spelling is unusual:
(the list is in priority order)
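The priority order can be sketched as a list of patterns where the first match wins. The two patterns below are hypothetical stand-ins, not the actual list: an honorific + name pattern first, then a capitalization-only fallback that does not depend on how the surrounding words are spelled.

```python
import re
from typing import Optional

# Hypothetical priority list: patterns are tried in order, first match wins.
PATTERNS = [
    # 1. Honorific followed by one or more capitalized words, e.g. "Herr Wennerberg:"
    re.compile(
        r"^(?:[Hh]err|[Ff]ru|[Ff]röken)\s+"
        r"(?P<name>[A-ZÅÄÖ][\w\-]*(?:\s+[A-ZÅÄÖ][\w\-]*)*)\s*:"
    ),
    # 2. Fallback: any run of capitalized words directly before a colon.
    re.compile(r"(?P<name>[A-ZÅÄÖ][\w\-]*(?:\s+[A-ZÅÄÖ][\w\-]*)*)\s*:"),
]


def map_intro(intro: str) -> Optional[str]:
    """Return the name found by the highest-priority matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.search(intro)
        if match:
            return match.group("name")
    return None
```

The fallback returns the whole capitalized run, so an unrecognized title such as Grefve stays attached to the name and would need to be handled downstream.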
How are the introductions predicted by Jesper's thesis included? Are they used from 1920 onward, with regexp before that?
Hm, although "talman" is not listed in the 1874 dictionary, it was still a term in use in the parliament, but from what I can see it is not common in intros. Rather, it usually appears as "talmannen" in descriptions of what is going on in the chamber.
The older spelling will also affect other intros, such as:
"Grefve Hamilton:" (Count Hamilton, later spelled Greve)
"Chefen för Kongl. Ecklesiastik-departementet, Herr Statsrådet Wennerberg:" (here, Kongl. later became Kungl., the abbreviation of Kungliga, "Royal")
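One way to cope with these older spellings is to let each title match both its modern and its 19th-century form. A minimal sketch, assuming a hand-maintained variant table; `TITLE_VARIANTS`, `variant_pattern` and `COUNT_INTRO_RE` are hypothetical, and the table only contains the two examples above.

```python
import re

# Hypothetical variant table: modern spelling -> older spellings seen in
# 19th-century records (only the two examples from this thread).
TITLE_VARIANTS = {
    "Greve": ["Grefve"],
    "Kungl.": ["Kongl."],
}


def variant_pattern(modern: str) -> str:
    """Build a non-capturing regex alternation for a word and its old spellings."""
    forms = [modern] + TITLE_VARIANTS.get(modern, [])
    return "(?:" + "|".join(re.escape(form) for form in forms) + ")"


# Matches both "Greve Hamilton:" and "Grefve Hamilton:".
COUNT_INTRO_RE = re.compile(
    rf"^{variant_pattern('Greve')}\s+(?P<name>[A-ZÅÄÖ][\w\-]*)\s*:"
)
```

Keeping the variants in one table means a new spelling found in the scans only needs a table entry, not a change to every pattern that uses the title.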
There also seem to be many "Friherrar" in the parliament in the 19th century... I guess we have to adapt the algorithm for that!
However, both the intro detection and the segment classification rely on neural networks, which have only been trained on data from 1920-1989. In other words, the training data does not match the older data we apply them to.
@fredrik1984 can we safely assume all friherrar are MPs, or could a minister be called a friherre too?
Yes, and there are different ways of presenting MPs before 1920, especially when it comes to titles: Greve, Friherre, etc.
@ninpnin I guess a friherre could be a minister as well. In the 19th century, if the speaker of the house was a count, he was introduced as "Herr greve och talman" ("Mr. Count and Speaker").
I guess that a person who is a "friherre" could also be a minister who is not an MP, although most are of course MPs. In that case I suppose they are introduced as ministers, like this: Chefen för Kongl. Ecklesiastik-departementet, Friherre Statsrådet Wennerberg
"Friherre" is a noble title, roughly equivalent to baron: https://en.wikipedia.org/wiki/Freiherr
It looks like they have both titles, then, e.g. Herr Statsrådet Friherre von Otter. In that case it shouldn't be an issue for us.
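If titles can be stacked in either order (Herr Statsrådet Friherre von Otter, or Friherre Statsrådet Wennerberg), a simple way to get at the name is to peel known titles off the front until something else remains. A sketch, assuming a hypothetical title set `TITLES` and that any ministry description ends at the last comma; `strip_titles` is an illustrative name, not corpus code.

```python
# Hypothetical title list (lowercased); includes the older spelling "grefve".
TITLES = {"herr", "fru", "fröken", "statsrådet", "friherre", "greve", "grefve", "talman"}


def strip_titles(intro: str) -> str:
    """Peel leading titles off an intro and return what remains (the name).

    Assumes a ministry description, if present, ends at the last comma, as in
    "Chefen för Kongl. Ecklesiastik-departementet, Herr Statsrådet Wennerberg:".
    """
    tail = intro.rsplit(",", 1)[-1]           # drop the "Chefen för ..." prefix
    words = tail.strip().rstrip(":").split()  # drop the trailing colon
    while words and words[0].lower() in TITLES:
        words.pop(0)                          # titles may come in any order
    return " ".join(words)
```

Lowercase particles such as von are kept because they are not in the title set, so "Herr Statsrådet Friherre von Otter:" yields "von Otter".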
Ok, good!
Ping @ninpnin: is it correct that the introductions predicted by Jesper's thesis are used from 1920 onward, and regexp before that? Or is the regexp the way to identify the individual person from an intro segment?
OT: SPARQL query on Wikidata for Swedish MPs with P97 (noble title) Friherre (Q1338119); quality unknown.
List 243 records / just First and Second chamber 162 records
Timeline, with a link to Tvåkammarriksdagen if it has been added to WD
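For reproducibility, a simplified version of the friherre query can also be built and sent from Python. This is a minimal sketch against the public Wikidata Query Service endpoint; the four position-held (P39) items are the ones used in the timeline query linked above, the helper names are hypothetical, and it only selects person, label, birth and death.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Position-held items (P39) from the timeline query, and the Friherre item.
MEMBER_POSITIONS = ["wd:Q33071890", "wd:Q81531912", "wd:Q82697153", "wd:Q10655178"]
FRIHERRE = "wd:Q1338119"


def build_friherre_query(positions=MEMBER_POSITIONS, title=FRIHERRE) -> str:
    """SPARQL for people who held one of `positions` and have noble title `title` (P97)."""
    values = "\n    ".join(positions)
    return f"""SELECT DISTINCT ?person ?personLabel ?birth ?death WHERE {{
  VALUES ?member {{
    {values}
  }}
  ?person wdt:P39 ?member ;
          wdt:P97 {title} .
  OPTIONAL {{ ?person wdt:P569 ?birth }}
  OPTIONAL {{ ?person wdt:P570 ?death }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "sv,en". }}
}}
ORDER BY DESC(?death)"""


def query_url(query: str) -> str:
    """GET URL for the Wikidata Query Service, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

The URL can then be fetched with e.g. `urllib.request` (WDQS asks clients to send a descriptive User-Agent). Binding the title item directly in the triple pattern, rather than through a separately named variable, makes sure the noble-title filter is actually applied.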
@MansMeg A neural network (NN) for intro detection, regexp for intro mapping.
Friherrar pull request https://github.com/welfare-state-analytics/riksdagen-corpus/pull/274
We may need to take into account spelling reforms that could affect our algorithms, for instance those that identify speaker introductions.
There seem to have been two bigger reforms during the Swerik period: around 1869 and in 1906 (when we got “modern” Swedish). After 1906, “hv” was replaced by “v” (e.g. “hvad” became “vad”). Already after 1890, “qv” started to be replaced by “kv” (e.g. “qvinna” became “kvinna”).
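These two rules could be applied as a normalization step before matching. A minimal sketch covering only hv → v at the start of a word and qv → kv, preserving the case of the first letter; `modernize` is a hypothetical helper and other reform changes are not handled. Since surnames may have kept the old spelling, it is probably safer to normalize a copy of the text for matching than to rewrite the text itself.

```python
import re

# Only the two rules mentioned above; other spelling changes are not covered.
_HV = re.compile(r"\b([Hh])([Vv])")  # word-initial hv (hvad, hvilken)
_QV = re.compile(r"([Qq])([Vv])")    # qv anywhere (qvinna)


def _drop_h(m) -> str:
    """hv -> v, giving the v the case of the dropped h."""
    v = m.group(2)
    return v.upper() if m.group(1).isupper() else v


def _q_to_k(m) -> str:
    """qv -> kv, keeping the case of the q."""
    return ("K" if m.group(1).isupper() else "k") + m.group(2)


def modernize(text: str) -> str:
    """Apply the post-1890/1906 hv and qv spelling rules to a string."""
    return _QV.sub(_q_to_k, _HV.sub(_drop_h, text))
```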
I did a little bit of searching, and it seems that Svenska Akademien’s dictionary (SAOL) is a good source for tracing these changes, for example by comparing different editions, such as those from 1874, 1889, and 1923.
https://spraakbanken.gu.se/saolhist/
http://runeberg.org/saol/