welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Missing dates from party affiliation in wikidata leads to speeches with unknown party affiliation #149

Closed TomasSkotare closed 2 years ago

TomasSkotare commented 2 years ago

Essentially, even when we know who the speaker was, the party affiliation is often unknown, generally for earlier speakers (often 1920s-1930s).

This is largely because their party affiliations have no dates attached in Wikidata, so when a speaker has several affiliations we cannot tell which one applies at a given time.

One example can be: https://www.wikidata.org/wiki/Q4820820 with approx. 500 recorded speeches, all with party affiliation set to 'unknown'.

For reference, this plot shows protocols with the most unknown party affiliations from known speakers: image
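
A minimal sketch of how one might list the undated party statements for a speaker such as Q4820820. It only builds a SPARQL query for the public Wikidata Query Service; P102 (member of political party) and P580 (start time) are the standard Wikidata properties, while the function names are made up for illustration and are not part of the corpus code.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def undated_party_query(qid: str) -> str:
    """Build a SPARQL query listing P102 statements that lack a P580 qualifier."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"""
SELECT ?party ?partyLabel WHERE {{
  wd:{qid} p:P102 ?statement .
  ?statement ps:P102 ?party .
  FILTER NOT EXISTS {{ ?statement pq:P580 ?start . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "sv,en". }}
}}
""".strip()


def query_url(qid: str) -> str:
    """URL that returns the query result as JSON when fetched."""
    params = {"query": undated_party_query(qid), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


print(undated_party_query("Q4820820"))
```

Fetching `query_url("Q4820820")` should return one row per party statement that has no start date, which is the set of statements that would need fixing.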

salgo60 commented 2 years ago

@TomasSkotare FYI in Wikidata I am updating the party affiliation see https://github.com/salgo60/Wikidata_riksdagen-corpus/issues/38 using the book "Tvåkammar-riksdagen 1867-1970" as a source --> Q4820820 = the book vol 2 p 440

image image

The challenge is

e.g. my "guess" below:

1) k 1922-1933: I am not sure, I use Kommunistiska partiets riksdagsgrupp / Q111285801 based on the source Sveriges riksdag 1924 : porträttalbum - August Konrad Ferdinand Spångberg (K)
2) sp 1934-1937: Socialistiska partiet
3) s 1938-1964: I assume Socialdemokraterna / Q105112

It would help if we could get assistance with the parties, and maybe a good mapping to Wikidata Q numbers and dates; see the WD <-> party status right now.

Don't hesitate to call me if this will help you: +46-735152802

MansMeg commented 2 years ago

I think this is the way to go here, i.e. update the from and to dates for these parties. I think we should also set up these political parties and their abbreviations. @salgo60 is this something you have in Wikidata right now? I think we have some similar mapping in the corpus now, but @rbbby knows more about the details.

salgo60 commented 2 years ago

The quality in Wikidata is not good right now regarding dates etc. There is some data in the book "Tvåkammar-riksdagen 1867-1970" (see page 12, book 1), and some Wiki articles also seem to have good quality.

image

I feel the model should be

image image image image image

Wikidata example Frihandelsvänliga centern Q10499121

MansMeg commented 2 years ago

Hmm. Yes, we would need to formalize this somehow and also, ideally, connect their abbreviations to the parties as well to simplify your work.

What of this do we have in wiki data as of now? Could we add these party-paths in a structured way in wikidata? We would need to store it in a normalized form in our meta-data. But I guess we all want to get this in structured form?

salgo60 commented 2 years ago

> What of this do we have in wiki data as of now?

I guess we can only trust what is sourced from the book "Tvåkammar-riksdagen 1867-1970"

I am right now doing Wikidata_riksdagen-corpus/issues/38

The goal is to have one WD object and one svWikipedia article for every person. Status today is

as said a small mess with parties (riksdagsgrupper I guess) etc... needs some QA

MansMeg commented 2 years ago

Ok. But if I understand you correctly, the parties in the picture need to be added as well, as you suggest, right?

For us, I think the date from and date to for individual MPs would be important to add as well to solve this issue. Maybe just go through the MPs where we have problems and fix them in WD?

MansMeg commented 2 years ago

I and @rbbby discussed this briefly. We think this needs to be done to solve your issue @TomasSkotare :

Then we should branch out the other issues that Magnus mentions as separate issues here, namely:

salgo60 commented 2 years ago

> For us, I think the date from and date to for individual MPs would be important to add as well to solve this issue. Maybe just go through the MPs where we have problems and fix them in WD?

Sounds great. Best if you also have it at your place.... as Wikidata is open, we get changes that can be vandalism and/or changes made with good intentions but not the same intentions other people have ;-)

video

OT: if you walk through the book "Tvåkammar-riksdagen 1867-1970"... it also mentions the number of "motioner" they have written themselves... that could be a good quality check that you have annotated those documents... I haven't added that information to Wikidata, just updated the wiki articles with that text and some links to Riksdagen's documents; example:

image
MansMeg commented 2 years ago

This is a list of the Wikidata entries we need to fix regarding the start and end dates of their party affiliations. I add them here so that @salgo60 can also help (if he wants to):

MansMeg commented 2 years ago

@salgo60 : I have now got a green light from the Riksdagen Library to release the book Tvåkammarriksdagen 1867-1970 as CC0. Do you need anything more from me?

salgo60 commented 2 years ago

@MansMeg excellent then we will have pictures of all Swedish PMs in that book in Wikidata

we need a statement, like a web page saying that it's CC0, that we can reference.....

best would be if the Riksdagen Library could just have a page stating that the book is CC0, something like the page Riksdagens protokoll.Andra kammaren 1962, B01

image
salgo60 commented 2 years ago

@MansMeg One vague thought: also translate the sources in the book "Tvåkammar-riksdagen 1867-1970" to linked data and connect each source to the description in the book:

1) Today in Wikidata we try to have one object for every Swedish PM.
2) Now that the book "Tvåkammar-riksdagen 1867-1970" is getting more and more important, maybe we should have an entry in Wikidata for the book and for every article about a person in the book (as SPA has sj9PGLAlnmUAAAAAABfRZQ for the article "Andersson i Bringåsen" that describes WD person Q5554719).
3) Having a Wikidata item for a part of the book like Andersson i Bringåsen --> we can also add sources in a more structured way, e.g. Andersson i Bringåsen has this mentioned at the bottom:

image

3-1) Skrifter: "Tankar om de föreslagna jernbanorna i Norrland", same as Wikidata Q111516483; someone has scanned it (link)
3-2) Litt: SP = Svenskt Porträtt galleri --> that is scanned by Project Runeberg --> "Hans Andersson. F. i Kyrkas" page 336

image image

'''Nice to have'''

MansMeg commented 2 years ago

I am now starting to find incorrect parties in Wikidata that would need to be fixed long term by adding the correct parties from the image above. Q5885103 was first a member of lmb, which is translated as the Moderate Party, but this is not correct since the lmb split: some went to "Högerpartiet" and some to what would become "Centern".

I guess that a first step would be to add all the parties in the list above in wikidata. Could you do that @salgo60 ?

MansMeg commented 2 years ago

I actually think that soon we will be able to use the corpus to deduce this automatically for WD. Although we would need to find a way to do a lot of changes as a bulk then?

salgo60 commented 2 years ago

@MansMeg bulk changes are no major problem

We use QuickStatements or OpenRefine or API calls
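
As a rough sketch of what such a bulk edit could look like, the snippet below generates one tab-separated QuickStatements V1 line that adds a party membership (P102) with start time (P580) and end time (P582) qualifiers at year precision. The helper names are hypothetical, and the party and years are only the tentative "guess" for Q4820820 from earlier in this thread, not verified data.

```python
def year(y: int) -> str:
    """Wikidata time value with year precision (the trailing /9)."""
    return f"+{y:04d}-00-00T00:00:00Z/9"


def party_statement(person: str, party: str, start: int, end: int) -> str:
    """One QuickStatements V1 line: P102 with P580/P582 qualifiers."""
    fields = [person, "P102", party, "P580", year(start), "P582", year(end)]
    return "\t".join(fields)


# Tentative example from this thread: "s 1938-1964", assumed to be Q105112.
line = party_statement("Q4820820", "Q105112", 1938, 1964)
print(line)
```

Lines like this can be reviewed by hand before being pasted into QuickStatements, which keeps a human check between the corpus data and Wikidata.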

Yes, there are a lot of parties that have not been updated according to "Tvåkammar-riksdagen 1867-1970"

If you filter the timeline and select parties like "Moderaterna" you see a lot of errors

image

Example of error is Q5983870 Olof Melin that I am fixing now see history

image

Don't hesitate to call me: 0735152802. We have one issue: how to model name changes, e.g. Högerpartiet --> Moderaterna. Is that the same object or just a name change? I have started to add them as 2 objects, as people on svWikipedia get upset if they see the wrong name for a person ;-)

For Swedish Church parishes we have implemented a maybe better solution see object Gävle parish Q10512441

image
MansMeg commented 2 years ago

I think we need to add those parties that exist here: https://user-images.githubusercontent.com/14206509/161539798-fd8fe7af-d6b8-49ad-b7fa-b650e806f6ce.png

I think Högerpartiet and Moderaterna are simply the same party under different names. However, "Lantmanna och borgarepartiet" split into Högerpartiet, Jordbrukarnas fria grupp and Bondeförbundet, so we need to keep that as a separate party since it split? I'm not sure how to add this in the best way. Well. Now time to sleep.

MansMeg commented 2 years ago

Ah, yes! What you have done with Olof Melin seems very reasonable. There he has been added to each of the different parties.

salgo60 commented 2 years ago

@MansMeg The challenge with Wikidata is that we are 12000 people who should agree, and then some 1000 people on sv:Wikipedia and maybe some 50 000 on Wikipedias in > 200 different languages :smiley:

My feeling is that one person has added Moderaterna for everyone from a right-wing party....

My feeling is that Wikidata

MansMeg commented 2 years ago

No, long-term it is not good enough for us. Although, right now it is not super-important.

It is hard for me to judge if everyone would agree, but formally there did not even exist a "Högerpartiet" in the 1920s. The party back then was "Lantmanna och borgarepartiet". So I guess it is incorrect to label them as "Moderaterna" during their term in office, although some persons were probably later members of Högerpartiet/Moderaterna.

salgo60 commented 2 years ago

Yes that is the logic but maybe the person doing this just positioned them to the right ;-)

We have the same problem with "Liberalerna"

image

Karl Magnus Lindh WD Q5956462 is lib s in our bible

image
MansMeg commented 2 years ago

Yes indeed. As I said, I think at least our corpus will need to distinguish between these parties and we will base this on the Tvåkammarriksdagen book. But maybe wait with updating wikidata with this info en masse then? Or what do you think?

salgo60 commented 2 years ago

My intention is to slowly update the WD objects and use the Tvåkammarriksdagen book as the bible.... we have > 3000 Swedish PM people but are missing pictures and sourcing from the Tvåkammarriksdagen book....

My todo list

salgo60 commented 2 years ago

@MansMeg "Missing dates": how will you handle it when we lack dates for one party?

image

WD <-> Tvåkammarriksdagen OT: I have slowly started to match WD and the book, see file --> salgo60/Wikidata_riksdagen-corpus/issues/38

image
MansMeg commented 2 years ago

I find this to be difficult. I would just add the dates we know. Later on I think we will be able to deduce these dates from the protocols. We are also only interested in the party affiliation while in the parliament.

Is there a routine in WD for how to handle, say, a person who is first an MP for one party, then out of parliament, and then enters for a new party? We don't know the exact switch date there?

MansMeg commented 2 years ago

Also, it would be super helpful for us if more old MPs were added, since this will help us identify speakers in the protocols.

salgo60 commented 2 years ago

> Also, it would be super helpful for us if more old MPs were added, since this will help us identify speakers in the protocols.

image

Lesson learned working with the book "Tvåkammar-riksdagen 1867-1970" is that there are different types of "vilde"

image
fredrik1984 commented 2 years ago

Thanks for the info @salgo60! I am currently writing the revised research infrastructure application to Riksbankens Jubileumsfond. Is it correct that there were 3815 ordinary MPs in the bicameral Riksdag 1867–1970? And do you have a similar figure for all MPs in the unicameral Riksdag 1970– ? It would be good to add that information to the application!

salgo60 commented 2 years ago

@fredrik1984

> for all MPs in the Unicameral Riksdag 1970–

I guess you can ask Riksdagens API, but it's a moving target :rocket:

image

Also asks Jan

image
salgo60 commented 2 years ago

@fredrik1984 a tip: consider Wikibase, which is coming in a cloud version, Wikibase.cloud

Dennis Priskorn @dpriskorn, a guy who lives in Sundsvall, works with the Internet Archive and Wikibase. If I understand correctly, they read the Wikidata change stream (APIRecent_changes_stream) to find things.... and they have then set up their own Wikibase, e.g. the repository internetarchive/wcdimportbot

The prototype wikibase installation https://wikipediacitations.wiki.opencura.com

image

Feels like an approach also for research data

MansMeg commented 2 years ago

My understanding is that when the book "Tvåkammar-riksdagen 1867-1970" gives dates, they refer to when the person was in the parliament

That sounds like a good principle for us to use as well in the long run, since that is what we actually know from the protocols.

one WD issue is if a person is not connected to a party, like "Amineh Kakabaveh" WD Q3675519; the old way in WD is to set "no value", and this pattern is used in Sweden and in other countries

I think WD should at least add one "party" that is "independent" connected to the wiki page: https://sv.wikipedia.org/wiki/Partil%C3%B6s

Yes. That is something we need to handle. This also connects to the previous discussion in that historically we have had different types of independents ("vildar") in the parliament. I'll connect @Stubbendorff and @fredrik1984 to this discussion. @Stubbendorff and @fredrik1984 what do you think would be the best solution to this problem? Add different types of independents historically? Or just call them all independents? I think the most correct would be to call them all independent.

Although, members of the government (ministers) without a political party should have another party, since they are considered "opolitiska". What do you all think?

fredrik1984 commented 2 years ago

Regarding "vildar", maybe a start would be to annotate them all as independent (or something adequate to "vilde"). Later on, it would make sense to also tag what kind of vilde they were, that is what historical name was used. But I guess there would still be a connection to which different party identities an MP had, right?

dpriskorn commented 2 years ago

> @fredrik1984 a tip: consider Wikibase, which is coming in a cloud version, Wikibase.cloud
>
> Dennis Priskorn @dpriskorn, a guy who lives in Sundsvall, works with the Internet Archive and Wikibase. If I understand correctly, they read the Wikidata change stream (APIRecent_changes_stream) to find things.... and they have then set up their own Wikibase, e.g. the repository internetarchive/wcdimportbot

At present IA runs a Wikibase on Wbstack, which will later be migrated to Wikibase.cloud once that service has launched.

https://wikicitations.wiki.opencura.com/wiki/Main_Page is the Wikibase we are currently working against. It shows the properties, the modelling and a couple of test items, e.g. https://wikicitations.wiki.opencura.com/wiki/Item:Q93

salgo60 commented 2 years ago

How it will look in Wikipedia and Wikidata; hopefully we get a WD property for your data too in the future

* svWikipedia: a reference to the scanned article of the book in SPA, plus a volume and page reference and the name used in the index - vol. 3, s. 283, omnämnd som: Trolle i Klågerup, Carl A

image image

The same in English:

image image
fredrik1984 commented 2 years ago

Thank you for the info @salgo60!

salgo60 commented 2 years ago

OT @fredrik1984 @MansMeg @rbbby @dpriskorn

Wiki Workshop 2022, April 25, 2022. The target group is researchers, so maybe something for you; it looks somewhat ML-related...

fredrik1984 commented 2 years ago

Thank you for the info! Unfortunately, I might have difficulties to attend that day.

Fredrik Norén PhD, Senior research assistant Humlab Umeå University SE-901 87 Umeå, Sweden +46 (0)73 995 10 15

salgo60 commented 2 years ago

Thanks @dpriskorn. I feel this event-driven approach, reading the event stream of Wikipedia, is what we need implemented in digital humanities.... with a background in international money transactions, I feel that archives, museums and research platforms act more like silos than parts of an ecosystem... this feels like one way forward

image

as said earlier that is what google has done for years see tweet from 2018

image
MansMeg commented 2 years ago

Ok. Let's close this issue now. For now it is sufficient to add the party dates in the party_affiliation csv file. Then @TomasSkotare can handle multiple parties.
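
A minimal sketch of how dated rows could resolve a speech to a single party. The column names and the exact start and end days are assumptions for illustration (only the years come from the tentative guess earlier in the thread), and the real party_affiliation csv file may be laid out differently.

```python
import csv
import io
from datetime import date

# Hypothetical layout and placeholder dates, not the actual corpus file.
SAMPLE = """wiki_id,party,start,end
Q4820820,k,1922-01-01,1933-12-31
Q4820820,sp,1934-01-01,1937-12-31
Q4820820,s,1938-01-01,1964-12-31
"""


def load_affiliations(text):
    """Parse csv text into rows with real date objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["start"] = date.fromisoformat(row["start"])
        row["end"] = date.fromisoformat(row["end"])
        rows.append(row)
    return rows


def party_on(rows, wiki_id, day):
    """Return the single party valid on `day`, else 'unknown'."""
    matches = {
        r["party"]
        for r in rows
        if r["wiki_id"] == wiki_id and r["start"] <= day <= r["end"]
    }
    return matches.pop() if len(matches) == 1 else "unknown"


rows = load_affiliations(SAMPLE)
print(party_on(rows, "Q4820820", date(1936, 5, 5)))
```

Falling back to 'unknown' when zero or several rows match keeps ambiguous cases visible instead of silently picking a party.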