welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today
Other

Fix missing roles in wikidata/missing in members of parliament file #252

Closed MansMeg closed 1 year ago

MansMeg commented 1 year ago

Some MPs are not included in the member_of_parliament.csv file. I guess their role is missing in Wikidata.

Hence we should probably try to systematically identify the people who lack it. An example:

Gustaf Johnsson, Q111804160, https://www.wikidata.org/wiki/Q111804160. You can also find him in the "Biografibanden", Band 2, p. 285.

fredrik1984 commented 1 year ago

Maybe this is something to prioritize once all MPs have a wiki-id? What is the status: did Emil finish going through the MP list of the bicameral period and checking which ones have a wiki-id?

MansMeg commented 1 year ago

Yes. Let's wait for Emil's work. Then we can start on this.

fredrik1984 commented 1 year ago

I will use this issue thread to report missing metadata for MPs in Wikidata.

This person https://www.wikidata.org/wiki/Q5969615 has no data for chamber (FK), party (bf = bondeförbundet), i-ort (Löfvander i Kvarnby). Source: bio book 3, p. 239.

salgo60 commented 1 year ago

@fredrik1984

As you understand, Wikidata is not rocket science, and people like me are bolder, so we get some edit wars...

fredrik1984 commented 1 year ago

Thanks @salgo60 ! I appreciate your commitment!

salgo60 commented 1 year ago

It's @Ainali @miroli @tmtmtmtm @belteshassar @SchermanJ who started it, and as I said, we don't speak much with each other. @miroll: the interesting queries WD user Popperipop wrote down (see #112) can give an indication of what he plans...

FYI @fredrik1984

fredrik1984 commented 1 year ago

This person https://www.wikidata.org/wiki/Q5804642 is missing i-ort (Heüman i Jönköping). Bio book 2, p. 159.

fredrik1984 commented 1 year ago

This person https://www.wikidata.org/wiki/Q6190872 is missing some party metadata, see https://portrattarkiv.se/details/sj9PGLAlnmUAAAAAABfQAQ

salgo60 commented 1 year ago

@fredrik1984 thanks

FYI:

fredrik1984 commented 1 year ago

@salgo60 yes indeed, the hierarchical tree of "party" formations and splits in the bio book is fascinating. What constitutes a party before 1900 is not always clear. But in the Swerik project, we have also decided to use the bio books as our bible. However, in the end, we might merge some of the party names that represent the same party but with different names.

We are currently working on a gold standard for name introduction for the whole period 1867–today, and that is why we post some of the issues we come across in that work. We appreciate that you help out correcting things on Wikidata!

salgo60 commented 1 year ago

@fredrik1984 you also have "vilde"

Ainali commented 1 year ago
"In Wikidata they suggest that 'vilde' is an empty value for party, but I have started to create WD objects for different types of 'vilde'."

The items may be fine, but they should not be used to populate member of political party (P102) or parliamentary group (P4100).

salgo60 commented 1 year ago

The items may be fine, but they should not be used to populate member of political party (P102) or parliamentary group (P4100).

@Ainali Right now they are listed as exceptions to constraint P102#P2302, which is a "good solution" until someone decides (as @fredrik1984 put it, "What constitutes a party before 1900 is not always clear") and we get a "gold standard". Maybe we need new properties...

Olof Karswall wrote a paper, "Historical Settlement Units as Linked Open Data". I guess we need something like that for political parties / parliamentary groups / active politicians who have left the party they were elected for / independent politicians..., perhaps from a project like ParlaMint that covers more countries?

fredrik1984 commented 1 year ago

We will discuss the pre-1900 party issue in the Swerik project. Back then, "parties" were more like lists or groups that voted the same way in parliament. Although they were not parties by our modern definition, I think it is still reasonable to tag them as parties.

"Vilde" sounds like a good category to be included, especially since our "bible" (the bio books) uses that term.

salgo60 commented 1 year ago

My understanding is that @Ainali and @miroli are currently doing it the same way as the rest of the Wikidata community, e.g.

_Kakabaveh claimed that she was not informed in advance about the expulsion but found out about it through the media.[29] She left the party the same day at her own request._

salgo60 commented 1 year ago

@fredrik1984 Another research "wet dream", I guess, is connecting Valmyndigheten (the Swedish Election Authority) with the Swedish MP data; see my 2019 attempt, #85 "Valmyndigheten koppling Riksdagens öppna data" (linking Valmyndigheten with the Riksdag's open data). Maybe you can get that data as a researcher. I hope both systems have the Swedish "personnummer" (personal identity number), but I didn't get a good license.

It looks like the UK has coordinates for its electoral districts in Wikidata; see for example district Q3238840.

Off topic: Wikidata supports areas (shapes), which is even better than a single coordinate. It would be nice to get the Swedish electoral districts as shapes.

salgo60 commented 1 year ago

@fredrik1984 Another related discussion on sv:Wikipedia concerns the names of parties and the need to use the correct name for the time when something happened: "Wikipedia:Bybrunnen#Inkonsekventa_partinamn" (inconsistent party names)

  1. Wikipedia lacks an authority
  2. Easy access to this data as CC-0, machine-readable

fredrik1984 commented 1 year ago

https://www.wikidata.org/wiki/Q6178299 is missing i-ort (Sjödahl i Göteborg) and a specific start date in FK (20 March 1931). Source: bio book 4, p. 150.

He also has no link to the Swedish portrait archive.

fredrik1984 commented 1 year ago

https://www.wikidata.org/wiki/Q6012010 is missing i-ort (Nilsson i Mölndal). Source: bio book 4, p. 128.

He also has no link to the Swedish portrait archive.

BobBorges commented 1 year ago

Re the original issue, is this attribute what's missing that causes the people/wiki_ids to not get scraped into our metadata?

It's related to issue #265 and I would / could just run through our known_mps/catalog and add this attribute to the ones who don't have it.

MansMeg commented 1 year ago

@BobBorges I'm not sure, but I don't think so. @ninpnin knows the script that updates the MP database from Wikidata. I would check how MPs are selected in the API query to Wikidata; I think they are selected based on role.

BobBorges commented 1 year ago

So, it's these:

wd:Q10655178 wd:Q33071890 wd:Q81531912
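The role-based selection described above could look roughly like the following SPARQL query. This is a minimal sketch, not the repository's actual query: it assumes the roles are stored as "position held" (P39) statements whose value is one of wd:Q10655178, wd:Q33071890 or wd:Q81531912, and the helper `build_role_query` is hypothetical.

```python
# Hypothetical helper. Assumption: MP roles are recorded on Wikidata as
# "position held" (P39) statements whose value is one of these role items.
ROLE_QIDS = ["Q10655178", "Q33071890", "Q81531912"]


def build_role_query(role_qids=ROLE_QIDS):
    """Return a SPARQL query selecting every person with a P39 statement
    whose value is one of the given role items."""
    values = " ".join(f"wd:{qid}" for qid in role_qids)
    return (
        "SELECT DISTINCT ?person WHERE {\n"
        f"  VALUES ?role {{ {values} }}\n"
        "  ?person p:P39 ?statement .\n"
        "  ?statement ps:P39 ?role .\n"
        "}"
    )


if __name__ == "__main__":
    print(build_role_query())
```

Under that assumption, a person without any such statement never appears in the result set, which would be consistent with their wiki_id also being absent from member_of_parliament.csv.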

MansMeg commented 1 year ago

I guess so, but @ninpnin should confirm. Are the MPs missing in our database missing these attributes?

BobBorges commented 1 year ago

We talked about it last week -- if I understand what's going on correctly -- this causes missing_member_ofparliament, and also some of the other `missing`s are because without this attr the wiki_id doesn't make it into the query results in the first place.

MansMeg commented 1 year ago

Probably. That is another argument for fixing the tests sequentially, i.e. start by making the person.csv test pass. Then we merge that test suite and the updated MP database that passes the first test. Then we take the next test, etc.

BobBorges commented 1 year ago

The missing-member and missing-person tests both flag 103 IDs, and the rows in the summary (print(df)) are the same. If we fix the member attribute on Wikidata, both might resolve at once.
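To confirm that the two failing tests flag exactly the same people, rather than just the same count, the ID lists can be compared as sets. A minimal sketch: `compare_failures` is a hypothetical helper, and the IDs in the example are placeholders, not the actual 103 IDs from the test run.

```python
def compare_failures(missing_member, missing_person):
    """Split two lists of wiki IDs into shared and test-specific IDs."""
    member, person = set(missing_member), set(missing_person)
    return {
        "both": sorted(member & person),
        "only_member": sorted(member - person),
        "only_person": sorted(person - member),
    }


if __name__ == "__main__":
    # Placeholder IDs; in practice these would come from the two test reports.
    result = compare_failures(["QID_A", "QID_B"], ["QID_B", "QID_A"])
    print(result)
```

If both `only_member` and `only_person` come back empty, the two failures most likely share one cause, which supports fixing the member attribute first.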

salgo60 commented 1 year ago

@BobBorges et al., do you have a good English translation of the book title "Enkammarriksdagen 1971-1993/94" (Q111443541)? Right now it lacks an English label. If you do, just update WD.

BobBorges commented 1 year ago

I guess it would be Unicameral Parliament.

fredrik1984 commented 1 year ago

I think a more correct translation would be "Unicameral Riksdag".

A good Swedish-English dictionary of Swedish parliamentary terms is attached here: flersprakig-ordlista-nov2020.pdf

BobBorges commented 1 year ago

OK, here's a list of wiki IDs causing the member_of_parliament unit test to fail: they are missing the role wd:Q10655178, wd:Q33071890, or wd:Q81531912.
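For a list like this, a diagnostic query can confirm which of the listed IDs really lack all three roles before anyone edits Wikidata. A sketch, assuming the roles are recorded as "position held" (P39) statements; `build_missing_role_query` is a hypothetical helper, and Q111804160 (Gustaf Johnsson, from the start of this issue) is used only as example input.

```python
# Assumption: roles are stored as "position held" (P39) statements.
ROLE_QIDS = ["Q10655178", "Q33071890", "Q81531912"]


def build_missing_role_query(wiki_ids, role_qids=ROLE_QIDS):
    """Return a SPARQL query listing those of the given people who have
    no P39 statement with any of the given role items."""
    people = " ".join(f"wd:{qid}" for qid in wiki_ids)
    roles = " ".join(f"wd:{qid}" for qid in role_qids)
    return (
        "SELECT ?person WHERE {\n"
        f"  VALUES ?person {{ {people} }}\n"
        "  FILTER NOT EXISTS {\n"
        f"    VALUES ?role {{ {roles} }}\n"
        "    ?person p:P39/ps:P39 ?role .\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    print(build_missing_role_query(["Q111804160"]))
```

The resulting query could be pasted into the Wikidata Query Service; under the P39 assumption, every row it returns is a person who still needs one of the roles added.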

MansMeg commented 1 year ago

Great! @salgo60 Do you have time to look at these MPs?

salgo60 commented 1 year ago

@BobBorges I can't check your list (I guess I lack permissions), so I'm doing it in #121.

fredrik1984 commented 1 year ago

https://www.wikidata.org/wiki/Q55955 is missing AK start date (16 March 1897). Source: https://portrattarkiv.se/details/sj9PGLAlnmUAAAAAABfN4g

salgo60 commented 1 year ago

@fredrik1984 FYI, there is now a discussion on Wikipedia's Bybrunnen about us not being consistent in how we use the names of Swedish parties. I suggest that instead of just having tables in articles, we turn this into data in Wikidata or some other database...

"In our articles about older Swedish towns, köpingar (market towns) and municipalities (for example the article Strömstads stad), we are not consistent regarding party names." 1) The best would be if you had this data... and we could cite you 🚀, I guess.

fredrik1984 commented 1 year ago

https://www.wikidata.org/wiki/Q6209073 has a slightly incorrect i-ort (it should be Thorell i Stolp-Ekeby). He is also missing the party https://www.wikidata.org/wiki/Q10554125. Source: bio book 1, p. 271. https://portrattarkiv.se/details/sj9PGLAlnmUAAAAAABHgKQ

fredrik1984 commented 1 year ago

https://www.wikidata.org/wiki/Q6078273 is missing i-ort (Roos i Malmö). Source: bio book 3, p. 266.

fredrik1984 commented 1 year ago

@salgo60 From Swerik's perspective, if a party just changes its name (e.g. from Bondeförbundet to Centerpartiet), it should be regarded as the same party. But Bondeförbundet is not the same as Jordbrukarnas riksförbund (the latter merged into the former in 1921).

MansMeg commented 1 year ago

Are you sure about this, Fredrik? I think we might want one instance per name, e.g. when Folkpartiet became Liberalerna. Even though the party as such has not changed, the data has. So my guess is that we need to keep track of all the names (and their abbreviations) to map them to the protocols.

fredrik1984 commented 1 year ago

I guess my comment was more from a historical perspective. From a technical/curational point of view I would say you are right @MansMeg

MansMeg commented 1 year ago

Yes. Now that I think about it, this is not very clear-cut to define. We could also say that we have multiple name instances for the same party.
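The two views in this exchange can coexist in one data structure: a party entity keeps every name (and abbreviation) it has used, so a rename maps to the same party, while a merged organisation stays a separate entity that points to its successor. A minimal sketch: the `Party` class, the identifiers such as `centre_party`, and the `merged_into` field are hypothetical, and only party names mentioned in this thread are used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Party:
    """A party entity that keeps every name it has used (hypothetical model)."""
    party_id: str                                   # hypothetical internal ID
    names: List[str] = field(default_factory=list)  # all names, oldest first
    abbreviations: Dict[str, str] = field(default_factory=dict)  # abbr -> name
    merged_into: Optional[str] = None               # successor party_id, if merged


def build_name_index(parties):
    """Map each name and abbreviation (lower-cased) to its party_id."""
    index = {}
    for party in parties:
        for name in party.names:
            index[name.lower()] = party.party_id
        for abbr in party.abbreviations:
            index[abbr.lower()] = party.party_id
    return index


PARTIES = [
    # A rename: both names belong to the same entity.
    Party("centre_party", ["Bondeförbundet", "Centerpartiet"], {"bf": "Bondeförbundet"}),
    Party("liberal_party", ["Folkpartiet", "Liberalerna"]),
    # A merger (1921): a separate entity that points to its successor.
    Party("jordbrukarnas_riksforbund", ["Jordbrukarnas riksförbund"],
          merged_into="centre_party"),
]

if __name__ == "__main__":
    index = build_name_index(PARTIES)
    print(index["bf"], index["centerpartiet"], index["jordbrukarnas riksförbund"])
```

Keeping `names` as a list rather than a single canonical string is what lets the varying party names in the protocols (and abbreviations such as bf) resolve to one entity, while each individual name is still recorded.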

BobBorges commented 1 year ago

@salgo60 How confident are you in the iorts from the list you and @Emil produced? If you're relatively confident they're mostly OK, I will add them all to Wikidata programmatically, which would sort out @fredrik1984's comment about Roos i Malmö and the other 500-ish MPs missing an iort.

MansMeg commented 1 year ago

Should we wait before uploading all these iorts? I'm a little hesitant to bulk upload before we have done any quality control of the files.

BobBorges commented 1 year ago

This is why I was asking. I was under the impression that the list I got from Emil (why can't I tag him, btw?) was hand-curated from the bio books; if that's the case, it should be pretty good. If not, then ¯\_(ツ)_/¯ we should quality-control it.
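Before a bulk upload, a few cheap automated checks could catch the most obvious problems in the i-ort list. A minimal sketch, assuming each row is a (wiki_id, i-ort) pair in which the i-ort is the full designation in the "<surname> i <place>" form seen in this thread (e.g. "Roos i Malmö"); the helper name, the regular expression, and the placeholder rows are illustrative. It complements, rather than replaces, manual checks against the bio books.

```python
import re

# Assumed i-ort shape, based on examples in this thread such as
# "Roos i Malmö" and "Thorell i Stolp-Ekeby": "<surname> i <place>".
IORT_PATTERN = re.compile(r"^(?P<surname>\S.*?) i (?P<place>\S.*)$")


def check_iorts(rows):
    """Return (wiki_id, problem) tuples for suspicious rows.

    `rows` is an iterable of (wiki_id, iort) pairs (hypothetical input format).
    """
    problems = []
    first_seen = {}  # wiki_id -> first i-ort value encountered
    for wiki_id, iort in rows:
        iort = iort.strip()
        if not iort:
            problems.append((wiki_id, "empty i-ort"))
            continue
        if not IORT_PATTERN.match(iort):
            problems.append((wiki_id, f"unexpected format: {iort!r}"))
        previous = first_seen.setdefault(wiki_id, iort)
        if previous != iort:
            problems.append((wiki_id, f"conflicting values: {previous!r} vs {iort!r}"))
    return problems


if __name__ == "__main__":
    rows = [  # placeholder rows, not real data
        ("QID_A", "Roos i Malmö"),
        ("QID_B", ""),
        ("QID_C", "Malmö"),
        ("QID_A", "Roos i Lund"),
    ]
    for wiki_id, problem in check_iorts(rows):
        print(wiki_id, problem)
```

If the list turns out to store only the place name, every row would be flagged, so the pattern would need adjusting to the actual file format.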

salgo60 commented 1 year ago

https://www.wikidata.org/wiki/Q6012010

@fredrik1984 done

salgo60 commented 1 year ago

@salgo60 how confident are you in the iorts from the list you and @emil produced? If you're relatively confident they're mostly OK, I will add them all to wikidata programmatically which would sort out @fredrik1984 's comment about Roos i Malmö and the other 500ish MPs missing an iort.

@BobBorges The only error so far is

If you do the update as a transaction, we could roll back if it looks too crazy...

Question: What does iort stand for?

fredrik1984 commented 1 year ago

@salgo60 iort/i-ort stands for "i riksdagen kallad", used by the speaker of the house to address MPs, often the place where an MP lived. I just found this wiki page: https://sv.wikipedia.org/wiki/I_riksdagen_kallad

Good to know: after 1976/77 they stopped addressing MPs with herr/fru/fröken (Mr/Mrs/Miss). "Since 1977/1978, the members are instead designated by both first and last name, but where needed to tell them apart, a place or farm name is added."

fredrik1984 commented 1 year ago

https://www.wikidata.org/wiki/Q6012769 is missing i-ort (Nisser i Grycksbo) and a specific start date in FK (11 January 1938). He is also missing a reference to the portrait archive. Source: bio book 5, p. 80. Ping @salgo60 (do you want me to ping you for each post about missing MP metadata in Wikidata?)

A question to @MansMeg @BobBorges @ninpnin: regarding Bob's comment above, should I stop reporting missing MP metadata in Wikidata here? I thought that Emil had already added all missing MPs to Wikidata but that some metadata is still missing, hence these comments.

MansMeg commented 1 year ago

Please add these, Fredrik. Emil and @salgo60 added everyone who was missing on Wikidata, although not all metadata has been added yet. We do this iteratively. Now we have found people who are missing role information, which needs to be fixed so that we get all MPs in. That's the first step. Then we will add more and more metadata on a need-to-have basis.

fredrik1984 commented 1 year ago

OK, good, I thought so! Going through Väinö's CSV file of MP introductions is a very good way to see what the quality of the MP metadata in Wikidata looks like, and it is often very good.

@Lottabrorsson – would be great if you and Mattias also report missing MP metadata in Wikidata here when you go through your share of MPs in the CSV file!

fredrik1984 commented 1 year ago

@ninpnin – this FK-MP (1942–1945) https://www.wikidata.org/wiki/Q6175985 is missing from the input/mp/fk files: https://github.com/welfare-state-analytics/riksdagen-corpus/tree/main/input/mp/fk

https://portrattarkiv.se/details/sj9PGLAlnmUAAAAAABfNlg
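A quick way to confirm that an ID such as Q6175985 is absent from the input/mp/fk files is to search them directly. A sketch, assuming the files are UTF-8 CSV text that contains the bare Wikidata ID in each matching row; `contains_wiki_id` and `find_wiki_id` are hypothetical helpers. The word-boundary match avoids false hits where one ID is a prefix of another.

```python
import re
from pathlib import Path


def contains_wiki_id(text, wiki_id):
    """True if `wiki_id` occurs as a whole token in `text`, so searching
    for Q617598 does not match the text Q6175985."""
    return re.search(rf"\b{re.escape(wiki_id)}\b", text) is not None


def find_wiki_id(wiki_id, directory):
    """Return the CSV files under `directory` that mention `wiki_id`."""
    return [
        path
        for path in sorted(Path(directory).rglob("*.csv"))
        if contains_wiki_id(path.read_text(encoding="utf-8"), wiki_id)
    ]


if __name__ == "__main__":
    # Run from the repository root; an empty list means the ID was not found.
    print(find_wiki_id("Q6175985", "input/mp/fk"))
```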