welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Decision on how to store political party #358

Closed: MansMeg closed this issue 11 months ago

MansMeg commented 11 months ago

In discussions with the voting protocols project, a problem has come up with how political parties are stored for individual MPs. We need to discuss the best way to store them.

Roughly, there are two ways forward:

  1. Store parties under their latest name. For example, Torbjörn Fälldin would be recorded as a member of only one party, Centerpartiet, even though Centerpartiet changed its name while Fälldin was a member of parliament.
  2. Store each party under the name it actually had at the time. Fälldin would then be a member of two parties, Bondeförbundet and Centerpartiet, during different periods, i.e. the party affiliation CSV would contain two affiliations for Fälldin.

The argument for option 2 is that it is a much more data-neutral way to store party affiliation. It also maps exactly onto the parliament's own documents.

Many researchers might want to have option 1 as well, for simplicity of analysis. So we should set up either a function to compute it from a mapping table of new and old parties, or maybe even create a file with this mapping already applied.
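
A minimal sketch of such a function, assuming affiliations are stored as in option 2, one row per (person, party ID, start, end), with ISO dates as strings and every row closed by an end date. The `Affiliation` type, the `party:` IDs, the `LATEST_NAME` table and the dates are hypothetical placeholders, not the corpus's actual schema or data.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Affiliation:
    person_id: str
    party_id: str  # persistent ID of the party under its name at the time (option 2)
    start: str     # ISO 8601 "YYYY-MM-DD", so string order equals date order
    end: str

# Hypothetical mapping table: ID of a historical party name -> ID of its latest name.
LATEST_NAME = {
    "party:bondeforbundet": "party:centerpartiet",
}

def to_latest_names(affiliations, mapping=LATEST_NAME):
    """Derive option-1 rows from option-2 rows; the input rows are left unchanged."""
    return [replace(a, party_id=mapping.get(a.party_id, a.party_id)) for a in affiliations]

def merge_consecutive(affiliations):
    """Merge a person's back-to-back rows that now share the same party ID."""
    merged = []
    for a in sorted(affiliations, key=lambda a: (a.person_id, a.start)):
        prev = merged[-1] if merged else None
        if prev and prev.person_id == a.person_id and prev.party_id == a.party_id and a.start <= prev.end:
            merged[-1] = replace(prev, end=max(prev.end, a.end))
        else:
            merged.append(a)
    return merged

# Illustrative rows for a hypothetical MP (dates are placeholders).
rows = [
    Affiliation("mp:example", "party:bondeforbundet", "1950-01-01", "1957-06-01"),
    Affiliation("mp:example", "party:centerpartiet", "1957-06-01", "1970-12-31"),
]
print(merge_consecutive(to_latest_names(rows)))
# [Affiliation(person_id='mp:example', party_id='party:centerpartiet', start='1950-01-01', end='1970-12-31')]
```

Because the option 1 view is computed rather than stored, the option 2 rows remain the single source of truth.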

We could also store the relationships between the different parties, such as Centerpartiet and Bondeförbundet.
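
One way to store these relationships is as typed edges between persistent party IDs, from which the flat mapping table for option 1 can be derived. This is a sketch under assumptions: the ID scheme, the `PartyRelation` type and the relation names (`renamed_to`, `split_from`) are hypothetical, not an existing vocabulary of the corpus or of Wikidata. The chain Folkpartiet → Folkpartiet liberalerna → Liberalerna shows why rename edges are followed transitively.

```python
from typing import Dict, List, NamedTuple

class PartyRelation(NamedTuple):
    subject: str   # persistent party ID (hypothetical ID scheme)
    relation: str  # hypothetical vocabulary: "renamed_to", "split_from", ...
    obj: str       # persistent party ID

# Illustrative relationships; IDs are placeholders, not the corpus's identifiers.
RELATIONS: List[PartyRelation] = [
    PartyRelation("party:bondeforbundet", "renamed_to", "party:centerpartiet"),
    PartyRelation("party:folkpartiet", "renamed_to", "party:folkpartiet_liberalerna"),
    PartyRelation("party:folkpartiet_liberalerna", "renamed_to", "party:liberalerna"),
    # Hypothetical non-rename edge: kept as data, but never followed below.
    PartyRelation("party:example_splinter", "split_from", "party:centerpartiet"),
]

def latest_name(party_id: str, relations: List[PartyRelation]) -> str:
    """Follow "renamed_to" edges from party_id to the party's most recent name."""
    renamed = {r.subject: r.obj for r in relations if r.relation == "renamed_to"}
    seen = {party_id}
    while party_id in renamed:
        party_id = renamed[party_id]
        if party_id in seen:
            raise ValueError(f"cycle in renamed_to chain at {party_id}")
        seen.add(party_id)
    return party_id

def latest_name_table(relations: List[PartyRelation]) -> Dict[str, str]:
    """Derive the flat old-ID -> latest-ID mapping table from the relationship list."""
    return {
        r.subject: latest_name(r.subject, relations)
        for r in relations
        if r.relation == "renamed_to"
    }
```

Only renames are followed: a split or merger is kept as data but does not make two parties the same party, so that judgement stays with the analyst.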

I think @fredrik1984, @Lottabrorsson and I are leaning toward option 2. What do the rest of you think?

BobBorges commented 11 months ago

+1 for option 2. Option 1 is an analytical decision.

fredrik1984 commented 11 months ago

> Many researchers might want to have option 1 as well, for simplicity of analysis. So we should set up either a function to compute it from a mapping table of new and old parties, or maybe even create a file with this mapping already applied.

I think we should try to do this as well; it will most likely be very much appreciated.

BobBorges commented 11 months ago

Has the decision now been taken? Can we close the issue?

MansMeg commented 11 months ago

I suggest we wait until the next leadership meeting or project meeting to decide formally. I think it is up to @fredrik1984 to decide where decisions like this should be made.

fredrik1984 commented 11 months ago

Let's bring it up at the leadership meeting next week.

BobBorges commented 11 months ago

related to #201

salgo60 commented 11 months ago

> We could also store the relationships between the different parties, such as Centerpartiet and Bondeförbundet.

I would say: don't use text strings; use persistent identifiers and semantics, and follow FAIR data principle F1 ((meta)data are assigned a globally unique and persistent identifier).

Here is a SPARQL query over political parties (Q7278) in Wikidata, showing which properties are used and how many times.

Modelling parties in Wikidata: defines which party items are best-practice examples of how parties are modelled in Wikidata.

MansMeg commented 11 months ago

I agree. We should have our own persistent identifiers for all parties.

salgo60 commented 11 months ago

> I agree. We should have our own persistent identifiers for all parties.

Yes, and linked data. See "What Linked Open Data is and why it's a good thing, both for users and for data providers".

MansMeg commented 11 months ago

We have decided that we will always store party as the name at the historical time-point.

salgo60 commented 11 months ago

> We have decided that we will always store party as the name at the historical time-point.

Sad to hear that you are not moving in the direction of linked data and 5-star data, with a focus on sourced data...

> store party as the name

Never store the name as a string; use persistent identifiers.

MansMeg commented 11 months ago

Oh. We will use persistent identifiers. This is just about whether we should store (as is done in Wikidata) e.g. Folkpartiet as Liberalerna even before they changed the name. We are going to say that people were first part of FP and then of L (with both parties having persistent identifiers).
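
Under this decision, "first FP, then L" becomes two rows with different persistent IDs, and a point-in-time question is answered by an interval lookup. A sketch, assuming half-open [start, end) intervals with an open end for a current membership; `Affiliation`, `party_on`, the IDs and the dates are hypothetical placeholders (the dates are not the actual rename date).

```python
from datetime import date
from typing import List, NamedTuple, Optional

class Affiliation(NamedTuple):
    person_id: str
    party_id: str        # persistent ID for the party under its name at the time
    start: date
    end: Optional[date]  # None = membership still ongoing

def party_on(affiliations: List[Affiliation], person_id: str, day: date) -> Optional[str]:
    """Return the party ID person_id held on `day`, or None if no row covers it.

    Intervals are half-open [start, end), so on a switch date the new ID applies.
    """
    for a in affiliations:
        if a.person_id != person_id:
            continue
        if a.start <= day and (a.end is None or day < a.end):
            return a.party_id
    return None

# Illustrative rows for a hypothetical MP; dates are placeholders.
ROWS = [
    Affiliation("mp:example", "party:FP", date(2010, 1, 1), date(2016, 1, 1)),
    Affiliation("mp:example", "party:L", date(2016, 1, 1), None),
]
```

Returning `None` for an uncovered date keeps "no recorded affiliation" distinct from any real party ID.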

salgo60 commented 11 months ago

Things not Strings

> Oh. We will use persistent identifiers. This is just about whether we should store (as is done in Wikidata) e.g. Folkpartiet as Liberalerna even before they changed the name. We are going to say that people were first part of FP and then of L (with both parties having persistent identifiers).

What will you do with Centern? Wikidata never uses text strings. In the same way, Wikipedia has disambiguation pages (see the article Wikipedia:Disambiguation) telling you that the text string "Nya centern" can mean X or Y. As a Wikipedia article writer, you should not link to a disambiguation page, because that is a naming conflict that needs to be resolved. The Wikidata solution is persistent unique identifiers.

Persistent identifiers, with sources backing what you add, are the only way forward, and that is what Wikidata tries to explain. Spreadsheets with text strings and no sources are not 2023... sorry.

Support for multiple languages

Semantics and a knowledge graph

dpriskorn commented 11 months ago

I agree with Magnus: you should disambiguate all parties and store only a persistent identifier. People consuming your dataset can then look up what the thing is called, when it existed, which other parties it is distinct from, etc. If you don't know the party for some person, you should store that too, in some consistent, machine-readable way.
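
A sketch of that pattern, assuming a registry keyed by persistent ID: rows carry only IDs, names are resolved when the data is read, and an unknown party is an explicit marker rather than a blank cell. `REGISTRY`, `UNKNOWN_PARTY` and `party_label` are hypothetical names for illustration, not part of the corpus.

```python
from typing import Dict

# Hypothetical explicit marker: an affiliation exists, but the party is not known.
UNKNOWN_PARTY = "party:unknown"

# Hypothetical registry: names live here, keyed by persistent ID, not in the rows.
REGISTRY: Dict[str, str] = {
    "party:bondeforbundet": "Bondeförbundet",
    "party:centerpartiet": "Centerpartiet",
}

def party_label(party_id: str, registry: Dict[str, str] = REGISTRY) -> str:
    """Resolve a stored party ID to a display name for consumers of the data."""
    if not party_id:
        # A blank cell is ambiguous (unknown? not recorded? export bug?), so reject it.
        raise ValueError("empty party field; store UNKNOWN_PARTY explicitly")
    if party_id == UNKNOWN_PARTY:
        return "(party unknown)"
    if party_id not in registry:
        raise KeyError(f"unregistered party ID: {party_id}")
    return registry[party_id]
```

Failing loudly on blank or unregistered IDs keeps "we don't know" as a deliberate, machine-readable statement rather than an accident of the export.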

MansMeg commented 11 months ago

Yes. That is what we are going to do. Sorry for not being clear enough. The decision is just that we will store the persistent ID for the party name at the given time, not the ID of the latest party name.

salgo60 commented 11 months ago

> Yes. That is what we are going to do. Sorry for not being clear enough. The decision is just that we will store the persistent ID for the party name at the given time, not the ID of the latest party name.

Thanks

And as I stated 6 months ago, it's important that you get this persistent-identifier infrastructure live so we can start linking to you (#269).

MansMeg commented 11 months ago

Yes, we know. We have moved this up and will start working on it in the coming month.

monirbounadi commented 11 months ago

The way you do it makes sense to me. One thing I noticed though was that an MP could have "kommunistiskt parti" as party affiliation. Why is that? "kommunistiskt parti" is not a party. Is it because you could not find which communist party the MP belonged to?

MansMeg commented 11 months ago

This is most likely an error in Wikidata that we need to fix. Could you file this as a separate issue so we can follow up on it?

fredrik1984 commented 11 months ago

Yes, check which party is given in the bio book. In my experience, all party names mentioned in the bio books also have a Wikidata ID.

salgo60 commented 11 months ago

> The way you do it makes sense to me. One thing I noticed though was that an MP could have "kommunistiskt parti" as party affiliation. Why is that? "kommunistiskt parti" is not a party. Is it because you could not find which communist party the MP belonged to?

@monirbounadi

All the data on the communist parties and independents ("vilde") in WD is a mess; we have marked some of the problematic records with Q120143028. My hope and wish has been that we would get good linked data from this project for parties, the relationships between parties, and which parties people were connected to. I can also see odd things with, e.g., Frisinnade landsföreningen.

salgo60 commented 11 months ago

@monirbounadi, on how Wikidata is connected to SKBL and some political parties: the sad thing is that they use text strings and have no sources in the metadata. It feels like we see the same problem all the time ;-) My feeling is that the work SKBL has done adds little value.

SKBL - Svenskt kvinnobiografiskt lexikon

It looks like SKBL has connected people to parties. It's a mess in Wikidata, but also a mess in SKBL, as they use text strings and have no persistent identifiers.

SKBL Strings not things

I met the people at SKBL in December 2020 and tried to explain what a knowledge graph is and why persistent identifiers are needed, but I feel they didn't understand.

More things I did

SKBL anti-pattern: political parties as text strings and no good sources in the metadata

The book "Tvåkammar-riksdagen 1867–1970" anti-pattern: several articles about the same person with different statements, and no structured data or persistent identifiers.

See #157#issuecomment-1714283975: 150 persons have more than one article (see the GIST).

salgo60 commented 11 months ago

@monirbounadi, to find a WD item from SKBL you can use either the Chrome plug-in Entity Explosion or hub.toolforge.org.

If SKBL had done its homework with linked data and had persistent identifiers for everything, such as political parties, we could have used Wikidata to jump from any political party in Wikidata to the related SKBL landing page for that party. Trying this from Q1594086, we mostly see library sources such as VIAF 122570825, WorldCat and Den Store Danske-ID, not SKBL, SBL or other sites.

salgo60 commented 11 months ago

cc: @MansMeg @BobBorges @monirbounadi @fredrik1984

Maybe something for you to take a look at, as Wikidata is part of your ecosystem.
