welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Missing and imprecise biographic data #360

Closed monirbounadi closed 11 months ago

monirbounadi commented 11 months ago

@salgo60 introduced me to this project. Given my work on late 1800s Swedish politics, I'm intrigued by the biographical data of MPs from Tvåkammarriksdagen 1867-1971. This data can help match MPs to historical registers. Ideally, we'd have comprehensive birth dates and places for each MP, with birthplaces detailed to the parish or congregational level per Swedish censuses.

I assessed the Wikidata MPs against these criteria. I've updated several entries: previously, some only had birth years, and around 180 lacked birth locations. I've fixed the former and added locations for 80 MPs, leaving about 100 incomplete. While I often didn't cite sources, I always used reliable tertiary ones. Sometimes, I directly referenced Swedish household examination records.

However, I've noticed some MPs with vague birth or death locations, like "Lund" or "Stockholm", which aren't specific parishes or congregations.

I recommend:

After using tertiary sources, let's link Swedish-born MPs to the Swedish Death Index 8, a quality secondary source. This addresses the second task and also helps us determine to what degree we have correctly assigned birth locations to MPs.

Please take all of this as a suggestion (since I am merely an outside contributor).

salgo60 commented 11 months ago

Just out of curiosity, what is your use case? Why is the birth location important?

The problem I always see when adding data from a source with text strings is that it is difficult to say whether they reference a parish, a city, or a combination.

Using parishes as death locations was also not satisfying for Litteraturbanken, which uses Wikidata a lot; e.g. they would like to see the death location of August Strindberg as Blå Tornet...

fredrik1984 commented 11 months ago

Hi @monirbounadi! I am happy that you have found out about our Swerik project (https://swerik-project.github.io/), and that you find our work useful! Currently, adding birth/death place is not prioritized in our project, but it is great that you do this since we can include this as metadata in our MP database later on. I guess you already have access to the digitized biographical books (1–5) of the bicameral riksdag?

What is your PhD project about?

monirbounadi commented 11 months ago

@salgo60 To begin with I want a basic understanding of the selection of MPs, which includes where in Sweden they come from. To get more precise information on the MPs, for example, their family or socioeconomic background, I would need to link them to other historical registers, and for accurate linking the birth location is necessary.

@fredrik1984 Thanks for your work! Looking at this project is mostly exploratory for me at this stage. Yes, I think I saw the digitized books here at Github somewhere.

My project is about the struggle for female suffrage in Sweden. Together with another student, I look into the geographical distribution of women's engagement in local politics and popular movements around the end of the 1800s. A long time ago we tried to see whether we could systematically get a sense of various MPs' views on female suffrage. The National Association for Women's Suffrage (Sweden) made some surveys on this in the early 1900s (they explicitly asked MPs about their views; the surveys can be found at the National Archives of Sweden in Stockholm). Until 1923, if I remember correctly, voting was secret in Tvåkammarriksdagen. Thus, we can't for example see which MP voted in favor and which voted against female suffrage at the end of the 1910s. However, maybe we could proxy their views by systematically analyzing what they said about suffrage in Tvåkammarriksdagen, or whether their views were just a function of the dominant view held by the party they belonged to.

Beyond that, I am just interested in the data for other exploratory reasons.

salgo60 commented 11 months ago

My vision, as I said, is that we should be able to see the window of the home a person lived in when he/she died. If I walk down a street, I should be able to raise my phone and see newspaper articles about the street or the people living on it, or, as in my video, the apartment Petren lived in when she died (see the death book: Scheelegatan 13).

I did some checks in Wikidata on Swedish PMs and death locations; see #233.

> Thus, we can't for example see which MP voted in favor and which voted against female suffrage at the end of the 1910s

If you could map the voting list to Wikidata Q-numbers, I feel we could rather easily create a map showing in different colors whether they voted yes or no.

monirbounadi commented 11 months ago

That's a wonderful vision!

"If you could map the voting list to Wikidata Q-numbers, I feel we could rather easily create a map showing in different colors whether they voted yes or no." Hmm, what do you mean? Before 1923 (iirc), the only thing we know is the number of MPs who voted yes, no or who did not cast a vote. So we cannot assign votes to MPs before 1923.

Yes, the salary is interesting. I think one can systematically extract information about salary and education from the 1930 census. However, by connecting the MPs to the 1880-1910 censuses we get their occupations, which might give a better sense of their socioeconomic position. The sources are complementary of course.

salgo60 commented 11 months ago

> I think one can systematically extract information about salary and education from the 1930 census

Education is still on my to-do list. I haven't found how to make it Wikidata friendly; see #102 and #105.

fredrik1984 commented 11 months ago

> My project is about the struggle for female suffrage in Sweden. Together with another student, I look into the geographical distribution of women's engagement in local politics and popular movements around the end of the 1800s.

Ok, cool! Hope we can collaborate further on this too. We in the Swerik project are always looking for collaboration!

monirbounadi commented 11 months ago

> > My project is about the struggle for female suffrage in Sweden. Together with another student, I look into the geographical distribution of women's engagement in local politics and popular movements around the end of the 1800s.
>
> Ok, cool! Hope we can collaborate further on this too. We in the Swerik project are always looking for collaboration!

Thanks! I'm open to collaborations. In any case, I will follow the work you do.

salgo60 commented 11 months ago

@monirbounadi Another thing I tried was to classify occupations with HISCO codes; see the repo /salgo60/HISCOKoder

MansMeg commented 11 months ago

Hi!

It is great that you reached out! As @fredrik1984 mentioned, fixing birth dates and parishes is not our main priority right now, but we would be happy if you contributed this!

I think this might be worthwhile for you, since if you add it to Wikidata you can use our corpus to map the persons to debates, motions etc. Hence, I see a bit of a win-win here.

If you are interested in going through the missing dates, we could probably help out by generating a file listing which MPs are missing dates and opening a specific issue for this. Would that be of interest to you?

monirbounadi commented 11 months ago

Hey @MansMeg! Thanks! Yes, it sounds like a win-win. Yes, if you generate a file with missing birth dates and missing birthplaces, I'm happy to consider looking at that issue.

salgo60 commented 11 months ago

> Yes, if you generate a file with missing birth dates and missing birthplaces, I'm happy to consider looking at that issue.

@monirbounadi I guess what you are looking to use is the electoral district, not the birth location?

I guess the records missing birth date and birth location are the ones missing the source "Two-Chamber Parliament 1867-1970" (Q110346241).

monirbounadi commented 11 months ago

Thanks @salgo60! It is actually the birth locations I am interested in as a first step. I think that step is easier. How would you go about finding the election district for each MP? It is very tough to get to the municipal boundaries for a given year in Sweden. I have a map of municipal districts in 1919. I know of no other map of municipal districts.

It would be interesting to get a query that gives the set of MPs with a birth location that is not a parish or congregation of Sweden.
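
A rough sketch of how such a query could be assembled is below. It is not a tested query against the live Wikidata endpoint. The assumptions are: MPs are selected via "position held" (P39) with an item id that the caller supplies (`position_qid` is a placeholder, not taken from this thread), birthplace is "place of birth" (P19), and a parish is anything that is an instance of (P31) Q615980, the class that appears elsewhere in this thread. The function name `build_birthplace_query` is made up for illustration.

```python
def build_birthplace_query(position_qid: str, parish_class_qid: str = "Q615980") -> str:
    """Return a SPARQL query listing people whose birthplace is not typed as a parish.

    position_qid is a placeholder for whatever item identifies the MPs of
    interest (an assumption, supplied by the caller).
    parish_class_qid defaults to Q615980, the class mentioned in this thread;
    treat it as an assumption and adjust as needed.
    """
    for qid in (position_qid, parish_class_qid):
        # Reject anything that does not look like a Wikidata item id.
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"""
SELECT ?mp ?mpLabel ?place ?placeLabel WHERE {{
  ?mp wdt:P39 wd:{position_qid} ;   # position held (assumed way to select MPs)
      wdt:P19 ?place .              # place of birth
  FILTER NOT EXISTS {{ ?place wdt:P31 wd:{parish_class_qid} . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "sv,en". }}
}}
""".strip()
```

The returned string could be pasted into the Wikidata Query Service once `position_qid` is replaced with the real item. Birthplaces typed only through a subclass of the parish class would also be listed by this version, so the filter may need loosening.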

salgo60 commented 11 months ago

> How would you go about finding the election district for each MP

We have it in Wikidata. Example: SPA sj9PGLAlnmUAAAAAABfQbA, same as Q5890333,

has election district Q10710315 "Uppvidinge härads valkrets"

You have

[image]

salgo60 commented 11 months ago

@monirbounadi Let me know if I should show you Wikidata; we can share screens.

monirbounadi commented 11 months ago

@salgo60 Sure we can talk more about this. Very tired today and probably also tomorrow so I'm taking it a bit easy this week! It's great if there's 100% coverage w.r.t. election districts!

salgo60 commented 11 months ago

Let me know when you wake up ;-) I will be away during the weekend... You have my number and we can start with a short session. Linked data is a never-ending learning process. I feel we should also look into OpenRefine and how you can reconcile your voting data with Wikidata.

salgo60 commented 11 months ago

[image]

monirbounadi commented 11 months ago

Hey again! So I've done the following recently:

In any case, this helps track down the birthplaces of MPs.

fredrik1984 commented 11 months ago

Thanks a lot, @monirbounadi! Great work, which will also be of use to us. Will you continue to add MP metadata to Wikidata?

monirbounadi commented 11 months ago

@fredrik1984 My goal is to get a full picture of the birth locations and birth dates of all MPs. Wikidata definitely helps with that but now that I've found some errors I am not sure how accurate it is. For example, this guy has 3 (technically 2) birthplaces: https://www.wikidata.org/wiki/Q6043666. I think "Södra Åby" is the right one as it is the one I traced down in the parish records.

I realized that by using name, birth, and death date, and then matching it to the Swedish death index, I got a quick way to get more accurate information on the birthplaces. However, it should be noted that many times Wikidata is more precise than the death book. This is in cases where for example the death location is different from the locality in which the MP formally resided.
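
The matching step described above could look roughly like the sketch below. It assumes both sources have already been loaded into simple records. The `Person` fields, the date format, and the function name `compare_birthplaces` are illustrative choices, not the actual layout of Wikidata exports or of the Swedish Death Index. Matching is exact on normalized name plus birth and death date, which is deliberately strict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    born: str        # ISO date string, e.g. "1850-01-31" (assumed format)
    died: str        # ISO date string
    birthplace: str


def norm(text: str) -> str:
    """Case-fold and collapse whitespace so trivial spelling differences match."""
    return " ".join(text.casefold().split())


def compare_birthplaces(wikidata: list[Person], death_index: list[Person]) -> dict[str, list[str]]:
    """Sort Wikidata names into 'agree', 'disagree' and 'unmatched'."""
    # Index the death-index records by (name, birth date, death date).
    index = {(norm(p.name), p.born, p.died): p for p in death_index}
    result: dict[str, list[str]] = {"agree": [], "disagree": [], "unmatched": []}
    for person in wikidata:
        hit = index.get((norm(person.name), person.born, person.died))
        if hit is None:
            result["unmatched"].append(person.name)
        elif norm(hit.birthplace) == norm(person.birthplace):
            result["agree"].append(person.name)
        else:
            result["disagree"].append(person.name)
    return result
```

The "disagree" bucket is the one worth checking by hand, since either source may be the more precise one. The "unmatched" bucket would include MPs who did not die in Sweden or are still alive.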

To answer your question, I'll add some MP metadata when I find it necessary for me to get closer to "a full picture of the birth locations and birth dates of all MPs". This means that whenever manual coding is necessary, I do the manual stuff at Wikidata.

fredrik1984 commented 11 months ago

Ok, sounds like a good plan!

monirbounadi commented 11 months ago

I have attached a link to a list of the WD records of MPs that could not be matched to the Swedish Death Index 8. The easiest cases to explain are MPs who (a) did not die in Sweden or (b) have not died.

It is a CSV file.

https://www.dropbox.com/scl/fi/hvh1othhukwif8nt0pu6t/missing_postID_cleaned.csv?rlkey=ic4tvovbawhgrllpcudc9ek1f&dl=0

MansMeg commented 11 months ago

This is great! I assume that we would just want to update our metadata now with this information? Or is there anything you need from us, @monirbounadi?

@BobBorges, I guess this is a new PR with this new data, updating from Wikidata?

BobBorges commented 11 months ago

yep

monirbounadi commented 11 months ago

@MansMeg Great! No, I need nothing from you. I will let you know when I am confident we have all the birth locations and when I'm satisfied with the quality of the biographic information. Thanks!

monirbounadi commented 11 months ago

Example of a person with a completely wrong birthplace in the book Tvåkammarriksdagen:

[image]

[image]

Found some additional birth and death dates today that were incorrect too.

Will leave this for now until next week.

salgo60 commented 11 months ago

Here is how Lundström in Filipstad is connected to the SBL article Lundström 34902.

And the family tree of the son Herman (Q1610294), which is what we have in WD.

What you should do is run NER on those old SBL texts.

See this Wikidata weekly newsletter, 2023 09 25.

salgo60 commented 11 months ago

> Example of a person with a completely wrong birthplace in the book Tvåkammarriksdagen

My talk and blog post from 2021 about metadata roundtripping, and how in 1750 we had persistent identifiers for runestones but today we build solutions with strings.

monirbounadi commented 11 months ago

Thanks @salgo60! This is great. I will look into your links a bit further when I have time. Away tomorrow for a conference.

salgo60 commented 11 months ago

> Thanks @salgo60! This is great. I will look into your links a bit further when I have time. Away tomorrow for a conference.

@monirbounadi Don't miss the chance to be part of the Wikidata modelling day.

I wrote about the problems we see with bad research data and the lack of good modelling on sv:Wikipedia, and also gathered some good links for understanding Wikidata.

monirbounadi commented 11 months ago

I am soon done with this. It has been a bit tedious, but doing it over a number of evenings has been educational. One observation I want to put out here already: it seems to me that Wikidata now has more accurate birth and death dates than most other secondary sources I have looked at (e.g. the book "Tvåkammar-riksdagen 1867-1970" or the "Swedish Death Index 8"). The reason is that, for unclear cases, the Wikidata dates have been checked directly against the vital registers in Sweden (e.g., birth and death books). The same goes for the birthplaces. In fact, I have found many errors in "Tvåkammar-riksdagen 1867-1970" regarding birthplaces. In some cases, the birthplace given there coincides with where the MP lived at about 1 year old, which is not necessarily the place of birth. There are also plenty of opposite cases, where the book gives a city such as Stockholm as the birthplace, but it is very hard or impossible to find the MP in early household examination records for that city. In these cases, it seems like the parents of the MP simply used facilities/people in the city for the birth and christening.

Just by looking at these cases, it seems to me that the parents of many of the MPs were extremely mobile relative to the rest of the population. (I understand it is not a priority for you to do life-course analysis, but I think it is descriptively quite interesting.)

salgo60 commented 11 months ago

> A bit tedious

@monirbounadi There is a gadget that lets you drag and drop from Wikipedia; see video.

My common.js file, which you can copy to create your own common.js; see also #123.

fredrik1984 commented 11 months ago

Thank you @monirbounadi for your thorough work! Very good to know about this, and the flaws in the bio books.

monirbounadi commented 11 months ago

Thanks @salgo60! That is very helpful.

The tedious part mostly comes from double-checking with household examination records, especially for unclear cases.

salgo60 commented 11 months ago

> Thanks @salgo60! That is very helpful.
>
> The tedious part mostly comes from double-checking with household examination records, especially for unclear cases.

That is why, in Wikidata, you should make it easier for the next person to check the sources by adding a URL to the free church book... I have written about how dysfunctionally I feel Riksarkivet works; it feels more like a waste of taxpayers' money.

Why is the location important for you? What is your use case?

What we miss in Wikidata is a good source for historical places with persistent identifiers. Riksarkivet's Tora was a good try, but I feel they gave up... (I created property P4820.)

FYI, I changed your edit on Q5966042.

That item already had SBL, which also has info about birth and death place... They use text strings and lack something like Wikidata, but they have a lot of quality checks...

OT: I tested using ChatGPT to compare articles in different Wikipedia language editions and which sources they used.

monirbounadi commented 11 months ago

The birth location is important since birth dates and birthplaces are often unique identifiers in Swedish censuses. Using birth dates and birthplaces we can write a program that connects the MPs to the censuses 1880-1910 that have been digitized by the National Archives of Sweden. Does that make sense?
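
A minimal sketch of such a linking program is below, assuming both the MP list and the census extract are lists of dictionaries. The keys `id`, `born` and `parish`, and the function name `link_to_census`, are hypothetical and do not reflect the National Archives' actual census format. Only unique matches are accepted, so that two people born on the same day in the same parish are not linked by accident.

```python
from collections import defaultdict


def link_to_census(mps, census):
    """Link each MP to a census row sharing birth date and birth parish.

    Both inputs are lists of dicts with the hypothetical keys
    'id', 'born' (ISO date string) and 'parish'.
    Returns ({mp_id: census_id}, [mp ids with zero or several candidates]).
    """
    def key(row):
        # Case-fold and collapse whitespace in the parish name.
        return (row["born"], " ".join(row["parish"].casefold().split()))

    candidates = defaultdict(list)
    for row in census:
        candidates[key(row)].append(row["id"])

    links, unresolved = {}, []
    for mp in mps:
        hits = candidates.get(key(mp), [])
        if len(hits) == 1:
            links[mp["id"]] = hits[0]
        else:
            unresolved.append(mp["id"])   # no hit, or more than one person fits
    return links, unresolved
```

A vague birthplace such as "Lund" or "Stockholm" would simply end up in the unresolved list, which is why parish-level precision matters for this step.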

Wikidata is a good source for parishes in Sweden. The only issue as I see it is that historical dioceses are not recorded in the Wikidata entries for the parishes.

That is great. Editing my edits should improve the data!

Using chatGPT for that seems sensible to me!

monirbounadi commented 11 months ago

Okay, so now I've added birthplaces to all records that were missing them. I have also corrected a number of errors, changed some birthdates, added death dates for all dead MPs, added death places for most of the dead MPs, done some research in primary sources and updated accordingly, etc. The data should be of very high quality now but can of course be improved.

This issue can be closed.

To be clear: I don't know if I can close this issue. If someone else can, please do so. Otherwise, let me know how I can do it. Thanks.

salgo60 commented 11 months ago

> Wikidata is a good source for parishes in Sweden. The only issue as I see it is that historical dioceses are not recorded in the Wikidata entries for the parishes.

They are. Please give one example.

See how we have matched NAD with Wikidata: Svenskaforsamlingar

monirbounadi commented 11 months ago

I think the parishes that previously belonged to Kalmar diocese are simply said to be part of Växjö diocese since Kalmar diocese got subsumed into Växjö diocese. So it is hard to infer which parishes previously belonged to Kalmar diocese.

If I remember correctly some diocesan boundaries have also changed over time. For example, Stockholm diocese took over some parishes from Uppsala diocese upon its creation. For example, Brännkyrka congregation (https://www.wikidata.org/wiki/Q10436974) belonged to Uppsala stift before 1942, which is not noted in the entry for Brännkyrka congregation.

This is the issue with the Wikidata entries on the parishes:

If parish X belongs to Växjö stift, did it at some point belong to Kalmar stift before 1915 or has it always belonged to Växjö stift?

If parish X belongs to Stockholms stift, did it belong to Uppsala stift or Strängnäs stift before 1942?

monirbounadi commented 11 months ago

I now see how I can close this issue, so I'll close it (see above).

salgo60 commented 11 months ago

> example, Brännkyrka congregation (https://www.wikidata.org/wiki/Q10436974) belonged to Uppsala stift before 1942, which is not noted in the entry for Brännkyrka congregation.

The relationship between a church parish and its "stift" is, I think, not in Wikidata, and I don't know if we need it. What you have is "replaced by" (P1366).

monirbounadi commented 11 months ago

Isn't stift/diocese given by values on property P708? "Brännkyrka församling" (Q10436974) has value "Stockholms stift" on P708. But it should say that it has that value for a period after 1941.

monirbounadi commented 11 months ago

This problem occurred to me when I realized that there are two historical congregations called "Ryssby församling". One lay in Växjö stift. Another lay in Kalmar stift. But now they both lie in Växjö stift! So how to differentiate them using the Wikidata entries? Not easy. We have to take the "kontrakt" into account. Here they are:

https://www.wikidata.org/wiki/Q10658311 https://www.wikidata.org/wiki/Q10658310

The problem then is that when strings actually uniquely identify things, such as "Ryssby församling" + "Växjö stift" in 1900 and "Ryssby församling" + "Kalmar stift" in 1900, Wikidata cannot as of yet handle that.

salgo60 commented 11 months ago

> ! So how to differentiate them using the Wikidata entries? Not easy. We have to take the "kontrakt" into account. Here they ar

But Wikidata never uses strings; that is the idea of linked data.

Wikipedia is old school and uses unique text strings to disambiguate.

Looks like stift has been added to Q10658311.

One interesting design pattern is

You can filter by clicking to the right

I ran a SPARQL query, and all WD objects with wd:Q615980 have P708.

monirbounadi commented 11 months ago

My point is not that a diocese is not recorded for each parish.

My point is the following:

A parish can belong to different dioceses across time. Currently, on WD, only the current diocese is given (as I understand it) with no specification of during which time period the parish belonged to that diocese. This means that for people who work with historical registers of parishes in the 1800s, the WD entries will be less useful than for those who work with similar registers in the early 2000s.
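
To make the point concrete, here is a small sketch of what a time-scoped diocese record could look like. The class and function names are invented for illustration. The only data used is the Brännkyrka example from this thread (Uppsala stift before 1942, Stockholms stift afterwards). In Wikidata terms this would presumably correspond to start time (P580) and end time (P582) qualifiers on the P708 statement, but that mapping is an assumption, not a description of how the entries are modelled today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DioceseMembership:
    parish: str
    diocese: str
    start: Optional[int] = None   # first year of membership, None = open-ended
    end: Optional[int] = None     # last year of membership, None = open-ended


def diocese_in(memberships, parish, year):
    """Return the diocese a parish belonged to in a given year, or None."""
    for m in memberships:
        after_start = m.start is None or m.start <= year
        before_end = m.end is None or year <= m.end
        if m.parish == parish and after_start and before_end:
            return m.diocese
    return None


# Years follow the Brännkyrka example in this thread (Uppsala stift before 1942).
MEMBERSHIPS = [
    DioceseMembership("Brännkyrka församling", "Uppsala stift", end=1941),
    DioceseMembership("Brännkyrka församling", "Stockholms stift", start=1942),
]
```

With records like these, `diocese_in(MEMBERSHIPS, "Brännkyrka församling", 1900)` gives "Uppsala stift", which is the answer someone working with registers from the 1800s needs.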

salgo60 commented 11 months ago

> My point is not that a diocese is not recorded for each parish.

My hope is that the lesson you learn from using Wikidata is: NEVER USE TEXT STRINGS.

The next step, I think, is that you should not use Wikidata but instead create your own linked data that is trusted... The very big challenge is that Wikidata has been alive for more than 10 years and has a technically good product owner in Lydia.

salgo60 commented 11 months ago

> A parish can belong to different dioceses across time. Currently, on WD, only the current diocese is given (as I understand it) with no specification of during which time period the parish belonged to that diocese. This means that for people who work with historical registers of parishes in the 1800s, the WD entries will be less useful than for those who work with similar registers in the early 2000s.

Do you have a good source for those statements? Then it's easy to add it...

> Currently, on WD, only the current diocese is given

Maybe... As shown above, the Wikidata model has no problem with adding start and end qualifiers, and also saying that this applies only to one part, e.g. "FOLKBOKFÖRING", etc.

The big challenge I see for humanities research is that I feel


The big step I see with this project


What I lack