swerik-project / riksdagen-persons

A repository for metadata on politicians who participate in the Riksdag.

Add profession metadata to the MP database #26

Open fredrik1984 opened 1 month ago

fredrik1984 commented 1 month ago

Profession is a metadata category that would be good to eventually add to the MP database.

Agustin Goenaga (Lund University) and his project colleagues have added profession for MPs in the 19th century within their project: https://portal.research.lu.se/sv/projects/the-politics-of-state-building-studying-investments-in-state-capa

They have used the Swerik MP database, and we are welcome to integrate their profession metadata into our MP database. We have received two CSV files containing this metadata, attached here. Agustin gave the following description of these two files:

  1. “Speakers_19th.csv” includes the “raw” profession as coded by our RA based mainly on the biographical dictionaries and Wikipedia.
  2. “mp_metadata_districtcodes.csv” includes the same profession variable, as well as additional dummy variables that we created in which we tried to systematize some of the professions into more general occupational groups. MPs may have values of 1 in multiple occupational groups if they were listed as having multiple “professions” in the dictionaries. I’m not sure how systematic the RA was when coding more than one occupation, since her instructions were to code the first or most prominent one, but at least we can trace back to the original professions in the “raw” variable. The speaker_id variable is the same as your wiki_id.
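A minimal sketch of how raw profession strings could be turned into occupational-group dummies like the ones described in item 2, with several groups allowed to be 1 for the same MP. The group names, the Swedish keyword lists, and the `profession` column name are all hypothetical; the actual grouping used in mp_metadata_districtcodes.csv is not documented here, and plain substring matching is deliberately naive.

```python
import csv
import io

# Hypothetical keyword lists; the real grouping in
# mp_metadata_districtcodes.csv is not documented in this thread.
GROUP_KEYWORDS = {
    "farmer": ("lantbrukare", "hemmansägare", "bonde"),
    "clergy": ("präst", "kyrkoherde", "biskop"),
    "military": ("officer", "överste", "kapten"),
}


def group_dummies(raw_profession):
    """Return a 0/1 dummy per group; one MP can score 1 in several groups."""
    text = (raw_profession or "").lower()
    return {
        group: int(any(keyword in text for keyword in keywords))
        for group, keywords in GROUP_KEYWORDS.items()
    }


def add_dummies(rows, profession_column="profession"):
    """Yield each CSV row with dummy columns added; the raw string is kept
    so every dummy can be traced back to it."""
    for row in rows:
        yield {**row, **group_dummies(row[profession_column])}


# Usage with an in-memory CSV (made-up example rows):
sample = io.StringIO("speaker_id,profession\nQ1,Kyrkoherde och lantbrukare\nQ2,Grosshandlare\n")
for row in add_dummies(csv.DictReader(sample)):
    print(row)
```

Keeping the raw string next to the dummies mirrors the point made above: even if the RA coded multiple occupations unevenly, each group flag can be traced back to its source text.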

Hence, we should integrate Agustin's MP profession metadata into Swerik's MP database.

mp_metadata_districtcodes[95].csv speakers_19th[37].csv

MansMeg commented 1 month ago

I think we should discuss what should be included in the database. I think profession is borderline, since it doesn't connect to the MPs' role.

I think we can help adding it to wikidata instead. Then people can use it by connecting to wikidata.

fredrik1984 commented 1 month ago

That is true, having it on Wikidata is more suitable.

BobBorges commented 1 month ago

What level of urgency are we talking about here?

Assuming each profession already has a Q-number on wikidata, it would be straightforward to add the professions in the speakers_19th... file to wikidata. However (!) all claims added to wikidata should have a source -- there is no source indicated in the csv file.
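To make the sourcing requirement concrete, here is a minimal sketch that turns rows into QuickStatements (V1, tab-separated) commands adding an occupation (P106) claim with a "stated in" (P248) reference, and refuses to emit anything for rows that lack a source. It assumes the professions have already been mapped to Q-numbers and that a source item exists; the `profession_qid` and `source_qid` columns are hypothetical and are not in the attached CSV files.

```python
import re

QID = re.compile(r"^Q[1-9][0-9]*$")


def quickstatements(rows):
    """Build QuickStatements (V1, tab-separated) commands that add an
    occupation (P106) claim with a 'stated in' (P248) reference, written S248.

    Rows without a valid person, profession or source Q-number are returned
    in `rejected` instead of being turned into unsourced claims."""
    lines, rejected = [], []
    for row in rows:
        person = row.get("speaker_id") or ""
        occupation = row.get("profession_qid") or ""  # hypothetical column
        source = row.get("source_qid") or ""  # hypothetical column
        if all(QID.match(value) for value in (person, occupation, source)):
            lines.append("\t".join([person, "P106", occupation, "S248", source]))
        else:
            rejected.append(row)
    return lines, rejected
```

The rejected list is the useful part here: with the current CSV files, every row would end up in it, which is exactly the missing-source problem.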

Further, adding to wikidata is one thing, but if we do it (and we have source information), we should also police those entries in the same way we do with other parts of our data set -- have a profession.csv file with columns ["person_id", "start", "end", "profession"] and test that our additions are not deleted/overwritten with every pull.
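A minimal sketch of such a policing test, assuming the freshly pulled Wikidata data can be written out with the same four columns as the proposed profession.csv. The function names and the file layout of the pulled data are hypothetical. Because whole rows are compared, a value that was overwritten shows up as a missing row, just like a deleted one.

```python
import csv
import io

KEY = ("person_id", "start", "end", "profession")


def load_rows(handle):
    """Read a profession.csv-style file into a set of (person_id, start, end, profession) tuples."""
    return {tuple(row[column] for column in KEY) for row in csv.DictReader(handle)}


def missing_after_pull(curated, pulled):
    """Rows we added ourselves that are absent (deleted or changed) in the fresh pull."""
    return sorted(curated - pulled)


def check_additions_kept(curated_handle, pulled_handle):
    """Fail if any curated row did not survive the pull."""
    missing = missing_after_pull(load_rows(curated_handle), load_rows(pulled_handle))
    assert not missing, f"{len(missing)} curated profession rows were deleted or overwritten: {missing}"
```

In a test suite, `check_additions_kept` would be called with open file handles for profession.csv and the pulled data.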

I agree with @MansMeg that whether or not to include this variable is a borderline case. However, I think it is relevant to the overall set of potential use cases for the corpus: e.g. the professions of parliamentarians (just glancing at the sheet) are, I would guess, not proportionally representative of all professions in the country. Why? How does that affect political discourse?

If there are reasonable sources for the professions data, I think it would be a useful variable to introduce to Swerik, via wikidata and the other workflows that we use.

fredrik1984 commented 1 month ago

I don't see this as an urgent issue, mostly because this metadata category is a borderline case. Ideally, a researcher who needs this information for their own purposes would add it to Wikidata.

I think Agustin told me that they used the bio books to get the profession data, but that needs to be confirmed.

Maybe we should just keep this issue, with the CSV files in it, in the backlog. We have more important stuff to do at the moment (and most likely in the future too).

salgo60 commented 4 weeks ago

FYI: On Aug 14, 2021, I checked the status of professions in Alvin, Arken, Historiska museet, Kungliga biblioteket (LIBRISXL), Levande musikarv, Litteraturbanken, Svenskt Kvinnobiografiskt lexikon, and SBL; see GitHub salgo60/HISCOKoder. I also tried to get more people to care about occupations (see question #1), but got no reaction 😃

The major problem, I feel, is that we don't see many sources using HISCO codes; the history of work is mostly just text strings without good definitions...

Sources that have HISCO codes:

Example output...

[Screenshot: example output]

The professions of SWERIK-connected persons in Wikidata are a mess, BUT

[Screenshots: professions of SWERIK-connected persons in Wikidata]
salgo60 commented 4 weeks ago

Wikidata: profession of the fathers of persons connected to SWERIK

[Screenshots: professions of the fathers of SWERIK-connected persons]

Wikidata doesn't document workers very well.

We have found just 3 sons of a worker (arbetare) who went on to become a Swedish PM.


Wikidata: profession of the paternal grandfathers (father's father) of persons connected to SWERIK

[Screenshots: professions of the paternal grandfathers of SWERIK-connected persons]
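How the charts above were produced is not stated. As one possible way to reproduce such tallies, here is a sketch that builds a Wikidata Query Service SPARQL query following the father (P22) line a chosen number of generations and counting occupations (P106). It assumes you already have the Q-numbers of the SWERIK-linked persons (e.g. their wiki_id values); the Q-numbers in the usage line are placeholders.

```python
def ancestor_occupation_query(person_qids, generations):
    """SPARQL (Wikidata Query Service) counting occupations (P106) of each
    person's ancestor `generations` steps up the father (P22) line:
    0 = the person, 1 = father, 2 = father's father."""
    path = "/".join(["wdt:P22"] * generations + ["wdt:P106"])
    values = " ".join(f"wd:{qid}" for qid in person_qids)
    return f"""SELECT ?occupation ?occupationLabel (COUNT(DISTINCT ?person) AS ?persons) WHERE {{
  VALUES ?person {{ {values} }}
  ?person {path} ?occupation .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "sv,en". }}
}}
GROUP BY ?occupation ?occupationLabel
ORDER BY DESC(?persons)"""


# Placeholder Q-numbers; replace with the SWERIK persons' wiki_id values.
print(ancestor_occupation_query(["Q1", "Q2"], generations=2))
```

Since `wdt:` only follows truthy statements, persons whose father (or father's occupation) is missing in Wikidata simply drop out, which is one reason workers end up under-represented in such counts.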
salgo60 commented 3 weeks ago

Quoting @fredrik1984: "That is true, having it on Wikidata is more suitable."

My take: Wikidata is just a POC (proof of concept). Use it, but quality-assure it and add better sources.
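As a sketch of the "quality-assure it and add better sources" step, the query below lists occupation (P106) statements on a given set of persons that carry no reference at all, i.e. candidates for adding sources. It uses the full-statement (`p:`/`ps:`) form of the Wikidata RDF model; the person list is again assumed to come from the SWERIK wiki_id values, and the function name is hypothetical.

```python
def unsourced_occupations_query(person_qids):
    """SPARQL listing occupation (P106) statements on the given persons that
    have no reference at all, i.e. candidates for adding better sources."""
    values = " ".join(f"wd:{qid}" for qid in person_qids)
    return f"""SELECT ?person ?occupation WHERE {{
  VALUES ?person {{ {values} }}
  ?person p:P106 ?statement .
  ?statement ps:P106 ?occupation .
  FILTER NOT EXISTS {{ ?statement prov:wasDerivedFrom ?reference . }}
}}"""
```

This only detects missing references; judging whether an existing reference is good enough still needs a human (or a tool such as the one mentioned below).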

We can see that RAÄ has given up and now also uses Wikidata (see the FB project description). One red flag I see is that they are not on GitHub...


A research project that looks cool: Wikidata:WikiProject Reference Verification

To install it, add importScript('User:1hangzhao/ProVe.js'); to your common.js on Wikidata.
