Open fredrik1984 opened 1 month ago
I think we should discuss what should be included in the database. I think profession is borderline, since it doesn't connect to the MP's role.
I think we can help add it to Wikidata instead. Then people can use it by connecting to Wikidata.
That is true, having it on Wikidata is more suitable.
What level of urgency are we talking about here?
Assuming each profession already has a Q-number on Wikidata, it would be straightforward to add the professions in the speakers_19th... file to Wikidata. However (!), all claims added to Wikidata should have a source, and there is no source indicated in the CSV file.
Further, adding to Wikidata is one thing, but if we do it (and we have source information), we should also police those entries in the same way we do with other parts of our data set: have a profession.csv file with columns ["person_id", "start", "end", "profession"] and test that our additions are not deleted/overwritten with every pull.
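A minimal sketch of what such a check could look like, in Python with only the standard library. The column names are taken from the comment above; the function names, the idea of comparing against a saved snapshot of our own additions, and the sample rows are assumptions for illustration, not the project's actual test suite.

```python
import csv
import io

# Column layout proposed in the comment above.
EXPECTED_COLUMNS = ["person_id", "start", "end", "profession"]


def read_professions(handle):
    """Read profession rows from an open CSV handle and validate the header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [tuple(row[col] for col in EXPECTED_COLUMNS) for row in reader]


def missing_additions(our_rows, pulled_rows):
    """Return the rows we added that are no longer present after a pull."""
    pulled = set(pulled_rows)
    return [row for row in our_rows if row not in pulled]


# Hypothetical sample data: a snapshot of our additions and a fresh pull.
ours = read_professions(io.StringIO(
    "person_id,start,end,profession\n"
    "Q0000001,1867,1872,farmer\n"
    "Q0000002,1870,1875,priest\n"
))
pulled = read_professions(io.StringIO(
    "person_id,start,end,profession\n"
    "Q0000001,1867,1872,farmer\n"
))
lost = missing_additions(ours, pulled)
```

In a real test the two inputs would be the snapshot file and the freshly pulled profession.csv, and the test would fail whenever `lost` is non-empty.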
I agree with @MansMeg that whether or not to include this variable is a borderline case. However, I think it is relevant to the overall set of potential use cases for the corpus. For example, the professions of parliamentarians (just glancing at the sheet) are, I would guess, not proportionally representative of all professions in the country. Why? How does that affect political discourse?
If there are reasonable sources for the professions data, I think it would be a useful variable to introduce to Swerik, via Wikidata and the other workflows that we use.
I don't see this as an urgent issue, mostly because this metadata category is a borderline case. I think the ideal case would be if a researcher were interested in adding this information to Wikidata for their own purposes.
I think Agustin told me that they used the bio books to get the profession data, but that needs to be confirmed.
Maybe we should just keep this issue, with the CSV files in it, in the backlog. We have more important stuff to do at the moment (and most likely in the future too).
FYI: On Aug 14, 2021 I checked the status of professions in Alvin, Arken, Historiska museet, Kungliga biblioteket - LIBRISXL, Levande musikarv, Litteraturbanken, Svenskt Kvinnobiografiskt lexikon, and SBL; see GitHub salgo60/HISCOKoder. I also tried to get more people to care about occupations, see question #1. No reaction 😃
The major problem, I feel, is that we don't see many sources using HISCO codes. In historical works it's just text strings without good definitions...
Sources that have HISCO codes
Example output...
The professions of SWERIK-connected persons in Wikidata: it's a mess, BUT
That is true, having it on Wikidata is more suitable.
Wikidata is just a POC (proof of concept)... use it, but quality-assure it and add better sources, is my take...
We can see that RAÄ have given up and now also use Wikidata; see the FB project description. I see it as a red flag that they are not on GitHub...
A research project that looks cool: Wikidata:WikiProject Reference Verification
Install: importScript('User:1hangzhao/ProVe.js');
Profession is a metadata category that would be good to eventually add to the MP database.
Agustin Goenaga (Lund University) and his project colleagues have added profession for MPs in the 19th century within their project: https://portal.research.lu.se/sv/projects/the-politics-of-state-building-studying-investments-in-state-capa
They have used the Swerik MP database, and we are welcome to integrate their profession metadata into our MP database. We have received two CSV files with this metadata, attached here. Agustin gave the following description of these two files:
Hence, we should integrate Agustin's MP profession metadata into Swerik's MP database.
mp_metadata_districtcodes[95].csv speakers_19th[37].csv