welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Fix missing data on wikidata #282

Closed BobBorges closed 4 months ago

BobBorges commented 1 year ago

There are now unit tests on the mpqc branch that test our metadata (corpus/metadata/) against the work of @emil and @salgo60 (wrangled and cleaned at corpus/quality_assessment/known_mps/catalog.csv). The unit tests check (a) the integrity of the catalog.csv file (no NAs) and (b) whether all the people in catalog.csv are represented in our metadata. The following things need to be corrected for the MP quality control (issue #265):

integrity of catalog (unit test result files start with integrity-error_ in corpus/quality_assessment/known_mps/)

missing metadata (unit test result files start with missing_ in the same directory)

Note: individuals missing the member of parliament attribute should probably be prioritized. This attribute is how we get our metadata, so its absence seems to cause at least some of the other missing info.
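
For anyone who has not looked at the tests, the two checks described above can be sketched roughly as follows. This is only an illustration and not the code on the mpqc branch: the column names (`wiki_id`, `name`, `born`), the sample rows, and the helper functions are all hypothetical, and the real catalog.csv may be laid out differently.

```python
import csv
import io

# Hypothetical stand-ins for catalog.csv and our metadata; the columns
# and the rows are made up purely for illustration.
CATALOG = """wiki_id,name,born
Q0001,Anna Andersson,1850
Q0002,Bengt Berg,NA
Q0003,Carl Carlsson,1872
"""

METADATA = """wiki_id,name
Q0001,Anna Andersson
Q0002,Bengt Berg
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_with_missing_values(rows):
    """Check (a): return rows where any field is empty, 'NA', or malformed."""
    return [
        row
        for row in rows
        if any(
            not isinstance(value, str) or value.strip() in ("", "NA")
            for value in row.values()
        )
    ]


def missing_from_metadata(catalog_rows, metadata_rows, key="wiki_id"):
    """Check (b): return catalog rows whose key does not occur in the metadata."""
    known = {row[key] for row in metadata_rows}
    return [row for row in catalog_rows if row[key] not in known]


catalog = load_rows(CATALOG)
metadata = load_rows(METADATA)

integrity_errors = rows_with_missing_values(catalog)
missing_people = missing_from_metadata(catalog, metadata)

print([row["wiki_id"] for row in integrity_errors])
print([row["wiki_id"] for row in missing_people])
```

With the sample rows, the first list holds the person whose birth year is NA and the second holds the person who is in the catalog but not in the metadata. Matching on a single identifier column is an assumption here; matching on name plus birth year would need a different key.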

MansMeg commented 1 year ago

So what is the minimal step/fix needed to pass the first unit test for persons.csv?

BobBorges commented 1 year ago

Added birthdays for MPs who didn't have a birthday (@salgo60 FYI):

fredrik1984 commented 1 year ago

Great job, @BobBorges! Question: what is "location specifier" in your checklist above? Is it iort?

BobBorges commented 1 year ago

@fredrik1984 It seems like it should be. It comes from the "alias" part of wiki_data.

MansMeg commented 1 year ago

Yes. I realize we should clean this up in the long term, now that @salgo60 has created the "I riksdagen kallad" object. Then we should probably update Wikidata with your script, @BobBorges. I'll open a new issue on this.

BobBorges commented 1 year ago

I have now also added a source (bio books) on Wikidata for each birth date added in the post above.