welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Person catalog entities with no "role" #451

Open BobBorges opened 6 months ago

BobBorges commented 6 months ago

description

We recently implemented a SWERIK person ID, a unique persistent identifier assigned to person entities present in the corpus. The ID serves as a primary key in the MP database, and IDs resolve to swerik-project.github.io/person-catalog/{{ID}}/.

The queries that we use to fetch MP data from Wikidata are based on a person's role in the Riksdag. Similarly, the person catalog where the IDs can be resolved is also built around the role.

However, there are some 50 individuals in the dataset who don't have one of the relevant roles (Member of Parliament, Minister, Speaker), which means their IDs were not added to the person catalog (but are on Wikidata) and don't resolve properly on our website.
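To check which positions Wikidata actually records for one of these persons, a query keyed on the SWERIK ID rather than on the role might help. The following is only a sketch: it builds a SPARQL string using P12192 (SWERIK ID) and P39 (position held). The function name, the ID pattern, and the label languages are assumptions for illustration, not part of the project's existing queries.

```python
import re
from string import Template

# Assumed shape of a SWERIK ID ("i-" followed by alphanumerics), inferred
# from the IDs quoted in this thread; not an official specification.
SWERIK_ID_PATTERN = re.compile(r"i-[A-Za-z0-9]+")

# P12192 = SWERIK ID, P39 = position held. The OPTIONAL clause keeps
# persons who have no position at all in the result set.
QUERY_TEMPLATE = Template("""\
SELECT ?person ?personLabel ?position ?positionLabel WHERE {
  ?person wdt:P12192 "$swerik_id" .
  OPTIONAL { ?person wdt:P39 ?position . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "sv,en". }
}
""")


def positions_query(swerik_id: str) -> str:
    """Build a SPARQL query listing the positions held by one person.

    Hypothetical helper: it only builds the query text and does not
    contact the Wikidata Query Service.
    """
    if not SWERIK_ID_PATTERN.fullmatch(swerik_id):
        raise ValueError(f"does not look like a SWERIK ID: {swerik_id!r}")
    return QUERY_TEMPLATE.substitute(swerik_id=swerik_id)


if __name__ == "__main__":
    print(positions_query("i-SoPKUW6bamDYhSJ8r5kbfm"))
```

A row with an empty `?position` would indicate a person who has a SWERIK ID on Wikidata but no recorded position.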

the task

There is a list (attached) of SWERIK IDs that were generated, but not implemented in the catalog. Look these people up (start with wikidata, then maybe check other sources) --

-- for each SWERIK ID, you can find the corresponding wiki ID in the file "corpus/metadata/wiki_id.csv"

-- Is such a role (MP, Minister, Speaker) missing? It will be missing from Wikidata, otherwise we would have it, but maybe they should have such a role according to bio books or Riksdagen open data?

-- For those who don't seem to have any missing role:

catalog_problem-ids.txt
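The lookup step above (SWERIK ID to wiki ID) could be scripted roughly as follows. This is a sketch under assumptions: the column names `person_id` and `wiki_id` are guesses about the layout of "corpus/metadata/wiki_id.csv", the attached list is assumed to hold one SWERIK ID per line, and both function names are hypothetical.

```python
import csv


def load_wiki_ids(csv_file, swerik_col="person_id", wiki_col="wiki_id"):
    """Return {SWERIK ID: wiki ID} read from an open CSV file object.

    The default column names are assumptions; pass the real header
    names if they differ.
    """
    reader = csv.DictReader(csv_file)
    return {row[swerik_col]: row[wiki_col] for row in reader}


def lookup_problem_ids(lines, wiki_ids):
    """Split the problem IDs into those with and without a wiki ID.

    `lines` is any iterable of strings (e.g. an open text file) with
    one SWERIK ID per line; blank lines are skipped.
    """
    found, missing = {}, []
    for line in lines:
        swerik_id = line.strip()
        if not swerik_id:
            continue
        if swerik_id in wiki_ids:
            found[swerik_id] = wiki_ids[swerik_id]
        else:
            missing.append(swerik_id)
    return found, missing


if __name__ == "__main__":
    # Paths as named in this issue; run from the repository root.
    with open("corpus/metadata/wiki_id.csv", newline="", encoding="utf-8") as f:
        wiki_ids = load_wiki_ids(f)
    with open("catalog_problem-ids.txt", encoding="utf-8") as f:
        found, missing = lookup_problem_ids(f, wiki_ids)
    for swerik_id, wiki_id in found.items():
        print(f"{swerik_id}\thttps://www.wikidata.org/wiki/{wiki_id}")
    print(f"{len(missing)} IDs without a wiki ID")
```

Printing the Wikidata URL next to each ID gives the student assistants a direct starting point for the manual check.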

salgo60 commented 6 months ago

I did a small check, and Wikidata (wd) is inconsistent:

i-SoPKUW6bamDYhSJ8r5kbfm = Q6070153

--> was part of Regeringen Ekman I (Q10650421)

BobBorges commented 5 months ago

So the people come from the government query, this is good info. Now we need to decide if there are other "roles" that need to be included in our dataset. @salgo60 it will be a job for our new crew of student assistants.

salgo60 commented 5 months ago

I used the wd hub tool to make the list a little easier to analyze. I feel you should step up and:

1) Use language tags in your JSON. Right now you mix languages in an odd way; I guess if you don't understand Swedish everything looks odd...
2) Never use bare text strings; always use them in combination with identifiers.
3) Always have an easy way to see the version history of your object. It is always a nightmare to track when Riksarkivet SBL has changed something; see "the magnus list".
4) Use better term names, i.e. if the identifier is from Wikidata, call it wdplaceid or something like that.
5) Your Alvin reference links are wrong in your JSON data, e.g. i-8fp2gLHtWqnYwFqeVLu5w8.
6) If you have a term that is the same as a Wikidata property, then say "same as P20".
7) Compare owl:equivalentProperty with P1628 (equivalent property).
8) If it is not an exact match, then use SKOS: skos:closeMatch, skos:exactMatch, owl:sameAs, owl:equivalentClass, owl:equivalentProperty.
9) The template Mall:Auktoritetsdata is used on more than 130 000 pages in sv:wikipedia, and if your identifier will be stable and used a lot, I suggest that it becomes part of that template. See the petscan search for the template used on sv:wikipedia and SWERIK ID (P12192) = 2710.
10) Support structured data markup to be more findable by Google search. Example: the Google test tool on https://swerik-project.github.io/person-catalog/i-X9jqcGenibr5tWEbnZgddm reports "No items detected"; compare sv:wikipedia, where it reports "1 valid item detected" (see section "Detected items").
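As an illustration of the structured data suggestion, a catalog page could embed a schema.org `Person` object as JSON-LD. The sketch below assumes the catalog URL pattern given in the issue description and uses the ID pair quoted earlier in this thread; the function names and the choice of fields are hypothetical, and the catalog's real page generator may need something different.

```python
import json

# URL pattern from the issue description; the entity prefix is the
# canonical Wikidata entity URI.
CATALOG_BASE = "https://swerik-project.github.io/person-catalog/"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def person_jsonld(swerik_id, wiki_id, name=None):
    """Build a schema.org Person description as a plain dict.

    `name` is optional because this sketch does not know how the
    catalog stores names.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "identifier": swerik_id,
        "url": CATALOG_BASE + swerik_id,
        # sameAs links the catalog entry to the Wikidata item.
        "sameAs": [WIKIDATA_ENTITY + wiki_id],
    }
    if name:
        data["name"] = name
    return data


def as_script_tag(data):
    """Wrap the JSON-LD in the script element that crawlers look for."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(as_script_tag(person_jsonld("i-SoPKUW6bamDYhSJ8r5kbfm", "Q6070153")))
```

Using `sameAs` with the Wikidata entity URI also addresses the point about pairing text with identifiers, since the link is machine-readable.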


Example of what Wikidata has regarding Swedish properties and political properties (shown as images in the original comment).

List of persons with no role, built using Wikidata property P12192 together with the wd hub tool.