Setup person identifier for persons speaking in the parliament

MansMeg commented 2 years ago

After 0.3, there has been discussion about finding a way to store and handle members of parliament, ministers, etc. in a structure that is easy to work with but also flexible, and that also lets us get data on parties etc. for ministers.

Below is how Riksdagens Öppna Data suggests that persons be stored. To me, this looks more like a way to store the parliamentarians for a certain mandate period, and it is not necessarily sufficient for us. That said, I think it is very similar to how we store this data now, and I think these tables are a good starting point for us.

Some problems with the current structure (and with the Riksdagen open data) are:

- An MoP can change name
- An MoP can change party
- An MoP can change gender

In the long term, we would like to handle this with separate tables (such as mop_names.csv, mop_party.csv, etc.). But for now we just want to be able to connect the different files we have and know that they refer to the same persons. We also want to connect this to the Wikidata identifiers.

Suggestion

[ ] We add a new CSV file called persons_parliament.csv with one row per individual person who has ever spoken in the parliament. The only things I can think of in the person metadata that cannot change over time are: unique identifiers, place of birth, birth date, place of death, and death date. Hence I think the CSV file should have the following columns:
```
person_id; birth_date; place_of_birth; death_date; place_of_death; wikidata_id; riksdagen_id; [further ids]...
...
```
[ ] Then we add the variable person_id to the three files we now have. Through these IDs we could, for now, add metadata on party and gender from the other files (linking between the files), and later on we could start to formalize the storage of the other variables.
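
A minimal sketch of the linking step suggested above, assuming semicolon-separated files as in the column listing. The sample rows, the `members` excerpt and its columns, and the placeholder Wikidata ids are hypothetical and only illustrate the join on person_id:

```python
import csv
import io

# Hypothetical sample of persons_parliament.csv (all values are made up).
PERSONS_CSV = """person_id;birth_date;place_of_birth;death_date;place_of_death;wikidata_id
p0001;1901-02-03;Uppsala;1975-06-07;Stockholm;Q-placeholder-1
p0002;1930-11-12;Lund;;;Q-placeholder-2
"""

# Hypothetical excerpt of one of the existing files after person_id was added.
MEMBERS_CSV = """person_id;name;party;gender
p0001;Anna Exempel;S;woman
p0002;Bo Exempel;M;man
p0003;Carl Exempel;C;man
"""


def read_rows(text):
    """Parse semicolon-separated CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def attach_person_data(rows, persons):
    """Add the time-invariant person columns to each row via person_id.

    Rows whose person_id is missing from the person table are returned
    separately so that they can be curated by hand.
    """
    by_id = {p["person_id"]: p for p in persons}
    matched, unmatched = [], []
    for row in rows:
        person = by_id.get(row["person_id"])
        if person is None:
            unmatched.append(row)
        else:
            matched.append({**person, **row})
    return matched, unmatched


persons = read_rows(PERSONS_CSV)
members = read_rows(MEMBERS_CSV)
matched, unmatched = attach_person_data(members, persons)
```

Keeping the unmatched rows in a separate list makes it easy to see which speakers still lack a person_id.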

The Riksdagen Open Data suggested format:

CREATE TABLE person (
intressent_id varchar(20),
född_år smallint,
kön varchar(6),
efternamn nvarchar(50),
tilltalsnamn nvarchar(50),
sorteringsnamn varchar(80),
iort varchar(40),
parti varchar(40),
valkrets varchar(50),
status varchar(100)
);

CREATE TABLE personuppdrag (
organ_kod varchar(20),
roll_kod varchar(40),
ordningsnummer int,
status varchar(20),
typ varchar(20),
[from] datetime,
tom datetime,
uppgift varchar(500),
intressent_id varchar(50)
);

CREATE TABLE personuppgift (
uppgift_kod varchar(50),
uppgift ntext,
uppgift_typ varchar(50),
intressent_id varchar(50)
);

MansMeg commented 2 years ago

I would be interested in opinions on this from @TomasSkotare, @ninpnin, @rbbby, and maybe also from @ljo and @salgo60, who have also been part of this general discussion.

salgo60 commented 2 years ago

I met the Swedish Parliament people in 2019, and they have some technical debt.

In Wikidata we use ShEx for defining schemas used (see video when designing the one for the Swedish PM EntitySchema:E134).

One good starting point is to look at how the Wikidata project WikiProject_British_Politicians has defined PMs; the person doing most of the work is Andrew Gray (Twitter: @generalising).

They have a sample query page where you can see what queries they can ask.
Example in Wikidata of a "politisk vilde" (a member without party affiliation): Amineh Kakabaveh --> Wikidata Q3675519; she then gets parliamentary group "no value".

- example of positions: "Swedish cabinet minister" = Q10686171 --> we can create a list, see Talk:Q10686171
  - same but for Minister for Justice
  - same but for Sweden's Minister for Finance
- list of positions we have in Wikidata for Swedish PM: en, sv
- list of parties we have in Wikidata for Swedish PM: en, sv

5 star data "same as"

I think the project should look into Linked Data and graph databases; listen to Tim Berners-Lee's "The next web", where he pushes 5 star data ==> you should have "same as" links to e.g. Wikidata, the Swedish National Archives, etc. Compare how Litteraturbanken's August Strindberg page has a section of data retrieved from Wikidata.

I also think using graph databases will open up new ML possibilities see "Graph-Powered Machine Learning"
- sbl_link is Swedish National Archive e.g. 34518 is Strindberg
- wikidata_id Q7724 as a graph
- ...

graph

salgo60 commented 2 years ago

Alias name: it depends on how you will use the data, but many people in the Swedish PM had "other names" to disambiguate common names; see SPA sj9PGLAlnmUAAAAAABgfbg

Alf Andersson
Andersson i Essvik
Alf Emmanuel Andersson

In Wikidata we have one name and many aliases for every object --> Q5553830, fi, en, de, json

ninpnin commented 2 years ago

@salgo60 How comprehensive is this coverage? We rely on those extra identifiers pretty heavily in the process, as the introductions are often just "Herr X i Y:" in a lot of cases.

salgo60 commented 2 years ago

@ninpnin my best guess is that we have it in Swedish Wikipedia articles but less often in Wikidata; see the video on how you could extract them from Python, Java, JavaScript... (all of Wikidata is CC-0)

query Wikidata people in Swedish PM with one or more alias

My plan is that we should have everyone in Wikidata

@ninpnin if you could give me a list of candidates I could update Wikidata

In my video I mentioned Bionomia (how it works), which uses Wikidata to match specimen data with the biologist who found the species, i.e. they have the same challenge of finding which person is behind a signature, and they use Wikidata aliases; see below.

How others work with Wikidata: a good presentation on how Bionomia is syncing its data with Wikidata, ORCID (living persons) and GBIF (biodiversity data), using Wikidata as a good resource but keeping it at arm's length.

LD4 - video presentation starts at 05:00 Keepin 'N Sync... with wikidata ... and ORCID...and GBIF - slides by David Shorthouse

dpriskorn commented 2 years ago

I agree with Magnus that the aliases are a powerful tool to capture one thing being mentioned in multiple ways. The beautiful thing with Wikidata is that it is open for anyone to improve. So you can enter aliases there as you find them. If you are unsure about who they mean in the source you can compile a list and we can investigate together. We could even create an unknown person named "X from Y" in Wikidata and merge later with any of the known MPs once we find out who it is.

I suggest using https://www.wikidata.org/wiki/Property:P2561 to enter the name as well (apart from adding it as an alias) and adding a reference to exactly where it shows up. That makes it easier for everyone to investigate at any time.

I try to think of names as a human friendly identifier. Unfortunately they are pretty bad for machines and big societies where collisions can easily occur when multiple people have exactly the same name.

MansMeg commented 2 years ago

This sounds promising. I tried to go through some of your links @salgo60 , although it was a little too much information for me to digest and make actionable.

So most computational researchers are familiar with csv-files, JSON and (some) XML. Hence I’m leaning toward storing persons in a csv as presented above. What do you think about that? To me that would be easy to sync with wikidata? Or do you have another suggestion? JSON?

Also, regarding aliases, I agree it makes sense to store these as well. Again, for us it makes sense to store them as a CSV with a structure something like this (columns): person_id; alias_name; from_date; to_date
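
A rough sketch of how such an alias table could be used to resolve a name form on a given date. The rows, the person ids and the date ranges are hypothetical, and treating an empty from_date or to_date as open-ended is an assumption rather than something decided in this thread:

```python
import csv
import io
from datetime import date

# Hypothetical alias rows; person ids and date ranges are made up.
ALIASES_CSV = """person_id;alias_name;from_date;to_date
p0001;Andersson i Essvik;1921-01-01;1932-12-31
p0002;Andersson i Essvik;1933-01-01;
"""


def parse_date(value):
    """Return a date for an ISO string, or None for an empty cell."""
    return date.fromisoformat(value) if value else None


def load_aliases(text):
    """Read the alias table and convert the date columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        rows.append({
            "person_id": row["person_id"],
            "alias_name": row["alias_name"],
            "from_date": parse_date(row["from_date"]),
            "to_date": parse_date(row["to_date"]),
        })
    return rows


def resolve_alias(aliases, name, on_date):
    """Return the person_ids whose alias equals `name` and is valid on `on_date`.

    An empty from_date or to_date is treated as an open-ended bound.
    """
    hits = []
    for alias in aliases:
        if alias["alias_name"] != name:
            continue
        if alias["from_date"] and on_date < alias["from_date"]:
            continue
        if alias["to_date"] and on_date > alias["to_date"]:
            continue
        hits.append(alias["person_id"])
    return hits


aliases = load_aliases(ALIASES_CSV)
```

With the date range in place, the same alias can point to different persons in different periods, which is the case a plain alias list cannot express.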

Any thoughts on this suggestion?

salgo60 commented 2 years ago

OT this is what Swedish Datastory does with Wikidata PM data see "The Longest Serving MP in Sweden" - tweet

They have done a lot of work updating Wikidata for people, documents from Riksdagen after 1971

More Datastory/Wikidata: How we're tracking elections in symbiosis with Wikidata

Disambiguate names of Swedish PM people: as long as Wikidata is not perfect and does not have all "special names" as aliases, we can try the following

e.g. Andersson i Myggenäs
- use a text retrieval in sv:Wikipedia
  - you can use the API srwhat=text&srsearch= --> Andersson i Myggenäs
Mossberg i Blomma --> api srsearch --> Wikipedia Albert Mossberg
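
The text search above can be sketched with the standard MediaWiki search module (`action=query&list=search`). This is only an illustration: the function names and the user agent string are hypothetical, and quoting the name form to get a phrase search is an assumption:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://sv.wikipedia.org/w/api.php"


def build_search_url(name_form):
    """Build a full-text search URL for a name form such as 'Andersson i Myggenäs'."""
    params = {
        "action": "query",
        "list": "search",
        "srwhat": "text",
        "srsearch": f'"{name_form}"',  # quoted to search for the exact phrase
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def search_titles(name_form, user_agent="riksdagen-corpus-example/0.1"):
    """Return titles of matching sv:Wikipedia articles (performs a network call)."""
    request = Request(build_search_url(name_form), headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        data = json.load(response)
    return [hit["title"] for hit in data["query"]["search"]]


url = build_search_url("Mossberg i Blomma")
```

Calling `search_titles("Mossberg i Blomma")` returns candidate article titles, which still need a manual check before anything is linked.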

I checked the number of pictures of Swedish PMs in portrattarkiv.se (see Notebook): > 5000 pictures, so that can also help when searching...

salgo60 commented 2 years ago

@ninpnin just so we are on the same page

in this picture it says "Magnuson i Sandviken"

Question for ninpnin: is this something this person was called in the Swedish PM, and something we should add as an "also known as" alias ("Tunnetaan myös nimellä" in the Finnish interface) to his Swedish-language Wikidata entry Q5971868?

PS: a new video was published with nearly the same use case: Wikidata and OCCRP (WikidataCon 2021 recording)

OCCRP is the Organized Crime and Corruption Reporting Project.
Use case: tracking criminals' names in different name forms in different languages using Wikidata. In Wikidata we also split the name into given name (P735) and family name (P734) and have different items for those names in multi-language forms, e.g. Tord = Q1750352, which is spelled 托德 in Chinese.

Another tool to find Swedish PMs is Wiki template Mall:Ledamöter_av_Sveriges_riksdag

salgo60 commented 2 years ago

@MansMeg I looked into the spec "TEI Schema for Corpora of Parliamentary Proceedings", and I guess a good start is to "standardize" the objects they mention --> describe them as linked data with a persistent identifier and a landing page, and try to standardize this for Digital Humanities projects that work with parliamentary proceedings across Europe.

I spent some time last week making a nice table of the new ministers in the German government (Scholz cabinet) in sv:Wikipedia, "Regeringen Scholz", and I can see that it could be a challenge to model objects over time and across the whole of Europe. But by doing this we could start comparing countries in a much better way, so I guess it's the way forward... --> that means that you digital humanists need to start modelling things together, as we do in Wikidata to make the different language versions of Wikipedia scale better....

....

Low-hanging fruit, I guess, is the people in the ministers files, like

ministers/falldin_iii.csv
same as
- en:Wikipedia Fälldin_III_Cabinet
- sv:Wikipedia Fälldin_III
- wikidata Q5306186
- SPARQL
  - P5054 = cabinet
  - Q5306186 = "Fälldin III Cabinet"
    - SPARQL sv fi

ninpnin commented 2 years ago

@salgo60 Yes, I think, adding 'Magnusson i Sandviken' as his alias is appropriate, in the field you suggest.

I didn't manage to find him in our data as we only have data from 1920 at the moment, but here's an example.

People are introduced with that exact alias. In the transcribed speeches, too, you will see people referred to as 'Magnusson i Sandviken' or 'Anderson i Rasjön'. So the "also known as" (tunnetaan myös nimellä) field is 100% appropriate for this type of metadata.

MansMeg commented 2 years ago

Thanks @salgo60. I interpret your response as saying that this type of CSV file, with the different objects coming from the Parla-Clarin format, is a good one that can easily be combined with the Wikidata structures.

What does @ninpnin think about having aliases as a CSV/tabular file? I know we have discussed this before:

person_id; alias_name; from_date; to_date

salgo60 commented 2 years ago

Don't hesitate to call me at 073-5152802; this is very complex but, I think, also game changing.... I am also on Telegram as salgo60

> I didn't manage to find him in our data as we only have data from 1920 at the moment, but here's an example.

Thanks. I plan to call Lotta Åberg Brorsson at the Swedish Riksdag library when they open (video with her from 2018). I guess she is more skilled on those names.... I also asked a person, FBQ, on sv:Wikipedia (link), and we found the scanned books "PortraitCatalog:Tvåkammar-riksdagen 1867-" with 4500 people in the second chamber, where everyone has a "special name". FBQ and I thought it was a little bit odd.... but as you said, adding them to Wikidata as aliases --> will help when doing NER on names

Suggested work process: as @dpriskorn suggested, one approach could be

- have a list of people in the Swedish PM and the "alias" you find
- match them, if possible, to a Wikidata Qnumber
- if no match is found, we can create a Wikidata stub, i.e. just an object that has what we know, when active in the Swedish PM etc.
  1. Mark the object so we can easily find it and curate it later, and also mark it in your list as a WD stub with Qnumber xxx
  2. When we find out who it is, we merge it with the better object
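
The steps above can be sketched as a small triage function. This is only an illustration under assumptions: the function name, the record layout and the `is_stub` flag are hypothetical, the stub itself would still be created by hand or with a Wikidata tool, and "Herr X i Y" stands in for an unmatched name form:

```python
def triage_name_forms(observed, known_aliases):
    """Sort observed name forms into matched, ambiguous and stub candidates.

    observed: list of (name_form, year_active) tuples found in the corpus.
    known_aliases: dict mapping a name form to the Wikidata Q-numbers using it.
    """
    matched, ambiguous, stubs = [], [], []
    for name_form, year_active in observed:
        qids = known_aliases.get(name_form, [])
        record = {"name_form": name_form, "year_active": year_active}
        if len(qids) == 1:
            matched.append({**record, "wikidata_id": qids[0]})
        elif len(qids) > 1:
            # Several candidates: needs manual review before linking.
            ambiguous.append({**record, "candidates": list(qids)})
        else:
            # No match: candidate for a Wikidata stub to be merged later.
            stubs.append({**record, "wikidata_id": None, "is_stub": True})
    return matched, ambiguous, stubs


known_aliases = {
    "Jonas Andersson i Linghem": ["Q58837098"],
    "Jonas Andersson i Skellefteå": ["Q59387749"],
    "Jonas Andersson": ["Q58837098", "Q59387749"],
}
observed = [
    ("Jonas Andersson i Linghem", 2021),
    ("Jonas Andersson", 2021),
    ("Herr X i Y", 1921),  # hypothetical unmatched name form
]
matched, ambiguous, stubs = triage_name_forms(observed, known_aliases)
```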

As mentioned, I found e.g. "Lista över ledamöter av Sveriges riksdags andra kammare 1914" (those active in the second chamber in 1914), and as you can see they have the same "problem":

- some people are not linked to their Wikipedia/Wikidata article even though it exists, e.g. Evald Chrispin Kropp was in the list but not linked (see version history) to his Wikipedia article Evald Chrispin Kropp
- some people mentioned have no Wikipedia/Wikidata article (this is on my never-ending todo list)....
....

See loooong video I did about this ;-) in the video I play with browser plug-in Wikidata:Entity_Explosion

salgo60 commented 2 years ago

> person_id; alias_name; from_date; to_date

It depends on what scope you have. My understanding from the Bionomia developer (listen at 48:28, when I asked him about his experience with matching) --> it's normally a High Chaparral doing NER on notes about scientific findings....

In Wikidata we have aliases per language
- here is a list of other name-related properties in Wikidata [sv] [fi]
I guess Riksdagen is well structured, so you then have a small number of "name forms"
- Lotta Åberg Brorsson at Riksdagens Library mentioned different spellings over time and OCR problems; see video 6:39

ps. I also asked on the sv:Wikipedia discussion page about Riksdagens name forms "Bybrunnen#I_Riksdagen_kallad_Pettersson_i_Bjälbo,_Petersson_i_Röstånga"

salgo60 commented 2 years ago

The answer to my question was that it looks like it was very common in the old days for the "talman" (speaker) in the Swedish Riksdag to use the above name form, and it can still be used today, but mostly in a humorous way.

Today's example, where SD has 2 people with the "same" name, Jonas Andersson:

WD Q58837098
- movie 1 juni 2021 when the "talman" speaks to him
- web page Riksdagen "Jonas Andersson i Linghem (SD)"
- Json Riksdagens data
WD Q59387749
- movie 23 september 2021 when the "talman" speaks to him
- web page Riksdagen "Jonas Andersson i Skellefteå (SD)"
- Json Riksdagens data
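
A small sketch of how name forms like the ones above could be split into name, place and party. The regular expression and the function name are hypothetical, and the pattern assumes the form "Name i Place (Party)" with the party part being optional:

```python
import re

# Assumed pattern: "<name> i <place>" optionally followed by "(<party>)".
NAME_FORM = re.compile(
    r"^(?P<name>.+?)\s+i\s+(?P<place>[^()]+?)(?:\s+\((?P<party>[^)]+)\))?$"
)


def split_name_form(text):
    """Split e.g. 'Jonas Andersson i Linghem (SD)' into its parts.

    Returns a dict with the keys name, place and party (party may be None),
    or None when the text does not contain an ' i <place>' part.
    """
    match = NAME_FORM.match(text.strip())
    if match is None:
        return None
    return match.groupdict()


parsed = split_name_form("Jonas Andersson i Linghem (SD)")
```

The place part is what separates the two persons here, so it is worth keeping as its own field instead of only storing the full string.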

salgo60 commented 2 years ago

FYI I created a Wikidata request for a new property to store the name in the Swedish PM link

This process can take some weeks...

If you support this please create a Wikipedia account (sv) (fi) and add a positive vote

syntax for positive vote is

{{S}} - ~~~~

Update: it looks like we can support it in another way, as a user suggested

SPARQL query with pictures, example Q6043480#P2561

salgo60 commented 2 years ago

We created a sv:Wikipedia article about this name form in the Swedish PM; see I_riksdagen_kallad (update: a person has now deleted the table, so you have it on this page)

the last part is dynamic and generated from Wikidata i.e. it is the status what we have done so far....

(update: I added a reference to this project but it was deleted ;-) see early version)

ninpnin commented 2 years ago

Here are some Python wrappers we might want to use for querying Wikidata

QWikidata https://github.com/kensho-technologies/qwikidata
Wikirepo https://github.com/andrewtavis/wikirepo
Wikidata https://github.com/dahlia/wikidata

salgo60 commented 2 years ago

@ninpnin don't hesitate to call me if you have questions, +46-735152802, or better, screen sharing or Telegram (salgo60)

I often use the SPARQL editor generated code and sparqlwrapper see video then you get Python code generated... --> see video where I use the following SPARQL query -->

==> code

# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/

import sys
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """#title:  Ledamöter med "samma I Riksdagen kallad"
SELECT DISTINCT ?nameUsedinSwedishPM1 
(SAMPLE(?svWikipedia1) AS ?svWikipedia1) (SAMPLE(?svWikipedia2) AS ?svWikipedia2) 
(SAMPLE(?person1) AS ?person1) 
(SAMPLE(?person2) AS ?person2)
WHERE {
  ?person1 p:P2561 ?nameSwedishPMp1.
  ?person2 p:P2561 ?nameSwedishPMp2.
  {
    ?nameSwedishPMp1 ps:P2561 ?nameUsedinSwedishPM1;
      pq:P3831 wd:Q110382440.
  }
  {
    ?nameSwedishPMp2 ps:P2561 ?nameUsedinSwedishPM2;
      pq:P3831 wd:Q110382440.
  }
  FILTER((?nameUsedinSwedishPM1 = ?nameUsedinSwedishPM2) && (?person1 != ?person2) 
        && (str(?person1) > str(?person2))
        )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "sv,en". }
  OPTIONAL {
    ?svWikipedia1 schema:about ?person1;
      schema:inLanguage "sv";
      schema:isPartOf <https://sv.wikipedia.org/>.
  }
  OPTIONAL {
    ?svWikipedia2 schema:about ?person2;
      schema:inLanguage "sv";
      schema:isPartOf <https://sv.wikipedia.org/>.
  }
}
GROUP BY ?nameUsedinSwedishPM1 ?person1 ?person1Label
ORDER BY (?nameUsedinSwedishPM1)"""

def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # TODO adjust user agent; see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

results = get_results(endpoint_url, query)

for result in results["results"]["bindings"]:
    print(result)

I use pandas a lot and have changed the code to get the data returned into pandas; see Notebook example function


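import json
import sys

import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON


def get_sparql_dataframe(endpoint_url, query):
    """
    Helper function to convert SPARQL results into a Pandas data frame.
    """
    # Descriptive user agent so the endpoint can identify the client.
    user_agent = "salgo60/%s.%s" % (sys.version_info[0], sys.version_info[1])

    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    result = sparql.query()

    # Parse the raw JSON response; 'head.vars' lists the selected variables.
    processed_results = json.load(result.response)
    cols = processed_results['head']['vars']

    # One output row per binding; variables missing from a binding become None.
    out = []
    for row in processed_results['results']['bindings']:
        item = []
        for c in cols:
            item.append(row.get(c, {}).get('value'))
        out.append(item)

    return pd.DataFrame(out, columns=cols)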

salgo60 commented 2 years ago

@ninpnin let me know if I should prioritize what people I curate in Wikidata with "I Riksdagen kallad" right now I just take them randomly. Maybe it makes more sense for you that we take people active at a specific year?

In sv:Wikipedia we have some lists (quality unknown as always with Wikipedia)

example 1929–1932
see template Ledamöter av Sveriges riksdag

dpriskorn commented 2 years ago

> Here are some Python wrappers we might want to use for querying Wikidata
>
> * QWikidata https://github.com/kensho-technologies/qwikidata
> * Wikirepo https://github.com/andrewtavis/wikirepo
> * Wikidata https://github.com/dahlia/wikidata

I have not tested these 3, but the best library I have found so far (one that is professionally maintained and covers all of Wikibase) is https://github.com/LeMyst/WikibaseIntegrator. It is very powerful and v0.12 has very nice APIs IMO. There are notebooks that showcase how to use it.

(I have contributed code and code review to the project)

rbbby commented 2 years ago

I am currently working on individual-level data for this issue. For some people we have multiple birth and death dates, which I thought would be of interest. I looked through some references, and often it is these that have conflicting information. This is generally not a problem, as researchers will at most use this information at the year level, in which case there are only 2 conflicts. The problematic ones are the birth date for Q18202339 (differs by 11 years) and the death date for Q5718571 (1 of 2 dates has its reference missing). The complete list of conflicting information is given below in case it is of further interest:

Multiple birth dates: Wikidata: ['Q18202339', 'Q4947860', 'Q5613770', 'Q5630560', 'Q5782765', 'Q5784568', 'Q5820037', 'Q5943976', 'Q5968645', 'Q6078640', 'Q6228020']

Multiple death dates: Wikidata: ['Q5556026', 'Q5563972', 'Q5718571', 'Q5799761', 'Q5937709', 'Q6022275', 'Q6042602', 'Q6228020', 'Q728197']
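
The distinction between conflicts that disappear at the year level and those that do not can be sketched as follows. This is not the code used to produce the lists above: the function name and the labels are hypothetical, and the example dates are made up:

```python
from datetime import date


def classify_conflicts(dates_by_person):
    """Classify persons that have several recorded dates.

    Returns a dict mapping each person to one of:
      "consistent": all recorded dates agree,
      "day_level": the dates differ but fall in the same year,
      "year_level": the dates differ in year as well.
    """
    result = {}
    for person, dates in dates_by_person.items():
        unique = set(dates)
        if len(unique) <= 1:
            result[person] = "consistent"
        elif len({d.year for d in unique}) == 1:
            result[person] = "day_level"
        else:
            result[person] = "year_level"
    return result


# Made-up example data; the keys are placeholders, not Wikidata ids.
dates_by_person = {
    "person_a": [date(1850, 3, 1), date(1850, 3, 1)],
    "person_b": [date(1861, 5, 2), date(1861, 5, 12)],
    "person_c": [date(1870, 1, 1), date(1881, 1, 1)],
}
conflicts = classify_conflicts(dates_by_person)
```

Only the "year_level" cases matter for research that uses birth or death year.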

rbbby commented 2 years ago

Also found these Wikidata objects with missing start and end dates, that is, start and end dates for the property position held (P39) taking any of the values Q10655178, Q33071890, Q81531912 (member of enkammarriksdagen, första kammaren and andra kammaren). Currently active members of parliament (who do not have end dates yet) are not included in these lists.

Missing start: ['Q5819783', 'Q4983135', 'Q98271639', 'Q4976825', 'Q6210385', 'Q4934552', 'Q19976148', 'Q4957371', 'Q5950466', 'Q110279970', 'Q5547315', 'Q5553916', 'Q4963592', 'Q4970175', 'Q98538839', 'Q5599215']

Missing end: ['Q98556536', 'Q98539283', 'Q98937482', 'Q98937434', 'Q98317372', 'Q97971262', 'Q97971276', 'Q98668554', 'Q6196285', 'Q16084072', 'Q5938531', 'Q5577470', 'Q98556565', 'Q98668809', 'Q5547542', 'Q5621600']
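
One way such lists could be queried is sketched below, assuming that start and end dates are stored as the qualifiers P580 (start time) and P582 (end time) on the P39 statements. This is not necessarily the query used for the lists above; the function name is hypothetical, and the sketch does not exclude currently active members:

```python
# Position items taken from the comment above.
POSITIONS = ["Q10655178", "Q33071890", "Q81531912"]


def build_missing_qualifier_query(qualifier, positions=POSITIONS):
    """Build a SPARQL query for P39 statements that lack the given qualifier."""
    values = " ".join(f"wd:{qid}" for qid in positions)
    return f"""SELECT DISTINCT ?person WHERE {{
  VALUES ?position {{ {values} }}
  ?person p:P39 ?statement .
  ?statement ps:P39 ?position .
  FILTER NOT EXISTS {{ ?statement pq:{qualifier} ?value . }}
}}"""


missing_start_query = build_missing_qualifier_query("P580")  # start time
missing_end_query = build_missing_qualifier_query("P582")    # end time
```

The resulting strings can be sent to https://query.wikidata.org/sparql, for example with SPARQLWrapper as shown earlier in this thread.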

salgo60 commented 2 years ago

@rbbby

Thanks. I have started a slow cleaning of Wikidata, adding "Tvåkammar-riksdagen 1867-1970" as a source plus adding "I Riksdagen kallad"; see list

Question 1 what years are most important for you? I guess 1867–2021

FYI: there is also a discussion on how to redesign sv:Wikipedia; see WD-mall_riksdagsledamot

Question 2: can the result from this project, "riksdagen-corpus", be used to link from sv:Wikipedia? Is it described? It would be nice to have "landing pages" per

Data issues in Wikidata reported by @rbbby: I try to walk through the list; see Feedback rbby

Nota bene: issue https://github.com/salgo60/Wikidata_riksdagen-corpus/issues/9 was a Wikidata error, maybe made with good intentions, BUT an open platform like Wikidata must have good sources to be trusted.... In this case someone merged in another person with the same name etc. --> Wikidata gets a bit chaotic; the good thing is that we have versions and can do a rollback
- maybe look into setting up your own Wikidata instance, see Wikibase; Wikibase.cloud is the way forward for researchers, and then just have "same as" links to Wikidata
- the best would be if you had your own unique identifier that we could have in Wikidata and could reference
- also have an API so we could easily check Wikidata quality against your data, as we do with Nobelprize.org, Svenskt Kvinnobiografiskt lexikon, Swedish PM.... Another example is parliamentary data, where user @tmtmtmtm has > 100 repositories for syncing data; see https://github.com/tmtmtmtm?tab=repositories, for example Sweden's daily check

/Magnus +46-735152802

OT: Good article about CIA World Factbook and the quality by a person Tony Bowden who tries to update Wikidata "The CIA lost track of who runs the UK, so I picked up the slack" - Tony Bowden about his efforts to build an open source dataset of world leaders inside of Wikidata - his Github @tmtmtmtm

rbbby commented 2 years ago

@salgo60 great stuff thanks! The list will be of great use.

To answer some of your questions:

At the moment we are working with data from 1920, but are planning to extend to 1867. So the most important years are in that priority order.
Identifiers and links to Wikidata are being worked on. We have tested it on a few years (1920, 50, 70), and more work will be done on it in the coming days, link: https://github.com/welfare-state-analytics/riksdagen-corpus/tree/wikidata/corpus

Btw do you know how to query for the list below? Speakers of riksdagen has for example position held Q1850749. But similar positions are missing for the vice speakers. https://sv.wikipedia.org/wiki/Lista_%C3%B6ver_vice_talm%C3%A4n_i_Sveriges_riksdag

salgo60 commented 2 years ago

Don't hesitate to call me and we can share screens and talk about what you want to do: 0735152802

> But similar positions are missing for the vice speakers.

Then I suggest we create one... do we have good sources of who had those positions?

rbbby commented 2 years ago

Will do, and sounds good! Wikipedia seems to have references to sources for many of the individuals, for example: https://sok.riksarkivet.se/Sbl/Presentation.aspx?id=7790 https://portrattarkiv.se/details/sj9PGLAlnmUAAAAAABfNvw

I found sources in statskalendern after a quick look too, but it seems that it's missing for some early years. We have an OCR:d version of the relevant pages here: https://github.com/welfare-state-analytics/riksdagen-ocr/tree/main/statscalender

salgo60 commented 2 years ago

Sounds interesting I need to learn more about what you have

I saw that the file tatorter.csv has tätortskod --> easy to do "same as" Wikidata

In Wikidata that is Property:P775 Abbekås --> Tätorts-kod T3300 --> haswbstatement:P775=T3300 --> Wikidata Q2199524 ---> sv:Wikipedia Abbekås

or use the hub tool --> /P775:T3300?lang=sv

Property:P625 is Wikidata property for coordinate -->

/P775:T3300?property=P625 --> redirect Open Street Map
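
The `haswbstatement` lookup above can be sketched as a search request against the Wikidata API. The function name is hypothetical, and the sketch only builds the URL; the response still has to be fetched and checked:

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_haswbstatement_url(property_id, value):
    """Build a search URL for items that have the given property value."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{property_id}={value}",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


# Tätortskod T3300 (Abbekås), as in the example above.
url = build_haswbstatement_url("P775", "T3300")
```

Each hit in the JSON response carries the item's Qnumber in its `title` field, which could then be stored next to the tätortskod in tatorter.csv.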

salgo60 commented 2 years ago

> Wikipedia seems to have references to sources for many of the individuals For example: https://sok.riksarkivet.se/Sbl/Presentation.aspx?id=7790 https://portrattarkiv.se/details/sj9PGLAlnmUAAAAAABfNvw

I have connected all Riksarkivet SBL entries via WD Property P3217 (all of them, I hope; see the Notebook where I do web scraping) - blogpost
The work with porträttarkiv and the source "Tvåkammar-riksdagen" started last month and will take some more weeks; see list with 910 MPs

rbbby commented 2 years ago

We essentially OCR:d the pages of statskalendern where information about riksdagen was present. It's about 10 pages for each year. Searchable PDFs are available from other sources but can be a bit difficult to work with programmatically.

Very cool with tätorter! Not sure if we use the file for anything atm but such connections will likely be very interesting for some researchers in the future.

salgo60 commented 2 years ago

> Speakers of riksdagen has for example position held Q1850749. But similar positions are missing for the vice speakers. https://sv.wikipedia.org/wiki/Lista_%C3%B6ver_vice_talm%C3%A4n_i_Sveriges_riksdag

I created this page, Speaker of Swedish PM, but as I said, call me at +46-735152802 so we understand what you want to do. I leave Stockholm on Sunday and will be away with a poorer internet connection...

video how it looks in Wikidata with templates etc...

salgo60 commented 2 years ago

> Speakers of riksdagen has for example position held Q1850749. But similar positions are missing for the vice speakers.

@rbbby I used Open Refine, did some reconciliation and uploaded the vice speakers 1867–1920 to Wikidata; see video (needs some QA and sources)

Todo

[X] Tvåkammarriksdagens vice talmän (1867–1920)
- [X] Open Refine and do reconciliation and upload
- [X] add sources like SBL and Tvåkammar-riksdagen 1867-1970
- [X] quality assure
  - [X] SPARQL vice speaker of the First Chamber
  - [X] SPARQL vice speaker of the Second Chamber

[X] Tvåkammarriksdagens vice talmän (1921-1970)
- [X] add sources like SBL and Tvåkammar-riksdagen 1867-1970
- [X] quality assure

salgo60 commented 2 years ago

> Identifiers and link to wikidata is being worked on. Have tested it on a few years (1920, 50, 70), will be more work done it in the coming days, link: https://github.com/welfare-state-analytics/riksdagen-corpus/tree/wikidata/corpus

file riksdagen-corpus/blob/wikidata/corpus/197879/prot-197879--114.xml

:rocket: :rocket: impressive work!!! let us know how we can help you.... @Ainali and some other people have done a lot of work in Wikidata related to the Swedish PM members/documents but this is a new very interesting level!!!!

Q6198452 in Wikidata

Wikidata:Lexicographical data

In Wikidata we also have a project for lexicographical data --> we store a lexeme like foliehatt (tinfoil hat) = Lexeme:L54865 and have usage examples: who uses the word "foliehatt" and which party thinks other parties have "foliehattar" ;-) It would be very interesting if we could easily reference a word usage in your corpus; see e.g. Lexeme:L54865#P5831, where I referenced data.riksdagen.se/dokument/H80939, but it would be much more interesting to use your corpus and its unique identifiers and point to a specific location in the corpus.... Also, starting to gather when foliehatt was first used in the Swedish Parliament would be interesting.... @dpriskorn has written a tool, dpriskorn/LexUtils, to easily find usage examples; maybe that tool could use your corpus?

Wikipedia advice

As Wikidata sometimes has more Swedish PM related information than sv:Wikipedia, it can be good to activate the gadget "Lägg till Faktamall biografi WD i biografier", which adds a template with Wikidata info; see video

link: preferences/gadget "Lägg till Mall:Faktamall biografi WD i biografier om det inte redan finns någon infobox."

Before:

After:

Wikipedia advice 2

Add the following line to your common.js

mw.loader.load("//www.wikidata.org/w/index.php?title=User:Yair rand/WikidataInfo.js&action=raw&ctype=text/javascript");

will display the Wikidata Qnumber at the top of the Wikipedia article on sv:Wikipedia see how I did it sv.wikipedia.org/wiki/Användare:Salgo60/common.js

Wikidata also has this possibility to add new tools; see my WD common.js and more tools

Wikidata Status

A weekly report with new properties, status of development etc. is reported

Data Reuse Days announced March 14-24

tweet trying to get more CLARIN people interested....

In the last Wikidata Status a new event was announced. Data Reuse Days will take place on March 14-24, highlighting applications and tools using Wikidata's data. You can already propose a session.

Tool for tracking mismatches Wikidata:Mismatch Finder

FYI, a tool is being developed for handling mismatches between Wikidata and external sources. This tool will be open and can also be used by other communities

Wikidata:Mismatch_Finder, GitHub: https://github.com/wmde/wikidata-mismatch-finder

dpriskorn commented 2 years ago

> Wikidata:Lexicographical data
>
> In Wikidata we also have a project for Lexicographical data --> we store a lexem like foliehatt = Lexeme:L54865 and have usage examples it would be very interesting if we easy could reference a word usage in your corpus see eg. Lexeme:L54865#P5831 were I referenced data.riksdagen.se/dokument/H80939 but would be much more interesting to use your corpus, unique identifiers and point to a specific location in the corpus.... also start gathering when foliehatt was first used in the Swedish Parlament would be interesting.... @dpriskorn has written a tool dpriskorn/LexUtils to easily find usage examples maybe that tool could use your corpus?

Thanks for reminding me about this. I agree, this would be a unique and interesting source of examples. I opened up a new issue to track that idea in LexUtils.

salgo60 commented 2 years ago

> Btw do you know how to query for the list below? Speakers of riksdagen has for example position held Q1850749. But similar positions are missing for the vice speakers. https://sv.wikipedia.org/wiki/Lista_%C3%B6ver_vice_talm%C3%A4n_i_Sveriges_riksdag

@rbbby Now also vice speakers should be in Wikidata

[X] Q110785766 - SPARQL vice speaker of the First Chamber
[X] Q110785785 - SPARQL vice speaker of the Second Chamber
[X] Q110811796 - SPARQL first vice speaker of the First Chamber 1921-1970
[X] Q110811805 - SPARQL second vice speaker of the First Chamber 1921-1970
[X] Q110812751 - SPARQL first vice speaker of the Second Chamber 1921-1970
- SPARQL picture
[X] Q110812759 - SPARQL second vice speaker of the Second Chamber 1921-1970
- SPARQL picture
as a timeline - double click --> sv:Wikipedia
- test timeline with date precision - link
  - we need start and end date with day precision
- all speakers in Wikidata quality unsure as a Timeline Histropedia / SPARQL

vice speakers

all speakers

Quality unsure: in a perfect world Wikidata would have authorities we could check our data quality against and report diffs to as errors; see Wikidata:Mismatch_Finder. We have tried to start involving Riksarkivet SBL. Today we report mismatches in a form, but they lack an API and version management; see status overview Source:SBL. I feel they lack the IT knowledge to build better solutions?!?!? Compare SKBL, with an API and structured data...

My understanding is that in the Riksdagstrycket we have "talman" and you need to find out who the (vice) speaker is... Let me know if you find more odd things. Also, if we could get your "list of name forms" with Wikidata Qnumbers --> we could add them to Wikidata as aliases....

work in progress is I Riksdagen kallad
Feature request: as Wikidata is an open platform and sometimes gets vandalized or gets odd changes (done with good intentions), it would be great if this information were also stored at your place..... and we could access and compare it as we do with the Nobelprize API (notebook), SKBL API (notebook)...

welfare-state-analytics / riksdagen-corpus