MansMeg closed 2 years ago
Saving notes from the meeting for discussion on Monday. A number of small files are listed below with their contents; these are then joined into an observation-level file.
[x] individual.csv: wiki_id, born, dead, gender
[x] party.csv: wiki_id, party, start, end
[x] name.csv: wiki_id, name
[x] twitter.csv: wiki_id, handle
[x] government.csv: wiki_id, government, start, end
[x] minister.csv: wiki_id, role, start, end
[x] talman.csv: wiki_id, role, start, end
[x] member.csv: wiki_id, role, start, end, district, party
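To make the join step concrete, here is a minimal sketch of how the small files could be combined into an observation-level file keyed on wiki_id. Only the file and column names come from the list above; the sample rows, the helper names `read_rows` and `left_join`, and the assumption that individual.csv and name.csv hold exactly one row per wiki_id are hypothetical.

```python
import csv
import io

# Hypothetical sample data; only the column names are taken from the notes.
INDIVIDUAL_CSV = """wiki_id,born,dead,gender
Q1,1950-01-01,,woman
Q2,1960-05-05,2020-02-02,man
"""
NAME_CSV = """wiki_id,name
Q1,Anna Andersson
Q2,Bo Berg
"""
MEMBER_CSV = """wiki_id,role,start,end,district,party
Q1,member,1994-10-03,1998-10-05,Stockholm,S
Q1,member,1998-10-05,2002-09-30,Stockholm,S
Q2,member,2002-09-30,2006-10-02,Uppsala,M
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(rows, lookup_rows, key="wiki_id"):
    """Attach the columns of lookup_rows to each row in rows.

    Assumes lookup_rows has at most one row per key (an assumption,
    not something the notes guarantee).
    """
    lookup = {r[key]: r for r in lookup_rows}
    joined = []
    for row in rows:
        extra = lookup.get(row[key], {})
        joined.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return joined


# One observation per membership period, with person-level attributes attached.
observations = left_join(read_rows(MEMBER_CSV), read_rows(INDIVIDUAL_CSV))
observations = left_join(observations, read_rows(NAME_CSV))
```

Each row in `observations` is one membership period, so a person with two periods appears twice with the same person-level fields repeated.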
I think the original issue already contains a lot of my thoughts and comments. Some additional comments/suggestions:
Re. @rbbby's comment above, I have the following comments:
Otherwise, I think this solves many parts of the API.
Here are the new queries and the CSV files generated from Wikidata. No cleaning is done at this stage other than renaming/dropping columns and formatting dates (applying assumptions and joining files comes next in the pipeline). Please comment if you find something we should change with regard to the API discussion above.
One note: the time data in member.csv, minister.csv, speaker.csv and prime-minister.csv are each split into their own query and file. This is because member.csv and minister.csv contain the variables party and government, respectively, and each variable is exclusive to its own dataset. The four files are therefore kept separate to keep them sparse, and speaker.csv is not joined with prime-minister.csv, since joining just those two would not be intuitive and could cause "mental overhead".
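One way to read the reasoning above is that stacking the files into a single table would leave party empty for every minister row and government empty for every member row. The sketch below illustrates this with hypothetical rows; the column lists are assumptions based on the file descriptions in this thread, and the values are made up for illustration.

```python
# Assumed column layouts (government on minister.csv follows the comment above).
member_cols = ["wiki_id", "role", "start", "end", "district", "party"]
minister_cols = ["wiki_id", "role", "start", "end", "government"]

# Hypothetical rows, for illustration only.
member_rows = [
    dict(zip(member_cols, ["Q1", "member", "1994-10-03", "1998-10-05", "Stockholm", "S"])),
    dict(zip(member_cols, ["Q2", "member", "2002-09-30", "2006-10-02", "Uppsala", "M"])),
]
minister_rows = [
    dict(zip(minister_cols, ["Q1", "minister", "1998-10-07", "2002-10-21", "Example government"])),
]

# Union of columns, keeping first-seen order.
all_cols = list(dict.fromkeys(member_cols + minister_cols))

# Stack both files into one table, filling columns a file lacks with "".
stacked = [
    {col: row.get(col, "") for col in all_cols}
    for row in member_rows + minister_rows
]

# Count cells that are empty only because the column belongs to the other file.
empty_cells = sum(1 for row in stacked for col in all_cols if row[col] == "")
```

With these three rows, four cells are empty purely as a side effect of stacking, and the share grows with every row added, which is the cost that keeping the files separate avoids.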
https://github.com/welfare-state-analytics/riksdagen-corpus/tree/api/input/wikidata
We need to set up a more long-term API for the corpus. By a data API, I mean the structure in which the data is stored and that we can build upon in research. The focus of this API is to clarify how the data is structured and to simplify use of the corpus.
Currently, I see the "API" as:
Current problems with the API:
Here are some suggestions on how to improve the API and its documentation (short-term):
@ninpnin and @rbbby: Any thoughts on this?