Open BobBorges opened 4 months ago
Full results of the first run:
FAIL, all test IDs not found in data
['X001', 'X002', 'X007']
This was expected -- these party names didn't have wiki IDs.
FAIL, some IDs out of range
803 out of correct time range 0.05395417590539542 ~~13610169491525424~~
party_id
Q110857 409
Q110837 171
Q1594086 68
Q111033682 61
Q213654 49
Q6487621 17
Q10554125 5
Q10444846 5
Q110843 4
Q10411412 3
Q10502466 3
Q10501500 1
Q10604308 1
Q10501501 1
Q10499105 1
Q7251368 1
Q4887122 1
Q3480145 1
Q110472693 1
Name: count, dtype: int64
The time-range problem is concentrated in relatively few party IDs. The full dataframe is attached: party-names_oor.csv
FAIL, some data IDs not in test set
391 are found in the data but not our list of parties 0.026271585029899885 ~~6.627118644067797~~
party_id
Q53764745 273
Q111104528 27
Q111108382 22
Q327591 11
Q111478524 11
Q965481 8
Q50383811 4
Q108546388 4
Q61791721 4
Q10686221 3
Q10541441 2
Q1787940 1
Q7140617 1
Q7333461 1
Q10499215 1
Q4570298 1
Q10585380 1
Q26662709 1
Q220945 1
Q3360009 1
Q10549149 1
Q179111 1
Q122599272 1
Q118289007 1
Q111449676 1
Q4650881 1
Q114167741 1
Q388981 1
Q111382125 1
Q4574567 1
Q111283538 1
Q111476658 1
Q1208859 1
Name: count, dtype: int64
done
F
Similarly, the unique party IDs found in the data but not in the test set are relatively few. The full dataframe is attached: party-names_not-found-test.csv
The largest single group of problem cases (409 of the 1,194 flagged rows, about 34%) involves Folkpartiet / Liberalerna (Q110857).
Proposed Solution
1a. Propose a SWERIK party ID property to Wikidata, so that we have our own IDs.
1b. Set the IDs properly in the test file and on Wikidata.
2. Generate and upload SWERIK IDs to the wiki party pages.
3. Correct temporal errors in the data, e.g. Q110857 before 25-11-2015 changed to Folkpartiet or Folkpartiet liberalerna.
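The temporal correction for Q110857 could be sketched roughly as follows. This is only an illustration in plain Python: the row layout `(party_id, start, end)`, the placeholder `OLD_ID`, and the choice to treat 25-11-2015 itself as the first day of the new name are all assumptions, not something decided in this issue.

```python
from datetime import date, timedelta

CUTOFF = date(2015, 11, 25)        # date taken from the proposal above
NEW_ID = "Q110857"
OLD_ID = "FOLKPARTIET_PLACEHOLDER"  # hypothetical; the real ID is still to be decided


def relabel(rows):
    """Relabel or split Q110857 affiliations that start before the cutoff."""
    out = []
    for party_id, start, end in rows:
        if party_id != NEW_ID or start >= CUTOFF:
            # Other parties, or periods entirely after the rename: keep as-is.
            out.append((party_id, start, end))
        elif end < CUTOFF:
            # Period entirely before the rename: use the old ID.
            out.append((OLD_ID, start, end))
        else:
            # Period spans the rename: split it at the cutoff.
            out.append((OLD_ID, start, CUTOFF - timedelta(days=1)))
            out.append((NEW_ID, CUTOFF, end))
    return out
```

Splitting a spanning period keeps each resulting row inside a single party's lifetime, which is what the time-range test checks.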
Thanks Bob! That sounds reasonable to me, but I would like to hear what @MansMeg and @ninpnin think of this too.
I agree. We should have better IDs.
Let's start with 1a and 1b so we are "done" locally and it works.
Also, will we need to fix the data locally in our corpus for the tests to pass?
Also, 2 would open up a discussion on wikidata, right? I think it would be great to discuss this with the wikidata people.
Add tests on the historical party names: (1) all IDs in the test file are found in the data; (2) all party affiliations are within the time range when the party existed; (3) all party IDs in party_affiliation.csv are in the test file.
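The three checks can be sketched in plain Python as below. This is a minimal illustration, not the actual test code: the in-memory structures stand in for the test file and party_affiliation.csv, the function names are hypothetical, and the X001 dates are invented sample values.

```python
from datetime import date

# Stand-in for the test file: party ID -> (first day, last day or None if still active).
# The X001 dates are made up for illustration.
PARTIES = {
    "Q110857": (date(2015, 11, 25), None),
    "X001": (date(1900, 1, 1), date(1920, 12, 31)),
}

# Stand-in for party_affiliation.csv rows: (party_id, start, end).
AFFILIATIONS = [
    ("Q110857", date(2016, 1, 1), date(2018, 1, 1)),
    ("Q110857", date(2010, 1, 1), date(2014, 1, 1)),    # before the party existed
    ("Q53764745", date(2000, 1, 1), date(2002, 1, 1)),  # not in the test file
]


def ids_missing_from_data(parties, affiliations):
    """Check (1): test-file IDs that never occur in the data."""
    used = {party_id for party_id, _, _ in affiliations}
    return sorted(set(parties) - used)


def rows_out_of_range(parties, affiliations):
    """Check (2): affiliations outside the period when the party existed."""
    bad = []
    for party_id, start, end in affiliations:
        if party_id not in parties:
            continue  # unknown IDs are reported by check (3)
        first, last = parties[party_id]
        if start < first or (last is not None and end > last):
            bad.append((party_id, start, end))
    return bad


def ids_missing_from_test_file(parties, affiliations):
    """Check (3): data IDs that are absent from the test file."""
    used = {party_id for party_id, _, _ in affiliations}
    return sorted(used - set(parties))
```

Each function returns the offending items rather than a bare pass/fail, so a failing test can print which IDs or rows need attention, much like the output pasted above.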