Open · gbp opened this issue 5 years ago
Every Politician CSVs include the Wikidata QIDs, so when a person is reconciled, the MD5 hash calculated for the statement transaction ID changes, resulting in new verification page statements the next time the CSV source is fetched and loaded.
As has happened on: https://www.wikidata.org/wiki/User:Verification_pages_bot/verification/bz/Member_of_the_9th_House_of_Representatives
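For illustration, here is a minimal sketch of why this happens, assuming the transaction ID is an MD5 over the row's column values; the column names and the exact hashing scheme below are assumptions, not the project's actual implementation:

```python
import hashlib

def transaction_id(row, columns):
    # Hypothetical: join the row's values for the chosen columns and MD5 them.
    # The real generation method in the project may differ.
    joined = "|".join(row.get(col, "") for col in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

columns = ["person", "name", "group", "district"]  # assumed CSV headers

# Before reconciliation the person QID column is empty...
before = {"person": "", "name": "Jane Doe", "group": "Example Party", "district": "Example District"}
# ...after reconciliation it holds the matched Wikidata item.
after = dict(before, person="Q12345678")

print(transaction_id(before, columns))  # one hash
print(transaction_id(after, columns))   # a different hash, so the loader sees a new statement
```

Because the person QID feeds into the hash, filling it in after reconciliation yields a new transaction ID, and the next load treats the row as a brand-new statement alongside the old one.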
We should either:
- prevent new statements by keeping the transaction IDs the same, i.e. change the generation method to ignore certain columns. However, other columns (gender, Twitter, Facebook, image, etc.) can also change after reconciliation if the Wikidata item has those properties, so this isn't really feasible.
- allow new statements, but detect and merge them when classifying them so that only one (the latest) is returned in the statements JSON to the frontend. Detection could be based on the three QID values we store for the reconciled statement, possibly via a SQL GROUP BY clause? (See the sketch below.)

This might not be possible, though, because Wikidata items get merged. E.g. https://www.wikidata.org/wiki/User:Verification_pages_bot/verification/bz/Member_of_the_9th_House_of_Representatives#s:md5:07cec5a1220debee29d2572d2e2ec8f3 has a different person item than https://www.wikidata.org/wiki/User:Verification_pages_bot/verification/bz/Member_of_the_9th_House_of_Representatives#s:md5:0c7dfaedc9d624659b15d2d892c6c20e.
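As a rough sketch of what the GROUP BY detection might look like, assuming a statements table that stores the three reconciled QIDs (the table and column names below are guesses, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statements (
  transaction_id            TEXT,
  person_item               TEXT,  -- assumed names for the three stored QIDs
  electoral_district_item   TEXT,
  parliamentary_group_item  TEXT,
  created_at                TEXT
);
-- Two statements for the same person: one from before reconciliation,
-- one created when the refreshed CSV (now containing the QID) was loaded.
INSERT INTO statements VALUES
  ('md5-before-reconciliation', 'Q111', 'Q222', 'Q333', '2019-01-01'),
  ('md5-after-reconciliation',  'Q111', 'Q222', 'Q333', '2019-06-01');
""")

# Collapse duplicates: keep only the latest statement per QID triple.
# (Selecting the bare transaction_id alongside MAX(created_at) relies on
# SQLite's bare-column behaviour; other databases would need a window
# function or a correlated subquery.)
latest = conn.execute("""
  SELECT transaction_id,
         person_item, electoral_district_item, parliamentary_group_item,
         MAX(created_at) AS created_at
    FROM statements
   GROUP BY person_item, electoral_district_item, parliamentary_group_item
""").fetchall()

for row in latest:
    print(row)  # one row per triple; only the latest transaction_id survives
```

As noted above, though, merged Wikidata items mean the same person can end up stored with two different person QIDs, in which case this grouping alone would not collapse the duplicate rows.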