mysociety / parlparse

The scraper/parser that produces data for TheyWorkForYou, PublicWhip, etc

Script to add London Assembly Members #101

Closed jacksonj04 closed 5 years ago

jacksonj04 commented 5 years ago

Adds a script to add London Assembly Members.

This runs the following SPARQL query against the WDQS endpoint to retrieve memberships of the London Assembly. One entry will be returned for each membership.

SELECT ?item ?itemLabel ?electoraldistrict ?electoraldistrictLabel ?parliamentarygroup ?parliamentarygroupLabel ?starttime ?endtime ?twfy_id WHERE { 
    ?node ps:P39 wd:Q56573014 .
    ?item p:P39 ?node .
    ?node pq:P580 ?starttime .
    OPTIONAL { ?item wdt:P2171 ?twfy_id }
    OPTIONAL { ?node pq:P4100 ?parliamentarygroup }
    OPTIONAL { ?node pq:P768 ?electoraldistrict }
    OPTIONAL { ?node pq:P582 ?endtime }
    OPTIONAL { ?node pq:P2715 ?election }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
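As a rough illustration of how a query like this might be run and flattened from Python, here is a minimal sketch using only the standard library. The function names, the output dictionary keys and the User-Agent string are assumptions made for this example, not what the script in this PR actually does. The response shape is the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def qid(uri):
    """Reduce an entity URI such as http://www.wikidata.org/entity/Q123 to 'Q123'."""
    return uri.rsplit("/", 1)[-1] if uri else None


def _value(binding, key):
    """Return the plain value of a result cell, or None if the OPTIONAL was unbound."""
    cell = binding.get(key)
    return cell["value"] if cell else None


def parse_bindings(response):
    """Flatten SPARQL JSON results into one dict per membership (keys are hypothetical)."""
    rows = []
    for b in response["results"]["bindings"]:
        rows.append({
            "wikidata_id": qid(_value(b, "item")),
            "name": _value(b, "itemLabel"),
            "twfy_id": _value(b, "twfy_id"),
            "group": qid(_value(b, "parliamentarygroup")),
            "start": _value(b, "starttime"),
            "end": _value(b, "endtime"),
        })
    return rows


def fetch_memberships(query):
    """Send the query to WDQS and return the flattened rows."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={
        # Placeholder User-Agent; a real script should identify itself properly.
        "User-Agent": "london-assembly-sketch/0.1 (example)",
        "Accept": "application/sparql-results+json",
    })
    with urllib.request.urlopen(request) as resp:
        return parse_bindings(json.load(resp))
```

Keeping the parsing separate from the network call means the flattening logic can be checked against a saved response without hitting the endpoint.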

Matching/Creating People

It then tries to match members to people.json, or create new ones if it can't. Specifically, it:

  1. Looks for a member with a wikidata identifier matching the query result.
    • If one is found but the twfy_id of the Wikidata object isn't set, it'll raise an info message to prompt improving Wikidata.
    • If one is found but the twfy_id of the Wikidata object is set and doesn't match, it'll raise a warning for someone to investigate.
  2. If it can't match by Wikidata object ID, it'll try to match by ParlParse ID (if one is present).
    • If this works then it'll add the Wikidata ID to the person in ParlParse.
  3. If it can't match on either explicit ID, it'll try to match on name.
    1. If a name match is found, it'll prompt the person running to make the match explicit by adding an ID to Wikidata and then run with --create.
    2. If a name match isn't found, it'll tell the person running to explicitly run with --create to mint new IDs.
  4. Assuming the script is run with --create, it'll mint new ParlParse IDs for new people.
    • Names are split into surname and forenames with a naïve split on space. In future this could use explicit claims for names in Wikidata, but I don't think it's worth it.
    • New people will have Wikidata IDs set.
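The matching order above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in this PR: people are flat dicts with hypothetical `id`, `wikidata` and `name` keys (the real people.json structure is richer), IDs are bare numeric strings, `next_person_id` is an invented helper, and running with `create=True` is assumed to mint a new person even when a name match exists.

```python
import logging

log = logging.getLogger("london-assembly-sketch")


def split_name(label):
    """Naive split on space: the last token is the surname, the rest are forenames."""
    parts = label.split(" ")
    return " ".join(parts[:-1]), parts[-1]


def next_person_id(people):
    """Hypothetical ID minting: one more than the highest existing numeric ID."""
    return str(max(int(p["id"]) for p in people) + 1)


def match_person(result, people, create=False):
    """Return (person, action) for one query result, following steps 1 to 4."""
    wd, twfy, name = result["wikidata_id"], result.get("twfy_id"), result["name"]

    # 1. Match on the Wikidata identifier already stored against a person.
    for person in people:
        if person.get("wikidata") == wd:
            if twfy is None:
                log.info("%s has no TWFY ID in Wikidata", wd)
            elif twfy != person["id"]:
                log.warning("%s has TWFY ID %s, expected %s", wd, twfy, person["id"])
            return person, "matched_wikidata"

    # 2. Fall back to the ParlParse ID recorded in Wikidata, if there is one.
    if twfy is not None:
        for person in people:
            if person["id"] == twfy:
                person["wikidata"] = wd  # record the Wikidata ID on the person
                return person, "matched_parlparse_id"

    # 3. No explicit ID matched: report what the operator needs to do.
    if not create:
        if any(p.get("name") == name for p in people):
            return None, "name_match_needs_confirmation"
        return None, "needs_create"

    # 4. Mint a new person, with the Wikidata ID set from the start.
    given, family = split_name(name)
    person = {"id": next_person_id(people), "wikidata": wd, "name": name,
              "given_name": given, "family_name": family}
    people.append(person)
    return person, "created"
```

Returning an action string alongside the person keeps the sketch easy to test. A real script would more likely print prompts or raise at the points marked in step 3.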

Matching/Creating Memberships

  1. Look for a membership with the wikidata identifier matching the snak identifier in the result. 9105e031a2801b7900fe165632e708aaad2bf68b has changes which enable this.
    1. If one is found, use that.
    2. If not, create a new one. Updates have been made to _max_member_id() to support new memberships to london-assembly in the ID range 200,000+. See 7c7c8d27a89a7bc7850b8f03f19c03b7eead8027.
  2. Update the membership with details of dates, party etc. Groups in the GLA are mapped to existing parties in ParlParse, which are subtly different from those as encoded in Wikidata.
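A minimal sketch of the find-or-create step for memberships, again under stated assumptions: memberships are flat dicts with integer `id` values, the statement identifier is stored under a hypothetical `wikidata_statement` key, the first London Assembly ID is assumed to be 200000, and the `party_map` entries are placeholders rather than the real GLA group mapping. The actual `_max_member_id()` in ParlParse is not reproduced here.

```python
LONDON_ASSEMBLY_ID_FLOOR = 200000  # assumed start of the london-assembly ID range


def next_member_id(memberships, floor=LONDON_ASSEMBLY_ID_FLOOR):
    """Next free membership ID at or above the floor (hypothetical helper)."""
    used = [m["id"] for m in memberships if m["id"] >= floor]
    return max(used, default=floor - 1) + 1


def upsert_membership(result, memberships, party_map):
    """Find a membership by Wikidata statement ID, or create one, then update it."""
    for membership in memberships:
        if membership.get("wikidata_statement") == result["statement_id"]:
            break  # 1.1: an existing membership matches, so reuse it
    else:
        # 1.2: nothing matched, so mint a new membership in the 200,000+ range
        membership = {
            "id": next_member_id(memberships),
            "wikidata_statement": result["statement_id"],
            "organization_id": "london-assembly",
        }
        memberships.append(membership)

    # 2: refresh dates and party from the query result
    membership["start_date"] = result["start"][:10]  # keep only YYYY-MM-DD
    if result.get("end"):
        membership["end_date"] = result["end"][:10]
    group = result.get("group")
    if group in party_map:
        membership["on_behalf_of_id"] = party_map[group]
    return membership
```

Because the lookup is keyed on the statement identifier, re-running against the same results updates existing memberships in place instead of creating duplicates.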

Constituency Members vs Additional Members

The 'Member of the London Assembly' role in this doesn't include a constituency. Members are elected using the Additional Member System, so although we could add a distinction between Constituency and Additional Members (this is sort of modelled in Wikidata), I don't think there's a benefit to doing so.

"What are all these warnings about?"

If you're reviewing this and run the script, you'll get lots of warnings telling you that ParlParse IDs aren't in Wikidata. This is because it wouldn't make any sense to add the IDs to Wikidata until this is merged, since the identifiers aren't actually reliable until they're committed.

struan commented 5 years ago

OOI, why not add the constituencies? It seems like useful information and we do it on WTT so it would make sense to be consistent between them.

jacksonj04 commented 5 years ago

The aim of this is to allow Mayor's Questions to be included, and the constituency a member is elected to isn't really relevant to that. It's not an inordinate amount of work to add since the data is already in Wikidata, but it does mean adding more Popolo tooling to match areas and posts through arbitrary ID schemes.

struan commented 5 years ago

I just think that, come election time, allowing people to look up "what my AM has been asking about" is kind of core TWFY functionality. Not sure what we said in the grant proposal for this though.