mysociety / theyworkforyou

Keeping tabs on the UK's parliaments and assemblies
http://www.theyworkforyou.com/

Switch JSON download from public whip to twfy-votes #1743

Closed ajparsons closed 2 months ago

ajparsons commented 7 months ago

Public Whip creates popolo JSON feeds of all votes associated with a policy, which TheyWorkForYou imports to create an association between a division and a policy, and to update any division descriptions based on the content in the Public Whip.

TheyWorkForYou Votes recreates these popolo feeds (although part of this ticket is verifying they work as intended).

Currently, the process is done in several phases.

The simplest approach would be to adjust parlparse to pull from a new source. This might need a new JSON feed from twfy-votes to give a list of available policies (twfy has a list of IDs, but parlparse doesn't).

However, the XML export from PW to TWFY does not do this double loop. Do we want to similarly switch to TheyWorkForYou directly querying the JSON and cut it out from ParlParse?

(The current approach does mean the json is then publicly available).

dracos commented 7 months ago

List of policies is presumably linked with https://github.com/mysociety/theyworkforyou/issues/1747

If the data is going to be available elsewhere anyway, I don't see an issue with changing it. It looks like the fact it requests policy XML for the MP info and then vote info from the JSON is purely historical; I see no reason to maintain that if it can be simplified. I assume ideally you'd want to request one policy JSON URL that returns the MP information and the vote information together, which could then be imported. Does that make sense with the twfy-votes repo?

ajparsons commented 7 months ago

Yeah, that'd be fine.

So we'd get towards something like:

policy_details: {meta information about the policy}
votes: [information about how each person voted for each vote]
alignment: [overall information about alignment for each person with the policy]

The existing popolo-esque view can be adapted to add that.
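As a rough illustration of that shape, here is a minimal Python sketch of a combined policy feed and a check that the three top-level sections are present. Only the three top-level keys come from the outline above; every nested field name and value (`id`, `division_id`, `person_id` and so on) is a hypothetical placeholder, not the actual twfy-votes schema.

```python
# The three top-level keys proposed in the outline above.
REQUIRED_KEYS = ("policy_details", "votes", "alignment")


def missing_keys(feed: dict) -> list[str]:
    """Return the required top-level keys that are absent from a feed."""
    return [key for key in REQUIRED_KEYS if key not in feed]


# Placeholder payload: nested field names and values are illustrative only.
example_feed = {
    "policy_details": {"id": 810, "title": "placeholder title"},
    "votes": [
        {"division_id": "placeholder-division", "person_id": 1, "vote": "aye"},
    ],
    "alignment": [
        {"person_id": 1, "votes_cast": 0, "absences": 0},
    ],
}

if __name__ == "__main__":
    print(missing_keys(example_feed))
```

An importer could call `missing_keys` before ingesting a feed and skip or log any policy whose feed is incomplete.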

ajparsons commented 7 months ago

This end point now contains extra json information.

Exact details in the swagger API documentation linked to from home page.

dracos commented 7 months ago

Using 810 as a comparison, the outputs are different at present, I'm afraid. For example, 2005-06-14 division 12 in PW JSON has aye/no/both/absent of 199/292/0/153, whereas votes has 233/312/0/105. The 'text' of the motion is also different, PW has "National Lottery Bill (Reasoned amendment on second reading)", votes only has "National Lottery Bill". Or 2008-04-28/155 has 223/311/0/11 vs 265/335/0/50 and "Finance Bill - Clause 21 - Amusement Machine licence duty" vs "Orders of the Day - Clause 21 - Amusement Machine licence duty".

For the alignments, I checked the first dozen or so... 10003 PW has voted 3, absent 2; votes has voted 2, absent 2. 10002 is in PW but not in votes; I assumed this was because they voted 0 times, but 10007 is present in both voting 0 times. 10011 - absent for 5 in PW, absent for 4 in votes.

ajparsons commented 7 months ago

Division totals are just going to be a number problem I need to look at.

Text: I think I need to parse the wikitables to get the full labels stored in Public Whip (related to https://github.com/mysociety/twfy-votes/issues/5).

Alignment I would expect to be slightly off on 810 because it has a Lords vote that's ignored in the other system (those misalignments are for people who become Lords later). 1027 is an existing pure Commons policy and should line up.

Gerry Adams I knew about - it calculates everything for one MP at once. Gerry Adams has never voted, so gets nothing scored, whereas others will appear because they have voted elsewhere. Is this a problem in terms of the ingest? It would be fairly easy to create null entries in these cases.

dracos commented 7 months ago

I don't think Gerry Adams matters, no, looks like the code checks *_distance exists, so wouldn't matter if it wasn't set, and the front end overrides any vote display for SF MPs. So should be fine.

Makes sense on the Lords vote not being there, yep. Titles, yep, the script is https://github.com/publicwhip/publicwhip/blob/ac756343534ebfa36edf8f4e1740e3c5407acb85/build/generate_popolo_json.php, calling get_wiki_current_value from pw_dyn_wiki_motion table and extracting title and maybe yes/no from it that way.

ajparsons commented 7 months ago

Big division problem fixed - absences are still off because I hardcoded 650 and didn't go back to fix it, but I understand why that's happening.
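To make the hardcoded-650 effect concrete, here is a small Python sketch. It assumes absences are derived by subtracting the recorded votes from the number of members eligible at the time; that derivation and the function name `absent_count` are assumptions for illustration, not the twfy-votes code. The figures are the ones quoted earlier in this thread for 2005-06-14 division 12.

```python
def absent_count(aye: int, no: int, both: int, eligible: int) -> int:
    """Absences as eligible members minus everyone who recorded a vote.

    Assumption: absences are computed by subtraction rather than stored.
    """
    absent = eligible - (aye + no + both)
    if absent < 0:
        raise ValueError("more votes recorded than eligible members")
    return absent


if __name__ == "__main__":
    # twfy-votes figures quoted above (233/312/0) with a fixed total of 650.
    print(absent_count(233, 312, 0, eligible=650))
    # Public Whip figures quoted above (199/292/0/153) sum to 644 members.
    print(absent_count(199, 292, 0, eligible=644))
```

Both twfy-votes examples quoted in the thread sum to exactly 650, which is consistent with a fixed total; passing in the membership at the date of each division would avoid that.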

dracos commented 6 months ago

Sorry, more problems, the IDs don't match, eg 6679 - PW has an ID of pw-2010-07-06-14-commons, but votes has pw-2010-07-06-commons. votes is lacking the policy_vote, not sure if that's easy to work out? I guess look at the counts and then work it out from majority/minority + "strong", as long as that doesn't have any edge issues. And for my own documenting, PW has "aye" in the counts, votes has "yes" (can cope with both easily enough there).
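A minimal Python sketch of the two mechanical differences: building a Public Whip style division ID that includes the division number, and accepting either "aye" or "yes" in the counts. The ID pattern is inferred from the single example above (`pw-2010-07-06-14-commons`); the function names and the shape of the counts dictionary are hypothetical.

```python
def division_id(date: str, number: int, house: str) -> str:
    """Build an ID in the pattern pw-<date>-<division number>-<house>.

    The pattern is inferred from one example in this thread.
    """
    return f"pw-{date}-{number}-{house}"


def normalise_counts(counts: dict[str, int]) -> dict[str, int]:
    """Return a copy of the counts with a 'yes' key renamed to 'aye'."""
    out = dict(counts)
    if "yes" in out and "aye" not in out:
        out["aye"] = out.pop("yes")
    return out


if __name__ == "__main__":
    print(division_id("2010-07-06", 14, "commons"))
    print(normalise_counts({"yes": 265, "no": 335}))
```

Normalising on the importing side means the ingest works whichever spelling the feed uses, though the IDs themselves still need to match at the source.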