mysociety / theyworkforyou

Keeping tabs on the UK's parliaments and assemblies
http://www.theyworkforyou.com/
Other
224 stars 50 forks source link

Switch TheyWorkForYou’s internal calculation of comparisons to twfy-votes #1742

Closed ajparsons closed 4 months ago

ajparsons commented 7 months ago

The XML files from twfy-votes also contain additional fields around comparison with parties.

This can in principle replace the ‘PartyCohort’ generation approach in TheyWorkForYou with a faster and more maintainable approach, and remove complexity from the TheyWorkForYou codebase.

Implementing this requires changing the current sourcing of party comparison scores from TheyWorkForYou’s database, to using entries stashed in the key lookup from the XML import.

This is not quite a prerequisite (but would help a lot) with the change to the public whip comparison algorithm.

Options include:

  1. Don’t do this. Make any adjustments to the comparison algorithm in TheyWorkForYou at same time and maintain both approaches.
  2. Change the data source, but leave the PartyCohort approach in the code base.
  3. Change the data source, remove duplicate approaches from the code base.

I think there’s a good case for at least 2, and would leave 2 vs 3 up to people with a better sense of the pros and cons.

--

A reason to keep the current approach intact would be as a third cross-check against the two approaches used in TWFY-votes. But this is practically a bit awkward - and wouldn’t be subject to the same automated cross-checking. There is a broader question about improving the twfy-votes test suite based on the TheyWorkForYou examples. TheyWorkForYou has scenario tests on fake data - whereas twfy-votes tests by applying two different approaches (following the same principles) on live data and checking for alignment.

dracos commented 7 months ago

Not sure who has the "better sense of the pros and cons" - I don't see any advantage to having the same calculation in multiple places from a maintenance point of view. (I do think fake data tests are useful in general.)

ajparsons commented 7 months ago

Yeah, given they're already defined in TWFY just need a bit of thought on moving those across.

ajparsons commented 7 months ago

So above I suggest changing everything so it stores information in the same place as the per person score, but an easier way might be to adapt the partypolicy table and store it there.

This table already is being reused by the cohort approach - where the 'party' column, now refers to a hash of a set of comparable MPs.

We can adjust this key to be person_id-party-chamber - e.g. 10001-labour-commons, and then use this to store the comparisons.

So the two steps would be:

  1. Adjust member>cohortKey to create this new key (it should use 'cohortParty' for the party bit)
  2. Add to the new json ingest something that populates this table based on the party comparison scores coming in.

The advantage of this is a) it's not a lot of finding all the references and b) twfy-votes is sometimes calculating multiple party comparisons for party switchers. This isn't being fed down the TWFY feed, but could be. The TWFY database would have all the comparison options, and I could lose the logic in twfy-votes replicating the 'which party is primary comparison', and it would all be done in TWFY.

dracos commented 6 months ago

Sorry, I don't follow your latest comment. I think my original comment (which was a bit terse, sorry), was saying go with your option 3, and remove stuff from TWFY if it's being calculated elsewhere. But I think you then/now think the opposite, and say remove it from twfy-votes and calculate it only in TWFY? If you think that works best and has the advantages, then I'm fine with that too, just not clear from this on the implementation.

ajparsons commented 6 months ago

No sorry I'm being confusing.

What I'm saying in the last comment is:

This is least steps from where we currently are - but is less thorough than storing the party comparisons in the same place as the actual scores.

Basically, calculations out of TWFY, there are a few options for where to store them when they come back - leave it with you on what's the best long term approach.

dracos commented 6 months ago

partypolicy table has a house column so presumably that doesn't need to be in the "party" key. If you're going that way, I'd add a person_id column to party_policy and then 'party' could actually be just the party?

I /think/ with the changes in #1752, if I've understood this right, this is looking at the comparison_party, comparison_distance_from_policy fields in the alignments and adding them to the right place in TWFY (given the above), removing all the cohort stuff that would otherwise be generating that data? And changing the cohort code to then use this new data storage, and it could assume that the per-person figure it gets for a policy is the 'right' one? Have I understood that?

ajparsons commented 6 months ago

That sounds right.

Adding some fields to the table would make it cleaner that's what's happening - yep. I think that means very slightly more work with the cohortKey because it won't be doing a single string lookup anymore - but not a big amount of work - and the clarity is probably worth it.

ajparsons commented 4 months ago

The associated PR is good to merge, I'm happy with the tests on this function, will do this early next week.