keithamoss / demsausage

Democracy Sausage
https://democracysausage.org
MIT License

Polling place data loading meta issue #1339

Open keithamoss opened 2 years ago

keithamoss commented 2 years ago

To ponder

  1. Make it possible to fix individual booths without doing a full polling place data reload
  2. Add an append mode for adding semi-official collections of booths? (e.g. Overseas/Interstate)
  3. Add a way to manually add a single booth?
  4. Think about how (and if) we need to handle pre-poll voting centres. This largely affects overseas booths, but with the rise in pre-polling we should be prepared to show pre-poll booths before election day and then hide them on election day itself. Discuss with the team: would people use the site to decide where to pre-poll vote? The motivation and behaviour are different.
  5. How could we change location information (lat/lon or address) via the UI, outside of a polling place data ingest, in a way that lets us stay in sync with the polling place loader JSON and process?

Bugs

1. The "find by distance" approach is flawed, as demonstrated by the query below, which returns both a public school and a nearby community centre. Consider a more robust approach.

SELECT "app_pollingplaces"."id",
       "app_pollingplaces"."old_id",
       "app_pollingplaces"."election_id",
       "app_pollingplaces"."noms_id",
       "app_pollingplaces"."geom"::bytea,
       "app_pollingplaces"."name",
       "app_pollingplaces"."facility_type_id",
       "app_pollingplaces"."premises",
       "app_pollingplaces"."address",
       "app_pollingplaces"."divisions",
       "app_pollingplaces"."state",
       "app_pollingplaces"."wheelchair_access",
       "app_pollingplaces"."entrance_desc",
       "app_pollingplaces"."opening_hours",
       "app_pollingplaces"."booth_info",
       "app_pollingplaces"."status",
       "app_pollingplaces"."chance_of_sausage",
       "app_pollingplaces"."extras",
       "app_pollingplaces"."ec_id",
       ST_Distance("app_pollingplaces"."geom", ST_GeogFromWKB('\x0101000020e610000062105839b4e46240c286a757caba40c0'::bytea)) AS "distance"
FROM "demsausage"."app_pollingplaces"
WHERE ("app_pollingplaces"."status" = 'Active'
       AND NOT ("app_pollingplaces"."election_id" = 27)
       AND ST_Distance("app_pollingplaces"."geom", ST_GeogFromWKB('\x0101000020e610000062105839b4e46240c286a757caba40c0'::bytea)) <= 200.0)
ORDER BY "app_pollingplaces"."election_id" ASC;
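
One possible direction, sketched here under assumptions rather than taken from the codebase: keep the distance filter, but then score the nearby candidates by how closely their name or premises match, so a school and the community centre next door aren't treated as the same place. The function names, the dict shape, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(target, candidates, threshold=0.6):
    """Pick the nearby candidate whose name/premises best match the target.

    `candidates` are booths already filtered by distance (e.g. <= 200 m).
    Returns None when nothing clears the (hypothetical) threshold.
    """
    scored = []
    for candidate in candidates:
        score = max(
            similarity(target["name"], candidate["name"]),
            similarity(target["premises"], candidate["premises"]),
        )
        if score >= threshold:
            scored.append((score, candidate))
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

Returning None instead of the closest booth means a bad match shows up as a warning to review rather than silently linking the wrong place.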

2. Have the polling place loader, and DEV, stop sending things to Sentry

https://github.com/keithamoss/demsausage/issues/1731

Enhancements

1. Allow us to see a diff of current vs new data to make more informed decisions about going ahead with a reload or not

Like, is anything really changing? Can we back off the pace of updates a bit?

This could be a task split off from the current process that runs partway and produces a report?

A rough difference report between the data that existed and the data that was loaded (e.g. based on presence/absence of name + premises + state). Just something rough for us to gauge what's changed.
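
A minimal sketch of that rough report, assuming booths are available as dicts with `name`, `premises`, and `state` keys (the function names and dict shape are illustrative, not the loader's actual data model):

```python
def booth_key(booth):
    """Identity key for a booth: normalised name + premises + state."""
    return tuple(
        " ".join(str(booth.get(field) or "").lower().split())
        for field in ("name", "premises", "state")
    )


def rough_diff(existing, incoming):
    """Compare two collections of booths by presence/absence of their key."""
    old = {booth_key(b) for b in existing}
    new = {booth_key(b) for b in incoming}
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "unchanged": len(old & new),
    }
```

A renamed booth shows up as one "removed" plus one "added" entry, which is probably fine for a report that only needs to show roughly how much has moved.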

2. Better handle stall submissions, pending stall approvals, and booth editing that happen whilst a reload is occurring

Simple solution: If there's an error (i.e. hanging stalls), just re-run the stall migration logic a second time afterwards to pick up anything that came through. Does that work inside the transaction? I think so?

Background: A Federal election can take 8 - 10 minutes to completely refresh the polling place data.

A subset of that time is spent in prepping the polling booth data for ingest (non-blocking), but an unavoidable few minutes is spent in writing new booths and relinking stalls and noms to them.

For the 2022 Federal election we had a handful of cases where stalls being added, or booths being edited by admins, happened while a refresh was happening. This caused the stalls to become unlinked from the new booth (and likewise noms from the booth), and we had to (manually!) fix it in the database.

3. Extremely MVP polling deduplication helper

Make the errors/warnings about duplicate places include pre-populated Google Maps links (address and coordinates for both places) so we can resolve them more easily.

c.f. https://github.com/keithamoss/demsausage/issues/89 for the full solution
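
As a rough illustration of the MVP helper, the links could be built with the Google Maps URLs search endpoint (`https://www.google.com/maps/search/?api=1&query=...`). The booth dict shape and the function names below are assumptions made for the sketch.

```python
from urllib.parse import urlencode

MAPS_SEARCH_URL = "https://www.google.com/maps/search/"


def maps_link(query):
    """Build a Google Maps search link for an address or a 'lat,lon' string."""
    return MAPS_SEARCH_URL + "?" + urlencode({"api": 1, "query": query})


def duplicate_links(booth):
    """Address and coordinate links for one booth in a duplicate warning.

    Assumes (hypothetically) that the booth dict has address, lat and lon keys.
    """
    return {
        "address": maps_link(booth["address"]),
        "coords": maps_link(f"{booth['lat']},{booth['lon']}"),
    }
```

Calling `duplicate_links` on each of the two suspected duplicates gives four links to paste into the warning message.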

4. Geocoding and validation

See https://github.com/keithamoss/demsausage/issues/89

Add a geocoder step that tries to (1) geocode based on data from past elections and, failing that, (2) geocode using the Google/Mapbox geocoder. If the geocoder returns two reasonably confident results, display a GUI for the user to choose which is the right location.
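
The fallback order above might look something like this. It's only a sketch: `external_geocoder` is a hypothetical callable standing in for a Google/Mapbox wrapper, its `(lat, lon, confidence)` return shape is assumed, and the 0.8 confidence cut-off is made up.

```python
def geocode_booth(booth, past_locations, external_geocoder, min_confidence=0.8):
    """Resolve a booth location, preferring data from past elections.

    Returns (status, results), where status is "resolved", "ambiguous"
    (a human needs to choose), or "unresolved".
    """
    # Step 1: reuse a location from a past election if we have one.
    key = (booth["premises"].lower(), booth["address"].lower())
    if key in past_locations:
        return "resolved", [past_locations[key]]

    # Step 2: fall back to the external geocoder (hypothetical callable,
    # assumed to return a list of (lat, lon, confidence) tuples).
    confident = [
        result
        for result in external_geocoder(booth["address"])
        if result[2] >= min_confidence
    ]
    if len(confident) == 1:
        return "resolved", [confident[0][:2]]
    if len(confident) > 1:
        return "ambiguous", [result[:2] for result in confident]
    return "unresolved", []
```

The "ambiguous" status is the hook for the GUI: the loader could park those booths and ask a person to pick between the returned coordinates.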

5. Think about how to support unofficial / not yet in the data polling places

e.g. Beaconsfield Primary School and Waitara Anglican Church were reported by users, but hadn't yet made it into the official AEC list. Advice was that they may still be subject to final negotiations, hence they're not formally declared yet.

If we add them as unofficial polling places we need:

  1. A way to actually do that (ideally with a UI)
  2. To handle migrating them and their noms when we do a data refresh
  3. To handle them when we do a data refresh and the electoral commission has formally added them

6. Geocoding validation of EC polling places

7. General enhancements

Resources

Django

https://docs.djangoproject.com/en/2.2/topics/db/transactions/
https://docs.djangoproject.com/en/2.2/ref/databases/ (Performance and optimisation)

PostGIS

https://gis.stackexchange.com/questions/14232/using-a-geodjango-pointfield-with-geography-true-my-distance-calculations-are-w
https://web.archive.org/web/20180204152904/http://workshops.boundlessgeo.com/postgis-intro/geography.html

AB#23

keithamoss commented 3 days ago

Think about how we detect changes in the file. Maybe some generic Python to diff a file as part of the scrapers, so we can make the call about whether it's worth doing a reload or whether only non-critical info has changed.

It'd be good to know for our info and context wot's changing.
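
A rough sketch of what that generic diff might look like, assuming the scraped files are CSVs with a stable ID column. The `ec_id` key and the `CRITICAL_FIELDS` set are guesses for illustration and would need to match the real files.

```python
import csv
import io

# Hypothetical: columns whose changes would justify a reload.
CRITICAL_FIELDS = {"name", "premises", "address", "lat", "lon"}


def load_rows(csv_text, key_field="ec_id"):
    """Index CSV rows by their ID column."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(csv_text))}


def changed_fields(old_csv, new_csv, key_field="ec_id"):
    """Count changed values per column, for rows present in both files."""
    old = load_rows(old_csv, key_field)
    new = load_rows(new_csv, key_field)
    counts = {}
    for key in old.keys() & new.keys():
        for field, value in new[key].items():
            if old[key].get(field) != value:
                counts[field] = counts.get(field, 0) + 1
    return counts


def worth_reloading(old_csv, new_csv, key_field="ec_id"):
    """True if rows were added/removed or any critical column changed."""
    old = load_rows(old_csv, key_field)
    new = load_rows(new_csv, key_field)
    if old.keys() != new.keys():
        return True
    changes = changed_fields(old_csv, new_csv, key_field)
    return any(field in CRITICAL_FIELDS for field in changes)
```

Printing the `changed_fields` counts alongside the yes/no answer would also cover the "for our info and context" part.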