Open keithamoss opened 2 years ago
Think about how we detect changes in the file - maybe some generic Python to diff a file as part of the scrapers, so we can make the call about whether it's worth doing a reload or whether only non-critical info has changed.
It'd be good to know for our info and context wot's changing.
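A minimal sketch of what that generic diff step could look like, using only the standard library. It hashes both versions to short-circuit the "nothing changed" case, then lists added and removed lines. The function names and the returned dict shape are made up for illustration, not anything that exists in the scrapers today.

```python
import difflib
import hashlib


def file_digest(text: str) -> str:
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def summarise_changes(old_text: str, new_text: str) -> dict:
    """Compare two versions of a scraped file line by line (hypothetical helper)."""
    # Cheap check first: identical digests mean there is nothing to report.
    if file_digest(old_text) == file_digest(new_text):
        return {"changed": False, "added": [], "removed": []}

    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    added, removed = [], []

    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])

    return {"changed": True, "added": added, "removed": removed}
```

Whether a given change counts as "non-critical" would still need a rule on top of this (e.g. ignoring certain columns), which is left open here.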
To ponder
Bugs
1. The "find by distance" approach is flawed, as demonstrated by the query below that returns both a public school and a nearby community centre. Consider a more robust approach.
2. Have the polling place loader, and DEV, stop sending things to Sentry
https://github.com/keithamoss/demsausage/issues/1731
Enhancements
1. Allow us to see a diff of current vs new data to make more informed decisions about going ahead with a reload or not
Like, is anything really changing? Can we back off the pace of updates a bit?
This could be a task split off from the current process that runs partway and produces a report?
A rough difference report between the data that existed and the data that loaded (e.g. based on presence/absence of name + premises + state). Just something rough for us to gauge what's changed.
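One way the rough report could be sketched: treat name + premises + state as a key and compare the two sets. This assumes each polling place is available as a dict with those three fields; the function names and the light normalisation (collapsing whitespace, ignoring case) are assumptions for illustration only.

```python
def place_key(place: dict) -> tuple:
    """Build a comparison key from name + premises + state.

    Whitespace is collapsed and case is ignored so trivial edits don't
    show up as an add/remove pair.
    """
    return tuple(
        " ".join(str(place.get(field) or "").split()).casefold()
        for field in ("name", "premises", "state")
    )


def rough_diff_report(existing: list, incoming: list) -> dict:
    """Report which keys appear only in the old data or only in the new data."""
    existing_keys = {place_key(p) for p in existing}
    incoming_keys = {place_key(p) for p in incoming}
    return {
        "added": sorted(incoming_keys - existing_keys),
        "removed": sorted(existing_keys - incoming_keys),
        "unchanged_count": len(existing_keys & incoming_keys),
    }
```

A renamed booth shows up as one "removed" plus one "added" entry, which is probably fine for a rough gauge of how much has moved.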
2. Better handle stall submissions, pending stall approvals, and booth editing that happen whilst a reload is occurring
Simple solution: If there's an error (i.e. hanging stalls), just re-run the stall migration logic a second time afterwards to pick up anything that came through. Does that work inside the transaction? I think so?
Background: A Federal election can take 8 - 10 minutes to completely refresh the polling place data.
A subset of that time is spent in prepping the polling booth data for ingest (non-blocking), but an unavoidable few minutes is spent in writing new booths and relinking stalls and noms to them.
For the 2022 Federal election we had a handful of cases where stalls being added, or booths being edited by admins, happened while a refresh was happening. This caused the stalls to become unlinked from the new booth, or likewise for noms and the booth, and we had to (manually!) fix it in the database.
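A rough sketch of the "just run it a second time" idea. `load_booths` and `migrate_stalls` are hypothetical callables standing in for the real loader steps, and `migrate_stalls` is assumed to be safe to re-run and to return the number of stalls still hanging. In the real loader the `atomic` argument would presumably be Django's `transaction.atomic`; a no-op context manager is used here so the sketch runs on its own.

```python
from contextlib import nullcontext


def reload_polling_places(load_booths, migrate_stalls, atomic=nullcontext):
    """Reload booths, then re-run stall migration once if anything is left hanging.

    Returns the number of hanging stalls (0 on success) and raises if the
    second pass still leaves stalls unlinked, so the whole block can roll back.
    """
    with atomic():
        load_booths()
        hanging = migrate_stalls()
        if hanging:
            # Second pass: pick up stalls that came through during the first one.
            hanging = migrate_stalls()
        if hanging:
            raise RuntimeError(f"{hanging} stalls still unlinked after second pass")
    return hanging
```

On the "does that work inside the transaction?" question: under PostgreSQL's default READ COMMITTED isolation each statement sees rows committed before that statement began, so a second pass in the same transaction should be able to see stalls committed while the first pass ran. Worth confirming against the isolation level actually in use.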
3. Extremely MVP polling deduplication helper
Make the errors/warnings about duplicate places include pre-populated Google Maps links so we can resolve them more easily. (Address and coords for both places.)
c.f. https://github.com/keithamoss/demsausage/issues/89 for the full solution
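For the MVP version, the links could be built along these lines using the Google Maps URLs search format (`https://www.google.com/maps/search/?api=1&query=...`). The place dict shape (`address`, `lat`, `lon`) and the function names are assumptions for illustration.

```python
from urllib.parse import urlencode

MAPS_SEARCH_URL = "https://www.google.com/maps/search/?api=1&"


def maps_link(query: str) -> str:
    """Return a Google Maps search link for an address or a 'lat,lon' string."""
    return MAPS_SEARCH_URL + urlencode({"query": query})


def duplicate_warning_links(place_a: dict, place_b: dict) -> dict:
    """Build address and coordinate links for both suspected duplicates."""
    links = {}
    for label, place in (("a", place_a), ("b", place_b)):
        links[f"{label}_address"] = maps_link(place["address"])
        links[f"{label}_coords"] = maps_link(f"{place['lat']},{place['lon']}")
    return links
```

The four links could then be appended to the existing duplicate warning text so each pair can be checked in a couple of clicks.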
4. Geocoding and validation
See https://github.com/keithamoss/demsausage/issues/89
Add a geocoder step that tries to (1) geocode based on data from past elections, and failing that, (2) based on the Google/Mapbox geocoder. If the geocoder returns two reasonably confident results then display a GUI for the user to define which is the right location.
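The cascade described above could be sketched roughly as follows. `past_lookup` and `external_geocoder` are hypothetical callables (the latter standing in for a Google/Mapbox wrapper), and the 0 to 1 confidence scale and the 0.8 threshold are assumptions, not values from either service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    lat: float
    lon: float
    confidence: float  # assumed 0.0 to 1.0 scale
    source: str


def geocode_place(
    address: str,
    past_lookup: Callable[[str], Optional[Candidate]],
    external_geocoder: Callable[[str], List[Candidate]],
    min_confidence: float = 0.8,
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Return (chosen, needs_review).

    `chosen` is set when the location is unambiguous; `needs_review` holds
    the competing candidates a user would pick between in the GUI.
    """
    # Step 1: reuse a location from a past election if we have one.
    past = past_lookup(address)
    if past is not None:
        return past, []

    # Step 2: fall back to the external geocoder.
    confident = [
        c for c in external_geocoder(address) if c.confidence >= min_confidence
    ]
    if len(confident) == 1:
        return confident[0], []

    # Several confident hits: hand them to a human. None at all: unresolved.
    return None, confident
```

Anything that comes back with a non-empty `needs_review` list would be queued for the GUI; anything with neither a choice nor candidates still needs manual geocoding.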
5. Think about how to support unofficial / not yet in the data polling places
e.g. Beaconsfield Primary School and Waitara Anglican Church were reported by users, but hadn't yet made it into the official AEC list. Advice was that they may still be subject to final negotiations, so they're not formally declared yet.
If we add them as unofficial polling places we need:
6. Geocoding validation of EC polling places
distance_shift_km and ec_id in the output config.json compatible JSON blob
7. General enhancements
migrate_noms
Resources
Django
https://docs.djangoproject.com/en/2.2/topics/db/transactions/
https://docs.djangoproject.com/en/2.2/ref/databases/ (Performance and optimisation)
PostGIS
https://gis.stackexchange.com/questions/14232/using-a-geodjango-pointfield-with-geography-true-my-distance-calculations-are-w
https://web.archive.org/web/20180204152904/http://workshops.boundlessgeo.com/postgis-intro/geography.html
AB#23