Currently the data update system is a bit tangled.
I'd like to get it to a place where:
- There is a clear source of truth for the most up-to-date data.
- Party updates and similar changes are technically simpler.
- There are more integrity checks on both manual and automatic updates.
The source of truth is the GitHub people.json, because I'm basing other flows (the politican-data repo and the twfy-votes database) off it. This could be switched to the one hosted by TWFY, but in principle they should be the same.
Here's where I think we are at the moment:
As a result, updates pool in the last stage and don't make their way back to GitHub.
This can be unclogged by pushing back to GitHub, as the mirroring action sorts out the rest, so we should try to do this regularly for the moment.
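As a rough sketch of what a regular push could look like if it were scripted: the helper below commits and pushes one file only when git reports it as changed. The function names, the default `people.json` path, and the commit message are all my assumptions for illustration, not how the current scripts work.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def push_if_changed(repo: Path, path: str = "people.json", run=git) -> bool:
    """Commit and push `path` if it differs from the last commit.

    `path` is a hypothetical location; adjust it to wherever the file lives.
    Returns True when a push happened, False when there was nothing to do.
    """
    # `git status --porcelain` prints nothing when the path is unchanged.
    if not run(repo, "status", "--porcelain", "--", path).strip():
        return False
    run(repo, "add", "--", path)
    run(repo, "commit", "-m", f"Automatic update of {path}")
    run(repo, "push")
    return True
```

Something like `push_if_changed(Path("/path/to/checkout"))` could then be run from cron on the server. The `run` parameter is only there so the git calls can be swapped out when testing.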
My instinct (just because I'm generally using GitHub Actions to handle dataset updates) would be to move the automatic updates to GitHub Actions, since they now rely on external APIs rather than the transcripts, instead of needing a flow to and from the server. This doesn't need to be consistent, but new update scripts might start there.
Just automating the pushing would fix the main current problem, but a GitHub-based flow would mean we could add some automatic testing and prevent errors creeping back into people.json.
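To give a feel for the kind of automatic testing I mean, here is a minimal sketch of an integrity check. It assumes people.json is Popolo-style, with top-level `persons` and `memberships` lists and `id`, `person_id`, `start_date` and `end_date` fields; those names and the specific checks are assumptions for illustration, and the real checks would need to match the actual file.

```python
import json
from collections import Counter


def check_people(data: dict) -> list[str]:
    """Return a list of human-readable integrity problems (empty if none)."""
    errors = []

    # Every person id should appear exactly once.
    person_ids = [p["id"] for p in data.get("persons", [])]
    for pid, count in Counter(person_ids).items():
        if count > 1:
            errors.append(f"duplicate person id: {pid}")
    known = set(person_ids)

    for m in data.get("memberships", []):
        mid = m.get("id", "<no id>")

        # A membership should point at a person that exists.
        person_id = m.get("person_id")
        if person_id is not None and person_id not in known:
            errors.append(f"{mid}: unknown person_id {person_id}")

        # Only compare full YYYY-MM-DD dates, which sort correctly as strings.
        start, end = m.get("start_date"), m.get("end_date")
        if start and end and len(start) == 10 and len(end) == 10 and end < start:
            errors.append(f"{mid}: end_date {end} is before start_date {start}")

    return errors


def check_file(path: str) -> list[str]:
    """Load a JSON file from `path` and run the checks on it."""
    with open(path, encoding="utf-8") as f:
        return check_people(json.load(f))
```

A GitHub Actions step could call `check_file` on the updated file and fail the run if the returned list is non-empty, so a bad update never reaches the main branch.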
So something like: