mysociety / parlparse

The scraper/parser that produces data for TheyWorkForYou, PublicWhip, etc

Fix/revise update flow #191

Open ajparsons opened 1 month ago

ajparsons commented 1 month ago

Currently the data update system is a bit tangled.

I'd like to get it to a place where:

GitHub is the source of truth, because I'm basing other flows (politican-data repo and twfy-votes database) off the GitHub people.json. This could be switched to the one hosted by twfy, but in principle they should be the same.

Here's where I think we are at the moment:

```mermaid
---
title: Politician data flow (current)
---
flowchart TD

github["GitHub Parlparse"]
mysoc["git.mysociety Parlparse"]
twfy["twfy-live Parlparse"]
automatic(("Auto updates"))

github -->|GitHub Action| mysoc
mysoc -.->|Broken mirror| github
twfy -.->|No auto push| mysoc
mysoc --> twfy

automatic --> twfy

linkStyle 1,2 stroke:red,color:red;
```

As a result, updates pool in the last stage (twfy-live) and don't make their way back to GitHub.

This can be unclogged by pushing back to GitHub, as the mirroring action sorts out the rest, so we should try to do this regularly for the moment.
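A minimal sketch of what regular pushing back could look like if scripted on the server, in Python. The remote name `origin`, the branch name `master`, and the function names are assumptions for illustration, not the project's actual setup. The `run` parameter exists only so the git calls can be swapped out when testing.

```python
import subprocess


def unpushed_count(repo_dir, remote="origin", branch="master", run=subprocess.run):
    """Return how many local commits are missing from remote/branch."""
    # Refresh our view of the remote before comparing.
    run(["git", "fetch", remote], cwd=repo_dir, check=True)
    result = run(
        ["git", "rev-list", "--count", f"{remote}/{branch}..HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def push_if_needed(repo_dir, remote="origin", branch="master", run=subprocess.run):
    """Push local commits to the remote only when there is something to push."""
    count = unpushed_count(repo_dir, remote, branch, run)
    if count:
        run(["git", "push", remote, f"HEAD:{branch}"], cwd=repo_dir, check=True)
    return count
```

Run from cron after the automatic updates, something like this would stop commits pooling on the server. It does not handle a diverged history, which would still need a manual merge.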

My instinct (just because I'm generally using GitHub Actions to handle dataset updates) would be to move the automatic updates to GitHub Actions, since they now rely on external APIs rather than the transcripts, rather than needing a flow to and from the server. This doesn't need to be consistent, but new update scripts might start there.

Just automating the pushing would fix the main current problem, but a GitHub-based flow would also let us add some automatic testing and prevent errors creeping back into people.json.

So something like:

```mermaid
---
title: Politician data flow (GitHub centric)
---
flowchart TD

github["GitHub Parlparse"]
mysoc["git.mysociety Parlparse"]
twfy["twfy-live Parlparse"]
automatic(("Auto updates"))
manual(("Manual updates"))
validator{"Validation test action"}

manual --> PR --> validator --> github
automatic -->|"GitHub Action"| validator

github -->|GitHub Action| mysoc
mysoc -->|Auto pull at start of updates| twfy
```
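
As a rough idea of what the validation test action could run, here is a small Python sketch. It assumes people.json is a Popolo-style document with top-level `persons` and `memberships` lists, and that memberships carry `person_id`, `start_date` and `end_date` keys. The function names and the specific checks are illustrative assumptions, not an existing parlparse test.

```python
import json
from collections import Counter


def validate_people(data):
    """Return a list of human-readable problems found in a Popolo-style dict."""
    errors = []
    persons = data.get("persons", [])
    memberships = data.get("memberships", [])

    # Every person id should appear exactly once.
    id_counts = Counter(p.get("id") for p in persons)
    for person_id, n in id_counts.items():
        if n > 1:
            errors.append(f"duplicate person id: {person_id}")

    known_ids = set(id_counts)
    for m in memberships:
        mid = m.get("id", "<no id>")
        person_id = m.get("person_id")
        # Entries without a person_id are skipped in this sketch.
        if person_id is not None and person_id not in known_ids:
            errors.append(f"{mid}: unknown person_id {person_id}")

        # ISO 8601 dates of the same precision compare correctly as strings.
        start, end = m.get("start_date"), m.get("end_date")
        if start and end and start > end:
            errors.append(f"{mid}: start_date {start} is after end_date {end}")

    return errors


def main(path):
    """Validate the file at `path`; return 0 if clean, 1 otherwise."""
    with open(path, encoding="utf-8") as f:
        problems = validate_people(json.load(f))
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A workflow step could call `sys.exit(main(path_to_people_json))` so that a non-zero exit fails the PR check or blocks the automatic update before it reaches the main branch.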