nextstrain / status

Nextstrain status pages
https://nextstrain.github.io/status/

Exclude runs that are _not_ using the pathogen-repo-build workflow #10

Open joverlee521 opened 2 months ago

joverlee521 commented 2 months ago

Originally noted in Slack when we first included the ncov-ingest workflows.

We currently filter workflows down to those that use the pathogen-repo-build workflow (see https://github.com/nextstrain/status/blob/841b188463517ea1c5546098c714948d99518c00/pathogen-workflows.sql#L88).

However, if a workflow has older runs from before it used the pathogen-repo-build workflow, those runs are still included.

tsibley commented 2 months ago

To do this filtering correctly, we'd need at least one of:

  1. The history of a workflow's content (e.g. in Steampipe, github_workflow.workflow_file_content over time)
  2. The workflow content associated with a specific workflow run (e.g. in Steampipe, github_actions_repository_workflow_run.github_workflow_file_content)
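If option 1 were available, the filter could look up which version of the workflow file was in effect when each run started, and keep the run only if that version references pathogen-repo-build. Below is a minimal Python sketch of that idea, not the actual query. It assumes a hypothetical `history` list of `(committed_at, content)` pairs sorted by commit time, and it assumes that a plain substring match on `pathogen-repo-build` is enough to detect use of the reusable workflow. It approximates a run's workflow content by the latest commit before the run started, so it ignores runs triggered from other branches.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

# Assumption: referencing the reusable workflow by name counts as "using" it.
MARKER = "pathogen-repo-build"

@dataclass
class Run:
    run_id: int
    created_at: datetime

def content_at(history: List[Tuple[datetime, str]], when: datetime) -> Optional[str]:
    """Return the workflow file content in effect at `when`.

    `history` is a hypothetical list of (committed_at, content) pairs sorted
    by commit time; None means the file did not exist yet."""
    times = [committed_at for committed_at, _ in history]
    i = bisect_right(times, when)  # number of commits at or before `when`
    return history[i - 1][1] if i else None

def filter_runs_by_history(runs: List[Run], history: List[Tuple[datetime, str]]) -> List[Run]:
    """Keep only runs whose workflow version at start time references the marker."""
    kept = []
    for run in runs:
        content = content_at(history, run.created_at)
        if content is not None and MARKER in content:
            kept.append(run)
    return kept

# Example with made-up dates and file contents: the switch happens on 2024-03-01.
history = [
    (datetime(2024, 1, 1), "jobs:\n  ingest:\n    runs-on: ubuntu-latest\n"),
    (datetime(2024, 3, 1), "jobs:\n  ingest:\n    uses: nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@master\n"),
]
runs = [Run(0, datetime(2023, 12, 1)), Run(1, datetime(2024, 2, 15)), Run(2, datetime(2024, 3, 2))]
kept = filter_runs_by_history(runs, history)
print([r.run_id for r in kept])  # [2]
```

Run 0 predates the file entirely and run 1 used the old version, so only run 2 survives.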

I thought Steampipe might already provide option 2, but unfortunately it doesn't, because the GitHub API doesn't expose it.

It is possible to generate either of those, by walking the git history of each workflow file (for 1) or by fetching the workflow content for each run individually by commit ID (for 2), but neither approach is very efficient or appealing when applied across the many repos × workflows × workflow runs.
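To make the cost of the per-run approach (2) concrete, here is a hedged Python sketch of it. It builds a URL for GitHub's REST "get repository content" endpoint, pinned with `ref` to each run's commit, and takes a hypothetical `fetch` callable (assumed to return the decoded file text) so that the HTTP client and authentication are left out. The `repo`, `path`, and `head_sha` keys on each run, and the workflow file path in the example, are assumptions for illustration. Caching by (repo, file, commit) means runs that share a commit cost only one request, which trims the per-run cost but does not remove it.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import quote

# Assumption: a substring match identifies use of the reusable workflow.
MARKER = "pathogen-repo-build"

def contents_url(repo: str, path: str, sha: str) -> str:
    """GitHub REST "get repository content" URL for `path` at commit `sha`."""
    return f"https://api.github.com/repos/{repo}/contents/{quote(path)}?ref={sha}"

def filter_runs_by_commit(runs: List[dict], fetch: Callable[[str], str]) -> List[dict]:
    """Keep runs whose workflow file, as of the run's commit, references the marker.

    `fetch` is a hypothetical callable returning the decoded file text for a URL."""
    cache: Dict[Tuple[str, str, str], str] = {}
    kept = []
    for run in runs:
        key = (run["repo"], run["path"], run["head_sha"])
        if key not in cache:  # one request per distinct (repo, file, commit)
            cache[key] = fetch(contents_url(*key))
        if MARKER in cache[key]:
            kept.append(run)
    return kept

# Example with a fake fetcher standing in for real HTTP calls.
calls = []

def fake_fetch(url: str) -> str:
    calls.append(url)
    if url.endswith("?ref=new"):
        return "uses: nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@master"
    return "runs-on: ubuntu-latest"

# Hypothetical workflow file path.
wf = {"repo": "nextstrain/ncov-ingest", "path": ".github/workflows/ingest.yaml"}
runs = [dict(wf, id=1, head_sha="old"), dict(wf, id=2, head_sha="new"), dict(wf, id=3, head_sha="new")]
kept = filter_runs_by_commit(runs, fake_fetch)
print([r["id"] for r in kept], len(calls))  # [2, 3] 2
```

Even with the cache, a real deployment would still need one request per distinct commit per workflow, across every repo.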

Since this issue only affects us for ~30 days after a workflow transitions from not using pathogen-repo-build to using it, I'm not sure it's worth spending more time on. If the bogus runs are burdensome during those 30 days, a simpler filtering method would be to hard-code "run after X" timestamps for the workflows where we make this transition.
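The hard-coded alternative needs no workflow history at all. The minimal Python sketch below illustrates the rule rather than the real query, which lives in SQL. It assumes a hypothetical `RUN_AFTER` table keyed by (repository, workflow file), with a placeholder workflow file name and a placeholder cutoff date, and it assumes run timestamps are timezone-aware UTC datetimes.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical hard-coded cutoffs: (repository, workflow file) -> the moment the
# workflow switched to pathogen-repo-build. Entries here are placeholders.
RUN_AFTER: Dict[Tuple[str, str], datetime] = {
    ("nextstrain/ncov-ingest", "ingest.yaml"): datetime(2024, 5, 1, tzinfo=timezone.utc),
}

def keep_run(repo: str, workflow: str, created_at: datetime) -> bool:
    """Keep a run unless it started on or before its workflow's cutoff.

    `created_at` must be timezone-aware to compare with the cutoffs."""
    cutoff = RUN_AFTER.get((repo, workflow))
    # Workflows without a cutoff are kept unchanged (no transition recorded).
    return cutoff is None or created_at > cutoff

def filter_runs(runs: Iterable[dict]) -> List[dict]:
    return [r for r in runs if keep_run(r["repo"], r["workflow"], r["created_at"])]

runs = [
    {"repo": "nextstrain/ncov-ingest", "workflow": "ingest.yaml",
     "created_at": datetime(2024, 4, 30, tzinfo=timezone.utc)},
    {"repo": "nextstrain/ncov-ingest", "workflow": "ingest.yaml",
     "created_at": datetime(2024, 5, 2, tzinfo=timezone.utc)},
]
print(len(filter_runs(runs)))  # 1
```

Because the bogus runs only matter for about 30 days, an entry could presumably be deleted once that window has passed, which keeps the table short.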