nextstrain / status

Nextstrain status pages
https://nextstrain.github.io/status/

Exclude runs that are _not_ using the pathogen-repo-build workflow #10

Closed joverlee521 closed 3 days ago

joverlee521 commented 7 months ago

Originally noted in Slack when we first included the ncov-ingest workflows.

We currently filter workflows to those that use the pathogen-repo-build workflow https://github.com/nextstrain/status/blob/841b188463517ea1c5546098c714948d99518c00/pathogen-workflows.sql#L88

However, if a workflow has older runs from before it started using pathogen-repo-build, those runs are still included.

tsibley commented 7 months ago

To do this filtering correctly, we'd need at least one of:

  1. The history of a workflow's content (e.g. in Steampipe, github_workflow.workflow_file_content over time)
  2. The workflow content associated with a specific workflow run (e.g. in Steampipe, github_actions_repository_workflow_run.github_workflow_file_content)

I thought Steampipe might already provide 2, but unfortunately it doesn't (because the GitHub API doesn't).

It is possible to generate either of those, by walking git history for each workflow (for 1) or by fetching the workflow content for each run individually by commit id (for 2), but neither is very efficient or appealing when applied across the many repos × workflows × workflow runs.
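For illustration only, option 2 could look roughly like the Python sketch below. It is not what the status repo does. The `(repo, workflow_path, head_sha)` run record, the `fetch_content` callable (which would wrap a contents-at-commit lookup against GitHub), and the substring check on `pathogen-repo-build` are all assumptions made for the example. Caching by run key avoids refetching the same file at the same commit, but there is still one fetch per distinct commit, which is the inefficiency described above.

```python
from typing import Callable, Iterable

# Hypothetical run record: (repo, workflow_path, head_sha).
Run = tuple[str, str, str]


def runs_using_workflow(
    runs: Iterable[Run],
    fetch_content: Callable[[str, str, str], str],
    needle: str = "pathogen-repo-build",
) -> list[Run]:
    """Keep only runs whose workflow file, at the run's commit, mentions `needle`.

    `fetch_content(repo, workflow_path, head_sha)` is an assumed helper that
    returns the workflow file's text as of that commit.
    """
    cache: dict[Run, bool] = {}
    kept: list[Run] = []
    for run in runs:
        if run not in cache:
            # One fetch per distinct (repo, path, sha) combination.
            cache[run] = needle in fetch_content(*run)
        if cache[run]:
            kept.append(run)
    return kept
```

The substring match is deliberately crude; a real check would probably parse the workflow YAML and inspect each job's `uses:` value.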

Since this issue only affects us for ~30 days after transitioning from not using pathogen-repo-build to using it, I'm not sure it's worth spending any more time on. If it's burdensome to have the bogus runs for those 30 days, a simpler filtering method would be to hard code "run after X" timestamps for workflows where we make this transition.
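As a rough sketch of the "run after X" idea, in Python rather than the SQL the status pages actually use: the repo and workflow names, the cutoff date, and the `keep_run` helper are all hypothetical placeholders, not values from this project.

```python
from datetime import datetime, timezone

# Hypothetical hard-coded cutoffs: (repo, workflow name) -> first moment at
# which the workflow is known to use pathogen-repo-build.
RUN_AFTER = {
    ("example-org/example-repo", "example-workflow"): datetime(
        2024, 1, 15, tzinfo=timezone.utc
    ),
}


def keep_run(repo, workflow, started_at, cutoffs=RUN_AFTER):
    """Return True if the run should be shown on the status page.

    Workflows without a cutoff entry are always kept; workflows with one
    are kept only for runs that started at or after the cutoff.
    """
    cutoff = cutoffs.get((repo, workflow))
    return cutoff is None or started_at >= cutoff
```

An entry only matters for about 30 days after a transition, so the table could be pruned once the older runs age out of the reporting window.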

genehack commented 3 days ago

> Since this issue only affects us for ~30 days after transitioning from not using pathogen-repo-build to using it, I'm not sure it's worth spending any more time on. If it's burdensome to have the bogus runs for those 30 days, a simpler filtering method would be to hard code "run after X" timestamps for workflows where we make this transition.

Based on @tsibley's comment quoted above, I'm going to close this issue out as "not planned".

If folks think that we should do this work, or want to use this issue as the basis of the "run after X" approach, please re-open.