nextstrain / zika

Nextstrain build for Zika virus
https://nextstrain.org/zika

Automate ingest and phylogenetic workflows #52

Closed joverlee521 closed 6 months ago

joverlee521 commented 6 months ago

Description of proposed changes

Adds a single GH Action workflow to automate the ingest and phylogenetic workflows, set to run daily at the same time as the automated mpox ingest.

Uses the GH Action cache to store a hash of the ingest results' Metadata.sha256sum values, which upload-to-s3 adds to the S3 object metadata. If the cache contains a match from a previous run of the GH Action workflow, the workflow skips the phylogenetic job.
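A minimal sketch of the skip decision described above, assuming the ingest step exposes one sha256sum each for the metadata and sequences files. The function names, the `ingest-results-` key prefix, and the in-memory set standing in for the GH Action cache are all hypothetical; the actual workflow may build its key differently.

```python
import hashlib


def ingest_cache_key(metadata_sha256: str, sequences_sha256: str) -> str:
    """Combine the per-file checksums into a single cache key.

    The "ingest-results-" prefix is an illustrative choice, not the
    key format used by the real workflow.
    """
    combined = f"{metadata_sha256}\n{sequences_sha256}\n".encode("utf-8")
    return "ingest-results-" + hashlib.sha256(combined).hexdigest()


def should_run_phylogenetic(cache_key: str, cached_keys: set[str]) -> bool:
    """Run the phylogenetic job only when the key has not been seen before."""
    return cache_key not in cached_keys


if __name__ == "__main__":
    # Stand-in checksums; in practice these would come from the S3 metadata.
    metadata_sum = hashlib.sha256(b"metadata.tsv contents").hexdigest()
    sequences_sum = hashlib.sha256(b"sequences.fasta contents").hexdigest()

    seen: set[str] = set()  # stands in for the GH Action cache
    key = ingest_cache_key(metadata_sum, sequences_sum)

    print(should_run_phylogenetic(key, seen))  # first run: no match in cache
    seen.add(key)
    print(should_run_phylogenetic(key, seen))  # unchanged data: match found
```

Hashing the checksums rather than the files themselves keeps the check cheap, since the checksums are already recorded at upload time.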

See commits for details.

Related issue(s)

Based on discussion in https://github.com/nextstrain/pathogen-repo-guide/issues/25

Checklist

joverlee521 commented 6 months ago

This currently does not support

I figured these can be added in the future. We will need to think through whether it makes sense to support these within this workflow (which will complicate the conditionals) or whether we should just have separate GH Action workflows for them.

joverlee521 commented 6 months ago

I merged after approval from the team in today's walk-through of the workflow.

I manually ran the workflow on the main branch. The first workflow run ran both ingest and phylogenetic even though the metadata/sequences files had not changed. This is because the GitHub Action cache is scoped by branch, so the cache entries from runs on the PR branch were not visible to the run on main.

The second workflow run was able to check the cache from the first run on the main branch and only ran the ingest job.
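The behavior of those two runs can be modeled with a simplified sketch of branch-scoped cache lookup. It assumes the documented GitHub rule that a run can restore caches created on its own branch or on the default branch; the class, the branch name `automate-workflows`, and the key are hypothetical, and pull request base-branch lookups are left out.

```python
from dataclasses import dataclass, field


@dataclass
class BranchScopedCache:
    """Toy model of a cache whose entries are scoped by branch."""

    default_branch: str = "main"
    entries: dict[str, set[str]] = field(default_factory=dict)

    def save(self, branch: str, key: str) -> None:
        """Record a cache key under the branch that created it."""
        self.entries.setdefault(branch, set()).add(key)

    def hit(self, branch: str, key: str) -> bool:
        """A run sees keys from its own branch and from the default branch."""
        visible = {branch, self.default_branch}
        return any(key in self.entries.get(b, set()) for b in visible)


if __name__ == "__main__":
    cache = BranchScopedCache()
    key = "ingest-results-abc123"  # hypothetical key

    # Runs on the PR branch stored the key there.
    cache.save("automate-workflows", key)

    # First run on main: no match, so both jobs run and the key is saved.
    print(cache.hit("main", key))
    cache.save("main", key)

    # Second run on main: match found, so only ingest runs.
    print(cache.hit("main", key))
```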

I will check in on tomorrow's scheduled run as well.

joverlee521 commented 6 months ago

Today's scheduled workflow ran as expected. There was no new data in ingest, so the phylogenetic job was skipped.

The zika repo is also now on the pathogen workflow status page. As expected, "Ingest to phylogenetic" shows up as a single workflow with a single completion status.

If we find that we want to split the status of ingest and phylogenetic, we have a couple of options:

  1. Update the status page to reflect job status within the workflow. I haven't found a table in steampipe-plugin-github that includes individual job status, but there is a GitHub API endpoint for individual jobs of a workflow run.
  2. Revert to the previous setup of individual workflows for ingest and phylogenetic, where ingest triggers the phylogenetic workflow after completion.