nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

trigger_rebuild_pipeline: Update inputs to match ncov builds #442

Closed by joverlee521 7 months ago

joverlee521 commented 7 months ago

The inputs required to trigger downstream ncov builds were originally set in Nov 2022 (80e53848c73a35de3b90be30bcd8144b29c92129).

In April 2023, the downstream ncov builds were updated to use different input files from S3:

  1. use aligned.fasta instead of sequences.fasta
  2. use zst instead of gz/xz files
  3. 21L inputs were updated to aligned.fasta and zst files

This commit updates the inputs for the trigger rule to ensure that the downstream ncov builds are only triggered after the appropriate files have been uploaded to S3.
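
As a rough illustration of the gating described above, here is a minimal Python sketch. It is not the actual Snakemake rule: the helper names `trigger_inputs` and `ready_to_trigger` are hypothetical, and the `data/<database>/<file>.upload` marker layout is assumed from the dry-run output in this PR. The 21L inputs are left out because their file names are not shown here.

```python
from pathlib import Path

# Files the downstream ncov builds are assumed to read from S3
# (aligned.fasta and zst compression, per the April 2023 changes).
REQUIRED_UPLOADS = ("metadata.tsv.zst", "aligned.fasta.zst")


def trigger_inputs(database: str) -> list[str]:
    """Return the upload marker files expected before a rebuild is triggered.

    Hypothetical helper: the marker naming mirrors the dry-run log,
    not a confirmed ncov-ingest API.
    """
    return [f"data/{database}/{name}.upload" for name in REQUIRED_UPLOADS]


def ready_to_trigger(database: str, root: Path = Path(".")) -> bool:
    """True only when every required upload marker exists under `root`."""
    return all((root / marker).exists() for marker in trigger_inputs(database))
```

In the pipeline itself this dependency is expressed through the trigger rule's inputs, so the workflow engine enforces the ordering rather than an explicit check like `ready_to_trigger`.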

[1] https://github.com/nextstrain/ncov/commit/1376d8238fe48dd47fe6dbbf5f9835dc3789baff
[2] https://github.com/nextstrain/ncov/commit/ad0d1e3ac630dcde308a89fc5a04cda669058466
[3] https://github.com/nextstrain/ncov/commit/2323afde0533f18a09629e759948ff2e67856eb0

joverlee521 commented 7 months ago

Merging without review since local dry runs successfully build the DAG with the updated inputs:

$ nextstrain build . --configfile config/gisaid.yaml --config trigger_rebuild=True -n
...
Job 79: Triggering nextstrain/ncov rebuild action (via repository dispatch)
Reason: Missing output files: data/genbank/trigger-rebuild.done; Input files updated by another job: data/genbank/metadata.tsv.zst.upload, data/genbank/aligned.fasta.zst.upload