nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Add Snakemake retries to trigger jobs #462

Closed joverlee521 closed 4 months ago

joverlee521 commented 4 months ago

Context

Prompted by the workflow errors from the automated runs on 2024-07-11 (GISAID, GenBank).

Both failed while triggering the downstream forecasts-ncov GH Action workflows:

[batch] [2024-07-12T02:30:27+00:00] rule trigger_counts_pipeline:
[batch] [2024-07-12T02:30:27+00:00]     input: data/genbank/upload.done
[batch] [2024-07-12T02:30:27+00:00]     output: data/genbank/trigger-counts.done
[batch] [2024-07-12T02:30:27+00:00]     jobid: 96
[batch] [2024-07-12T02:30:27+00:00]     benchmark: benchmarks/trigger_counts_pipeline_genbank.txt
[batch] [2024-07-12T02:30:27+00:00]     reason: Missing output files: data/genbank/trigger-counts.done; Input files updated by another job: data/genbank/upload.done
[batch] [2024-07-12T02:30:27+00:00]     resources: tmpdir=/tmp
[batch] [2024-07-12T02:30:27+00:00]         ./vendored/trigger nextstrain/forecasts-ncov genbank/clade-counts
[batch] [2024-07-12T02:30:27+00:00]         
[batch] [2024-07-12T02:30:27+00:00] curl: (22) The requested URL returned error: 404 
[batch] [2024-07-12T02:30:27+00:00] Request failed

The workflows were able to trigger the downstream ncov builds just fine, so I don't think there was anything wrong with the GitHub token permissions. The last update to the forecasts-ncov GH Action workflows was a month ago, so I suspect this was a transient error.

We should add Snakemake retries to the trigger rules so that the workflow can automatically retry in case of transient errors like this.
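
As a rough sketch of what this could look like, Snakemake (7.7.0 and later) supports a per-rule `retries` directive that reruns a failed job up to the given number of times. The rule below is reconstructed from the log output above, not copied from the repository's Snakefile: the `touch()` output and the retry count of 2 are assumptions for illustration, and the real rule most likely uses wildcards rather than hard-coded GenBank paths.

```snakemake
# Sketch only: rule body inferred from the failed job's log, not the actual Snakefile.
rule trigger_counts_pipeline:
    input:
        "data/genbank/upload.done",
    output:
        # Assumption: the .done file is just a marker created once the trigger succeeds.
        touch("data/genbank/trigger-counts.done"),
    # Rerun the job up to 2 more times if the shell command exits non-zero,
    # e.g. on a transient 404 from the GitHub API. The count is arbitrary.
    retries: 2
    shell:
        """
        ./vendored/trigger nextstrain/forecasts-ncov genbank/clade-counts
        """
```

Retries happen immediately, without any backoff, so a small count may not outlast a longer GitHub outage; it should still cover one-off failures like the 404 above.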