nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Fix upload diff race #424

Closed joverlee521 closed 8 months ago

joverlee521 commented 8 months ago

Description of proposed changes

Include the notification's touch file (data/{database}/notify.done) as an input to the upload_single rule to ensure that uploads only run after the notification rules have run.

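This prevents the race condition between the diffs and the upload of files, where a file could be uploaded to S3 while it was still being diffed against the copy on S3. Uploads will be slightly delayed, but this ordering is necessary to support diffing local files against files on S3.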
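To illustrate why adding the touch file as an input removes the race, here is a minimal sketch that models the job graph in plain Python rather than Snakemake. The rule names notify_gisaid and upload_single come from this PR; the upstream job name `transform_placeholder` and the helper functions are hypothetical and exist only for this example. It is not the workflow's actual code.

```python
from graphlib import TopologicalSorter


def build_dag(upload_waits_for_notify: bool) -> dict[str, set[str]]:
    """Map each job to the set of jobs it depends on."""
    dag = {
        # "transform_placeholder" is a made-up stand-in for whatever
        # produces the local files that get diffed and uploaded.
        "notify_gisaid": {"transform_placeholder"},
        "upload_single": {"transform_placeholder"},
    }
    if upload_waits_for_notify:
        # Models this PR: the notify.done touch file becomes an input
        # of upload_single, so upload depends on the notify job.
        dag["upload_single"] = dag["upload_single"] | {"notify_gisaid"}
    return dag


def ancestors(dag: dict[str, set[str]], job: str) -> set[str]:
    """Return every job that must finish before `job` can start."""
    seen: set[str] = set()
    stack = list(dag.get(job, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(dag.get(dep, ()))
    return seen


old_dag = build_dag(upload_waits_for_notify=False)
new_dag = build_dag(upload_waits_for_notify=True)

# Old graph: nothing orders upload after notify, so they may run concurrently.
print("notify_gisaid" in ancestors(old_dag, "upload_single"))

# New graph: notify is an ancestor of upload, so upload has to wait.
print("notify_gisaid" in ancestors(new_dag, "upload_single"))

# Any valid execution order of the new graph puts notify before upload.
print(list(TopologicalSorter(new_dag).static_order()))
```

In the old graph the two jobs share only an upstream dependency, so a scheduler is free to run them in either order or at the same time. Once the notify job becomes an ancestor of the upload job, every valid schedule runs the diff first.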

Resolves #423

joverlee521 commented 8 months ago

Comparing the current DAG and the new DAG with changes from this PR: now notify_gisaid must run before any of the upload_single rules.

joverlee521 commented 8 months ago

I plan to merge on Monday, 2024-01-29, when there's more likely to be new data and I can monitor the ingest runs.