nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.
MIT License

Ignore Nextclade cache on new Nextclade version or dataset version #457

Closed: joverlee521 closed 3 months ago

joverlee521 commented 4 months ago

We've had a history of caching issues (https://github.com/nextstrain/ncov-ingest/issues/456, https://github.com/nextstrain/ncov-ingest/issues/392) that come up whenever there's a new release of Nextclade or of the SARS-CoV-2 Nextclade dataset and we forget to manually upload the *.renew files that trigger the full re-run.

The workflow should automatically ignore the Nextclade cache if it encounters a new Nextclade version or dataset version.

joverlee521 commented 4 months ago

This was also prompted by discussion in the blab/forecasting project about surfacing the Nextclade/dataset version in the sequence counts generated by forecasts-ncov/ingest.

corneliusroemer commented 4 months ago

Good idea, yes.

It shouldn't be hard to set the renew files programmatically by checking whether the version columns in the existing files match the current software and dataset versions!
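
A rough sketch of what that check could look like, assuming the cached Nextclade output is a TSV with one column for the Nextclade version and one for the dataset version. The column names `nextclade_version` and `dataset_version`, the function name `cache_is_stale`, and the version strings in the usage lines are all placeholders for illustration, not names taken from the actual workflow:

```python
import csv
import io

# Hypothetical column names; the real cache file may name these differently.
VERSION_COLUMNS = ("nextclade_version", "dataset_version")


def cache_is_stale(cache_tsv, current_versions):
    """Return True if the cache should be ignored and a full re-run triggered.

    cache_tsv: an open text file (or file-like object) holding the cached TSV.
    current_versions: dict mapping each name in VERSION_COLUMNS to the
        version string of the currently installed software / dataset.
    """
    reader = csv.DictReader(cache_tsv, delimiter="\t")

    # An empty cache, or one without version columns, cannot be verified,
    # so treat it as stale.
    if reader.fieldnames is None:
        return True
    if any(column not in reader.fieldnames for column in VERSION_COLUMNS):
        return True

    # Any row produced by a different version invalidates the whole cache.
    for row in reader:
        for column in VERSION_COLUMNS:
            if row[column] != current_versions[column]:
                return True
    return False


# Usage with made-up version strings:
current = {"nextclade_version": "3.0.0", "dataset_version": "2024-01-01"}
cache = io.StringIO(
    "seqName\tnextclade_version\tdataset_version\n"
    "sample_1\t3.0.0\t2024-01-01\n"
)
stale = cache_is_stale(cache, current)
```

If `stale` comes back true, the workflow would skip the cache (or create the *.renew files itself) instead of relying on someone to upload them by hand. Treating a cache with missing version columns as stale errs on the side of an unnecessary full re-run rather than silently reusing outdated results.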