Closed joverlee521 closed 9 months ago
Tested locally by running the debug config:
nextstrain build \
--envdir ~/Repos/env.d/aws/ \
--image nextstrain/ncov-ingest \
. \
--configfile config/debug_sample_genbank.yaml \
--config s3_dst=s3://nextstrain-data/files/ncov/open/branch/update-gene-list
All translation_*.fasta.zst files have been uploaded to s3://nextstrain-data/files/ncov/open/branch/update-gene-list
Noted in previous PRs that the GENES and GENES_SPACE_DELIMITED variables are not needed¹ or used in the workflow,² so refactor the GENE_LIST to be a hardcoded list of genes.

If we want to ensure that we do not miss any genes from the Nextclade dataset, we could parse out the gene names from the dataset's genome_annotation.gff file. However, I think that will over-complicate the Snakemake workflow, so I'm leaving the hardcoded list.
¹ https://github.com/nextstrain/ncov-ingest/pull/372#discussion_r1046020969
² https://github.com/nextstrain/ncov-ingest/pull/435#discussion_r1496332575
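For reference, the GFF-parsing alternative mentioned above could look roughly like the sketch below. It is not part of this PR. The function name parse_gene_names, the "gene" feature type, and the attribute keys it checks (gene_name, gene, Name) are assumptions about how the dataset's genome_annotation.gff is laid out, so they would need to be checked against the actual file.

```python
from urllib.parse import unquote


def parse_gene_names(gff_lines, feature_type="gene",
                     attribute_keys=("gene_name", "gene", "Name")):
    """Return unique gene names, in file order, from GFF lines.

    Hypothetical helper: the feature type and attribute keys are
    assumptions, not confirmed against the Nextclade dataset.
    """
    names = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # GFF records have exactly nine tab-separated columns.
        if len(columns) != 9 or columns[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = {}
        for pair in columns[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key.strip()] = unquote(value.strip())
        # Use the first attribute key that is present.
        for key in attribute_keys:
            if key in attributes:
                if attributes[key] not in names:
                    names.append(attributes[key])
                break
    return names


# Illustrative records only; not copied from the real annotation file.
example = [
    "##gff-version 3",
    "MN908947\tfeature\tgene\t26245\t26472\t.\t+\t.\tgene_name=E",
    "MN908947\tfeature\tgene\t21563\t25384\t.\t+\t.\tgene_name=S",
]
print(parse_gene_names(example))
```

In a Snakefile this would be called with an open file handle for the downloaded annotation, which means the gene list would depend on the dataset being present before the DAG is built. That extra dependency is the complication the hardcoded list avoids.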
Checklist