Remove unused variables and refactor `GENE_LIST`

nextstrain / ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and Genbank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.

MIT License

Previous PRs noted that the GENES and GENES_SPACE_DELIMITED variables are not needed¹ or used in the workflow,² so this PR refactors GENE_LIST into a hardcoded list of genes.

If we want to ensure that we do not miss any genes from the Nextclade dataset, we could parse the gene names out of the dataset's genome_annotation.gff file. However, I think that would over-complicate the Snakemake workflow, so I'm leaving the hardcoded list.
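For reference, the parsing alternative mentioned above could look roughly like the sketch below. It is not part of this PR. The function name `parse_gene_names` is hypothetical, and the sketch assumes that the annotation is a standard tab-delimited GFF3 file, that genes are rows whose feature type is `gene`, and that the gene name is stored in a `Name`, `gene_name`, or `gene` attribute; the actual attribute key in the Nextclade dataset would need to be checked.

```python
from urllib.parse import unquote


def parse_gene_names(gff_lines, name_keys=("Name", "gene_name", "gene")):
    """Return gene names from GFF3 lines, in file order, without duplicates.

    `name_keys` lists the attribute keys to try, in order of preference.
    These keys are an assumption, not confirmed against the Nextclade dataset.
    """
    names = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...), and comments (#...).
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # A GFF3 feature row has exactly 9 columns; column 3 is the feature type.
        if len(columns) != 9 or columns[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = {}
        for pair in columns[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # GFF3 attribute values may be percent-encoded.
                attributes[key.strip()] = unquote(value.strip())
        # Use the first preferred key that is present on this row.
        for key in name_keys:
            if key in attributes:
                if attributes[key] not in names:
                    names.append(attributes[key])
                break
    return names


# Hypothetical usage in the Snakefile (the path is an assumption):
# with open("data/genome_annotation.gff") as gff_file:
#     GENE_LIST = parse_gene_names(gff_file)
```

Even at this size, the workflow would have to download the dataset before `GENE_LIST` could be defined, which is the extra complexity the hardcoded list avoids.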

¹ https://github.com/nextstrain/ncov-ingest/pull/372#discussion_r1046020969
² https://github.com/nextstrain/ncov-ingest/pull/435#discussion_r1496332575

Checklist

[ ] Checks pass

nextstrain build \
    --envdir ~/Repos/env.d/aws/ \
    --image nextstrain/ncov-ingest \
    . \
    --configfile config/debug_sample_genbank.yaml \
    --config s3_dst=s3://nextstrain-data/files/ncov/open/branch/update-gene-list

Remove unused variables and refactor `GENE_LIST` #437