nextstrain / zika

Nextstrain build for Zika virus
https://nextstrain.org/zika

fix: index by genbank instead of duplicate strain #25

Closed j23414 closed 1 year ago

j23414 commented 1 year ago

Description of proposed changes

Some strains (or isolates) were associated with multiple GenBank entries (possibly due to resequencing). This produced duplicate strain names in the zika dataset, which caused the original zika build to error out.

[Image: Screen Shot 2023-02-17 at 8.55.46 AM]

This fixes the Snakefile by indexing records on GenBank ID instead of strain name and swapping in the strain name at the end, following the same protocol used in the Monkeypox build (wrangle_metadata and final_strain_name rules).
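To illustrate the idea, here is a minimal Python sketch, not the actual Snakefile rules. The record fields (`accession`, `strain`, `seq`), the accession values, and the helper names `index_by` and `final_display_names` are all hypothetical. It shows why keying on strain name fails when two GenBank entries share a strain, and how keying on the accession and attaching the strain name only at the end avoids the collision.

```python
# Hypothetical records: two GenBank entries share one strain name.
records = [
    {"accession": "AB000001", "strain": "strainA", "seq": "ACGT"},
    {"accession": "AB000002", "strain": "strainA", "seq": "ACGA"},
    {"accession": "AB000003", "strain": "strainB", "seq": "ACGG"},
]


def index_by(records, key):
    """Build a dict keyed on `key`, raising if a key value repeats."""
    index = {}
    for rec in records:
        if rec[key] in index:
            raise ValueError(f"duplicate {key}: {rec[key]}")
        index[rec[key]] = rec
    return index


def final_display_names(index):
    """Map each unique accession to its strain name for the final output."""
    return {acc: rec["strain"] for acc, rec in index.items()}


# Indexing on accession succeeds because accessions are unique here.
by_accession = index_by(records, "accession")

# The strain name is swapped in only as a label at the last step,
# so duplicate strain names no longer collide as index keys.
display_names = final_display_names(by_accession)
```

In this sketch, calling `index_by(records, "strain")` raises a `ValueError` on the second `strainA` record, which mirrors the kind of duplicate-name failure described above.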

Other changes include converting the list of dropped strains from strain names to GenBank IDs and updating the FASTA headers of the example sequences.

Related issue(s)

Testing