trvrb closed 8 years ago
I see. So could use something similar to determine_lineage to assign lineage based on similarity to the outgroup sequences?
Exactly. Can move the outgroup GenBank files to source-data/. I think this only needs to be done when the subtype / lineage isn't already specified in the GISAID FASTA.
This looks to have been resolved in 639c40607131cde81d01e4566f57e33a31e069f2.
A large fraction of the GISAID submissions don't include full subtype information. This is especially common for B/Vic and B/Yam. Because of this, asking for A/H3N2 in GISAID won't actually get all the H3N2 sequences. Take a look at what we (Richard) did in the nextflu build to account for this: https://github.com/blab/nextflu/blob/master/augur/src/make_all.py
This uses BioPython plus the outgroups for H3N2, H1N1pdm, Vic and Yam to make alignments and categorize sequences with ambiguous subtypes. @chacalle do you think you could borrow this code/logic for Flu_vdb_upload.py? With this in place, I could switch to using vdb rather than direct GISAID downloads for my nextflu builds.
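
For reference, a rough sketch of the idea discussed in this thread: keep the declared subtype / lineage when it is already complete, and otherwise pick the outgroup the sequence is most similar to. This is not the code from make_all.py. The function names, the lineage labels, the `min_score` cutoff and the toy outgroup sequences are all made up for illustration, and `difflib.SequenceMatcher` from the standard library stands in for a real alignment against the outgroup GenBank records.

```python
from difflib import SequenceMatcher

# Toy outgroup sequences, invented for illustration only. A real build would
# load the outgroup GenBank files (e.g. from source-data/) instead.
OUTGROUPS = {
    "H3N2": "ATGAAGACTATCATTGCTTTGAGCTACATTCTATGTCTGG",
    "H1N1pdm": "ATGAAGGCAATACTAGTAGTTCTGCTATATACATTTGCAA",
    "Vic": "ATGAAGGCAATAATTGTACTACTCATGGTAGTAACATCCA",
    "Yam": "ATGAAGGCAATAATTGTACTACTTATGGTAGTGACATCTA",
}


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for an alignment score."""
    return SequenceMatcher(None, a.upper(), b.upper(), autojunk=False).ratio()


def assign_lineage(sequence, declared=None, outgroups=OUTGROUPS, min_score=0.8):
    """Return a lineage label, or None if nothing matches well enough.

    `declared` is whatever the submission already says. It is trusted only
    when it names one of the known outgroups; an incomplete value such as
    "B" falls through to the similarity check.
    """
    if declared in outgroups:
        return declared
    scores = {name: similarity(sequence, ref) for name, ref in outgroups.items()}
    best = max(scores, key=scores.get)
    # min_score is an arbitrary cutoff for this sketch, not a tuned value.
    return best if scores[best] >= min_score else None
```

With something like this, a record labelled only as "B" would be sorted into Vic or Yam by similarity, while a record that already says H3N2 would be left alone.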