nextstrain / seasonal-flu

Scripts, config, and snakefiles for seasonal-flu nextstrain builds

Prioritize rather than filter to "complete" genomes #24

Closed: trvrb closed this issue 5 years ago

trvrb commented 5 years ago

Currently, besides reference genomes, select_strains.py only passes through viruses that have both HA and NA segments (because we call it with --segments ha na in the snakefile). For time pivots about 3-6 months back this is fine, and there are usually enough strains with both HA and NA to fill the sampling bins. However, some strains only get HA sequenced. This was especially obvious just now: there are a number of H3s from January with only HA. These tend to be uploaded by groups that are not CCs.

I propose modifying select_strains.py so that having a "complete" genome (i.e. possessing all entries in --segments) becomes another factor in priority rather than a hard constraint.
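A minimal Python sketch of the proposal, assuming each strain is represented by the set of segments it has sequences for. The names `completeness_priority` and `select_from_bin` and the `base_priority` map are hypothetical, not taken from select_strains.py, and treating completeness as the leading sort key (ahead of any existing priority score) is only one possible weighting.

```python
from typing import Dict, List, Set


def completeness_priority(available: Set[str], requested: List[str]) -> float:
    """Fraction of the requested segments present for one strain (1.0 = complete)."""
    return len(available & set(requested)) / len(requested)


def select_from_bin(
    strains: Dict[str, Set[str]],
    requested: List[str],
    base_priority: Dict[str, float],
    n: int,
) -> List[str]:
    """Pick up to n strains from one sampling bin without discarding incomplete ones.

    Hypothetical ordering: completeness first, then any existing priority score,
    then strain name so the result is deterministic.
    """
    ranked = sorted(
        strains,
        key=lambda s: (
            -completeness_priority(strains[s], requested),
            -base_priority.get(s, 0.0),
            s,
        ),
    )
    return ranked[:n]


# Toy bin: two strains with HA and NA, one with HA only.
strains = {
    "complete_1": {"ha", "na"},
    "ha_only_1": {"ha"},
    "complete_2": {"ha", "na"},
}
print(select_from_bin(strains, ["ha", "na"], {}, 2))  # ['complete_1', 'complete_2']
print(select_from_bin(strains, ["ha", "na"], {}, 3))  # ['complete_1', 'complete_2', 'ha_only_1']
```

In this sketch the HA-only strain is still picked once the complete strains in a bin run out, instead of being dropped outright.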

rneher commented 5 years ago

I would add an --all-segments flag to force hard filtering; otherwise, prioritize.
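One way the suggested flag could look with Python's standard argparse, sketched under the assumption that strains are again given as a mapping from strain name to available segments. `--segments` mirrors the option already used in the snakefile and `--all-segments` is the flag suggested here; `parse_args`, `candidate_strains`, and the rule of keeping any strain with at least one requested segment are hypothetical.

```python
import argparse
from typing import Dict, List, Optional, Set


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of segment handling for strain selection")
    parser.add_argument("--segments", nargs="+", default=["ha", "na"],
                        help="segments to consider, e.g. ha na")
    parser.add_argument("--all-segments", action="store_true",
                        help="keep only strains that have every segment in --segments")
    return parser.parse_args(argv)


def candidate_strains(strains: Dict[str, Set[str]], segments: List[str],
                      all_segments: bool) -> Dict[str, Set[str]]:
    required = set(segments)
    if all_segments:
        # Hard filter (current behavior): drop strains missing any requested segment.
        return {name: segs for name, segs in strains.items() if required <= segs}
    # Default (hypothetical rule): keep any strain with at least one requested
    # segment and leave completeness to the prioritization step.
    return {name: segs for name, segs in strains.items() if required & segs}


strains = {"complete_1": {"ha", "na"}, "ha_only_1": {"ha"}}
args = parse_args(["--segments", "ha", "na", "--all-segments"])
print(sorted(candidate_strains(strains, args.segments, args.all_segments)))  # ['complete_1']
args = parse_args(["--segments", "ha", "na"])
print(sorted(candidate_strains(strains, args.segments, args.all_segments)))  # ['complete_1', 'ha_only_1']
```

argparse exposes `--all-segments` as `args.all_segments`, so existing invocations without the flag would switch to prioritizing, while builds that need strictly complete genomes can opt back into the hard filter.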