Currently, besides reference genomes, select_strains.py only passes through viruses that possess both HA and NA segments (due to our use of --segments ha na in the snakefile). For time pivots going back about 3-6 months this is okay, since there are usually enough strains with both HA and NA to fill the sampling bins. However, some strains only have HA sequenced. This was especially obvious just now: there are a number of H3s from January with just HA. These tend to be uploaded by groups that are not CCs.
I propose modifying select_strains.py so that having a "complete" genome (that is, possessing all entries in --segments) becomes another factor in priority rather than a hard constraint.
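To make the idea concrete, here is a rough sketch of what "completeness as a priority factor" could look like. This is not the current select_strains.py code: the function names, the dict layout for a strain, and the completeness_weight parameter are all hypothetical and only illustrate the ranking logic.

```python
def completeness(strain_segments, required_segments):
    """Fraction of the required segments that a strain actually has (0.0 to 1.0)."""
    required = set(required_segments)
    return len(required & set(strain_segments)) / len(required)


def select_for_bin(strains, required_segments, per_bin, completeness_weight=1.0):
    """Pick up to per_bin strains from one sampling bin.

    Hypothetical helper: strains missing a segment are no longer dropped.
    They are ranked lower, by adding a completeness bonus to whatever base
    priority the strain already carries.
    """
    def score(strain):
        bonus = completeness_weight * completeness(strain["segments"], required_segments)
        return strain["priority"] + bonus

    # Highest combined score first, then truncate to the bin size.
    return sorted(strains, key=score, reverse=True)[:per_bin]


# Made-up example data: one complete strain and two HA-only strains.
strains = [
    {"name": "A", "segments": ["ha", "na"], "priority": 0.2},
    {"name": "B", "segments": ["ha"], "priority": 0.9},
    {"name": "C", "segments": ["ha"], "priority": 0.1},
]
picked = select_for_bin(strains, ["ha", "na"], per_bin=2)
ha_only = select_for_bin(strains[1:], ["ha", "na"], per_bin=2)
```

With a weight like this, a bin that contains only HA-only strains (as with the January H3s) still gets filled, while complete strains are still preferred when they are available. How strongly to weight completeness against the other priority factors would need tuning.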