Closed (chacalle closed 8 years ago)
There's not a good way to download only the new viruses from GISAID. I usually download 20,000 viruses at a time (roughly the last two years), which is the maximum GISAID allows. So, yes, it would be better to first check whether a strain already exists in the database before trying to determine its lineage.
This has been resolved by vdb/flu_update.py -db vdb -v flu --update_groupings.
When incorporating new sequences from GISAID into nextflu, are only relatively new sequences downloaded or is everything in GISAID downloaded?
vdb_parse currently parses the fasta before trying to upload each sequence and checking if the virus is already in vdb. If all sequences from GISAID are going to be in the fasta each time, it will take a while to determine the lineage for all sequences. In this case, vdb_parse should immediately check for the virus in vdb after getting the strain name. If only relatively new GISAID sequences are in the fasta, then this isn't a problem.
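To illustrate the suggestion, here is a minimal sketch of checking for the strain before doing the lineage work. It is not the actual vdb_parse code: read_fasta, new_viruses, existing_strains and determine_lineage are hypothetical names, the set of strain names stands in for a lookup against vdb, and the strain name is assumed to be the first "|"-separated field of the fasta header.

```python
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from fasta-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def new_viruses(
    lines: Iterable[str],
    existing_strains: Set[str],                # hypothetical stand-in for a vdb lookup
    determine_lineage: Callable[[str], str],   # hypothetical, assumed to be the slow step
) -> Iterator[Dict[str, str]]:
    """Yield records only for strains that are not already known."""
    for header, sequence in read_fasta(lines):
        # Assumption: the strain name is the first "|"-separated header field.
        strain = header.split("|")[0].strip()
        if strain in existing_strains:
            continue  # already in the database, so skip the lineage step
        yield {
            "strain": strain,
            "sequence": sequence,
            "lineage": determine_lineage(sequence),
        }
```

With a 20,000-virus download that mostly overlaps the previous one, the lineage step would then run only for the strains that are actually new.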