Closed chacalle closed 8 years ago
Very cool idea. Thoughts on making this "safe" so that unwanted sequences don't end up in the final database...
Could upload to the `test` database rather than `vdb`. Or better yet, we could think about making a `vdb_staging` database that would mirror tables within the `vdb` database, and there could be a script to copy `vdb_staging` into `vdb`.
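A minimal sketch of what such a copy step might look like. The thread does not specify the database backend, so plain dictionaries keyed by accession stand in for the staging and main tables here; the `promote` function and the record fields are hypothetical and only illustrate the idea that nothing reaches the main table without explicit approval.

```python
def promote(staging, main, approved):
    """Copy approved records from a staging table into the main table.

    `staging` and `main` are dicts keyed by accession (stand-ins for real
    database tables, which are an assumption of this sketch). Returns the
    list of accessions that were actually copied.
    """
    copied = []
    for accession in approved:
        record = staging.get(accession)
        if record is None:
            continue  # approved accession is not in staging; nothing to copy
        if accession in main:
            continue  # never overwrite a record already in the main table
        main[accession] = dict(record)  # copy so later staging edits don't leak
        copied.append(accession)
    return copied


# Hypothetical example records, keyed by accession.
staging = {
    "KX051563": {"virus": "zika", "length": 10800},
    "KX056898": {"virus": "zika", "length": 10700},
}
main = {}
promoted = promote(staging, main, approved=["KX051563"])
```

Only the approved record ends up in `main`; the unapproved one stays in `staging` until someone signs off on it.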
This is a great direction. I'm going to shelve for the moment. Now that the data is basically working, we could start to think about smarter upload scripts. This should integrate well with the overall nextstrain pipeline.
If ViPR had an API, we could automate the search and download that we're doing for Zika, and for other viruses in the future.
Yes. This would be awesome.
Create a function to regularly search Entrez for new sequences to upload. I think this would be useful if multiple nextstrain websites are being maintained, since it could automate retrieving new sequences from GenBank.
Can query with Entrez like `"Zika virus"[porgn] AND ("2015/01/01"[MDAT] : "2016-04-14"[MDAT]) AND ("10000"[SLEN] : "100000000"[SLEN])`. Possibly also only include sequences that include `complete genome` in their description. Using Entrez seems to lag slightly behind manually searching GenBank (missing new sequences KX051563, KX056898 at the moment).

Will want some sort of staging area that shows important sequence information where someone could approve sequences for uploading. Possibly email new sequence information and accession numbers to the user for approval?
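As a rough sketch of the search step, the query above could be assembled programmatically and turned into an NCBI E-utilities `esearch` URL. This uses only the Python standard library and makes no network request. The helper names (`build_term`, `build_search_url`, `new_identifiers`, `is_complete_genome`) are hypothetical, and the choice of the `nuccore` database and a `retmax` of 500 are assumptions, not something decided in this thread.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(organism, start_date, end_date, min_len=10000, max_len=100000000):
    """Build an Entrez term filtering by organism, modification date and length."""
    return (
        f'"{organism}"[porgn] '
        f'AND ("{start_date}"[MDAT] : "{end_date}"[MDAT]) '
        f'AND ("{min_len}"[SLEN] : "{max_len}"[SLEN])'
    )


def build_search_url(term, retmax=500):
    """Return an esearch URL for the term (db and retmax are assumed values)."""
    params = {"db": "nuccore", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def new_identifiers(found, already_uploaded):
    """Keep identifiers from `found` that are not already uploaded, in order."""
    seen = set(already_uploaded)
    return [ident for ident in found if ident not in seen]


def is_complete_genome(description):
    """Case-insensitive check for 'complete genome' in a record description."""
    return "complete genome" in description.lower()


term = build_term("Zika virus", "2015/01/01", "2016/04/14")
url = build_search_url(term)
```

A scheduled job could fetch `url`, pass the returned identifiers through `new_identifiers` against what is already in the database, and hand whatever remains to the approval step rather than uploading it directly.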