Scoring with high biological accuracy and plausibility
Virologist deep dives on "-like" viruses (e.g., influenza-like viruses in reptiles and fish)
Scoring at higher taxonomic levels (e.g., genus Betacoronavirus)
Higher host and virus taxonomy for all NCBI data
Remove all bacteriophages from the dataset
Update CLOVER to include non-mammalian viruses
Update CLOVER and GenBank to reliably extract the year from nucleotide entries
Update SRA to extract the year
Separate collection and publication timestamps
Decide on a better-harmonized taxonomy of detection criteria for GenBank and SRA
Automate download of GenBank and SRA
Engage with EID2 as a dynamic problem
Automate actions across scripts
Engage with DBatVir/DRodVir, ViPR, ViralZone, and Virus-Host DB as data sources and develop Reconciliation Plan 2.0
Implement versioning and DOIs
Develop use case examples
Make a kanban board for the major tasks: SRA architecture, automation, novel data streams, metadata enrichment, data cleaning and validation, text mining (Max)
Let's get to work!
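
As a starting point for the year-extraction and timestamp items above, here is a minimal Python sketch, not the project's actual code. It assumes dates arrive as free-text strings in common shapes such as "21-Mar-2014", "Mar-2014", "2014", "2014-03-21", or a range like "2005/2007". The function names, the output keys, and the 1800 to 2099 window are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, Optional

# Assumption: a plausible year is four digits between 1800 and 2099
# and is not embedded in a longer run of digits.
_YEAR = re.compile(r"(?<!\d)(1[89]\d{2}|20\d{2})(?!\d)")


def extract_year(date_text: Optional[str]) -> Optional[int]:
    """Return the first plausible four-digit year in a date string, or None."""
    if not date_text:
        return None
    match = _YEAR.search(date_text)
    return int(match.group(1)) if match else None


def split_years(collection_date: Optional[str],
                release_date: Optional[str]) -> Dict[str, Optional[int]]:
    """Keep collection and publication/release years in separate fields.

    The key names are hypothetical, not an existing schema.
    """
    return {
        "collection_year": extract_year(collection_date),
        "release_year": extract_year(release_date),
    }
```

For a range such as "2005/2007" this sketch returns the first year only; whether to keep the start, the end, or both is a decision still to be made.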