jessieren / VirFinder

VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data

Original training data #2

Closed by ATCGCGCTC 6 years ago

ATCGCGCTC commented 7 years ago

Hey Jie and Nathan, maybe I missed it entirely, in which case I am very sorry to bother you, but could you make publicly available the exact data subset from "RefSeq virus and prokaryotic genomes sequenced from before and after 1 January 2014" that was used to train and test the model? I'd love to try to match your results! Thanks in advance.

jessieren commented 7 years ago

Hi there,

Thank you very much for the interest in VirFinder.

The RefSeqs used for training and testing are listed in additional_file_Table2.xlsx under the "supplementary_data" directory. The first column contains the accession numbers, which we used to download the corresponding genomes from NCBI. The discovery dates of the RefSeqs are in the third column. The RefSeqs were split into non-overlapping fragments, which were then used for training and testing. Hope that helps!
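
For anyone trying to reproduce the split, here is a minimal Python sketch of the two steps described above: partitioning accessions by discovery date and cutting genomes into non-overlapping fragments. It is illustrative only and is not the script used for the paper. The function names, the placeholder accessions, the choice to put sequences dated exactly 1 January 2014 in the "after" group, and the choice to drop a trailing piece shorter than the fragment length are all assumptions; reading the spreadsheet and downloading from NCBI are left out.

```python
from datetime import date

# Cutoff quoted in the issue; how the cutoff day itself is handled is an assumption.
CUTOFF = date(2014, 1, 1)


def split_by_date(records, cutoff=CUTOFF):
    """Partition (accession, discovery_date) pairs into before/after lists.

    Records dated exactly on the cutoff go to the "after" list (assumption).
    """
    before = [acc for acc, found in records if found < cutoff]
    after = [acc for acc, found in records if found >= cutoff]
    return before, after


def fragment(sequence, length):
    """Cut a sequence into consecutive, non-overlapping fragments of `length`.

    A trailing piece shorter than `length` is dropped (assumption).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, length)]


if __name__ == "__main__":
    # Placeholder accessions and dates, not real RefSeq entries.
    records = [
        ("ACC_A", date(2013, 12, 31)),
        ("ACC_B", date(2014, 1, 1)),
        ("ACC_C", date(2015, 6, 30)),
    ]
    before, after = split_by_date(records)
    print("before cutoff:", before)
    print("on or after cutoff:", after)
    print(fragment("ATCGCGCTCA", 3))
```

In practice the `records` list would be built from the first and third columns of the spreadsheet, and `fragment` would be applied to each downloaded genome at whatever fragment lengths you want to evaluate.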

Best wishes, Jessie