simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Different number of viral contigs #72

Open · zhanwen-cheng opened this issue 4 years ago

zhanwen-cheng commented 4 years ago

Hi Simroux,

Firstly, I really appreciate the pipeline you have contributed to viral research. I have two questions.

First, I downloaded 3745 viral RefSeq contigs from NCBI and ran them through VirSorter, but only 844 contigs (categories 1, 2, 4 and 5) were predicted as viral. It seems some parameters in my command may be set incorrectly. Could you double-check it? "nohup wrapper_phage_contigs_sorter_iPlant.pl -f ../raw_data/viral.3.1.genomic_new.fna --db 1 --wdir ./ --ncpu 20 --data-dir ~/data/data_base/virsorter-data &"

Second, I added 3000 of my own FASTA contigs to the same 3745 RefSeq contigs and ran VirSorter again with the same command. This time, a total of 970 contigs (912 from RefSeq and 58 from my own contigs) were predicted in categories 1, 2, 4 and 5. I am a little confused: why did the number of RefSeq contigs predicted as viral change between the two runs (844 vs. 912)?

Thank you again!

simroux commented 4 years ago

Hi,

When processing a dataset composed mostly (or, in this case, exclusively) of viruses, you should use VirSorter's "Virome decontamination" mode, i.e. the option "--virome". The results should then make more sense, and you should get similar results whether or not you add your own contigs to viral RefSeq.

For more information about why, please see the VirSorter paper: https://peerj.com/articles/985/.

Best, Simon
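For reference, below is the command from the question with the option Simon mentions appended. Apart from "--virome", the paths and settings are exactly those from the original post. VirSorter's command-line help describes this flag as forcing generic metrics instead of metrics calculated from the whole input dataset. If that applies here, it would also explain why adding your own contigs shifted the RefSeq count. Treat this as a sketch that assumes a VirSorter 1.x install whose wrapper script accepts "--virome".

```shell
# Command from the question, with --virome added to enable
# "Virome decontamination" mode; all other arguments are unchanged.
# nohup + & keeps the run going in the background after logout.
nohup wrapper_phage_contigs_sorter_iPlant.pl \
  -f ../raw_data/viral.3.1.genomic_new.fna \
  --db 1 \
  --wdir ./ \
  --ncpu 20 \
  --data-dir ~/data/data_base/virsorter-data \
  --virome &
```

To check the effect, you could rerun both inputs (RefSeq alone, and RefSeq plus your own contigs) with this flag and compare how many RefSeq contigs land in categories 1, 2, 4 and 5 in each run.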