simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

VirSorter predicts prophage that is longer than the actual contig size #68

hoelzer closed this issue 4 years ago

hoelzer commented 4 years ago

Hi! I just observed a possible issue where the stop coordinate of a predicted prophage sequence is larger than the actual contig length:

Test assembly:

kleiner_2015.fasta.gz

Command used:

wrapper_phage_contigs_sorter_iPlant.pl -f ${fasta} --db 2 --wdir virsorter --ncpu ${task.cpus} --data-dir ${database} --virome

Observation

It seems that VirSorter predicts a prophage in a range that is actually larger than the contig size. Example:

>NODE_51_length_63443_cov_50.479870

So contig NODE_51 has a length of 63443 nt.
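The length quoted above comes straight from the contig header. A minimal Python sketch of reading it, assuming the SPAdes-style `NODE_<id>_length_<nt>_cov_<coverage>` naming seen here (the assembler is not named in this issue, and `contig_length_from_header` is a hypothetical helper, not part of VirSorter):

```python
import re

# Assumed SPAdes-style header: NODE_<id>_length_<nt>_cov_<coverage>
SPADES_HEADER = re.compile(r">?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def contig_length_from_header(header: str) -> int:
    """Return the length (nt) encoded in a SPAdes-style contig header."""
    match = SPADES_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(match.group(2))


if __name__ == "__main__":
    print(contig_length_from_header(">NODE_51_length_63443_cov_50.479870"))  # 63443
```

The header value is only a label; counting the residues of the sequence itself is the safer check if contigs have been renamed or trimmed.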

Now VirSorter predicts a prophage on this contig spanning positions 19922 to 63493:

(base) [mhoelzer@hh-yoda-11-01 ~]$ grep NODE_51 virsorter/Predicted_viral_sequences/VIRSorter_prophages_cat-4.fasta 
>VIRSorter_NODE_51_gene_20_gene_72-19922-63493-cat_4

So the predicted prophage's stop position is larger than the actual contig length, if I understand the output correctly?
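To scan a whole output file for the same symptom, the coordinates can be pulled out of each prophage header and compared with the contig length. The sketch below infers the header layout `VIRSorter_<contig>_gene_<first>_gene_<last>-<start>-<stop>-cat_<n>` from the single example above, so treat the pattern as an assumption; `parse_prophage_header` and `overshoot` are hypothetical helpers.

```python
import re

# Layout inferred from one example header in this issue (an assumption):
# VIRSorter_<contig>_gene_<first>_gene_<last>-<start>-<stop>-cat_<category>
PROPHAGE_HEADER = re.compile(
    r">?VIRSorter_(?P<contig>.+)_gene_(?P<first_gene>\d+)_gene_(?P<last_gene>\d+)"
    r"-(?P<start>\d+)-(?P<stop>\d+)-cat_(?P<category>\d+)$"
)


def parse_prophage_header(header: str) -> dict:
    """Split a VirSorter prophage header into the contig name and integer fields."""
    match = PROPHAGE_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised prophage header: {header!r}")
    fields = match.groupdict()
    # Every field except the contig name is numeric.
    return {key: (value if key == "contig" else int(value))
            for key, value in fields.items()}


def overshoot(header: str, contig_length: int) -> int:
    """Nucleotides by which the predicted stop runs past the contig end (0 if none)."""
    stop = parse_prophage_header(header)["stop"]
    return max(0, stop - contig_length)


if __name__ == "__main__":
    header = ">VIRSorter_NODE_51_gene_20_gene_72-19922-63493-cat_4"
    print(parse_prophage_header(header))
    print(overshoot(header, 63443))  # 50
```

For the NODE_51 example, `overshoot(header, 63443)` returns 50, which is exactly the gap between 63493 and 63443.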

simroux commented 4 years ago

Hi! You're right, something's wrong here. :-) It is, however, relatively innocuous and an easy fix: by default, VirSorter extends the prophage sequence by 50 nucleotides beyond the outermost gene on both the 5' and 3' sides (to include potential att sites and to avoid ending the sequence right on a start/stop codon). I just forgot to include a check to make sure we don't extend past the ends of the contig, i.e. if the last gene of the prophage is at the end of the contig, the coordinate reported by VirSorter will be 50 nucleotides beyond the contig end (hence 63493 vs. 63443).
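The padding rule described here, and the missing bounds check, can be sketched as follows. This is an illustration in Python, not VirSorter's actual code; it assumes 1-based inclusive coordinates, and `padded_bounds` and its parameters are hypothetical names.

```python
from typing import Tuple


def padded_bounds(gene_start: int, gene_end: int, contig_length: int,
                  pad: int = 50, clamp: bool = True) -> Tuple[int, int]:
    """Pad a prophage region by `pad` nt on both sides.

    gene_start: start of the outermost 5' gene (1-based, inclusive)
    gene_end:   end of the outermost 3' gene (1-based, inclusive)
    clamp=False reproduces the unchecked behavior reported in this issue.
    """
    start = gene_start - pad
    stop = gene_end + pad
    if clamp:
        # Keep the padded region inside the contig: [1, contig_length]
        start = max(1, start)
        stop = min(contig_length, stop)
    return start, stop


if __name__ == "__main__":
    # Hypothetical gene coordinates: the last gene ends exactly at the contig end.
    print(padded_bounds(19972, 63443, 63443, clamp=False))  # (19922, 63493)
    print(padded_bounds(19972, 63443, 63443))                # (19922, 63443)
```

With `clamp=False` the sketch reproduces the 50 nt overshoot; clamping both ends to `[1, contig_length]` is the kind of check described above as missing.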

I'll fix it ASAP, but in the meantime you can safely use these results anyway, as the prediction is overall correct (the GenBank file you get for this prophage is also accurate, since it is automatically adjusted to the actual sequence length).

Best, Simon

hoelzer commented 4 years ago

I see! Thanks for the explanation.

simroux commented 4 years ago

Should be fixed now. Thanks again for reporting the bug!