simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0
104 stars, 30 forks

Predicted_viral_sequences gbk files #5

Closed. ovpop100 closed this issue 8 years ago

ovpop100 commented 8 years ago

Hi, I have a question regarding the output files. Why do the gbk files in Predicted_viral_sequences contain more genes than indicated by the fragment prediction in global-phage-signal.csv? For example:

NC_017387_gi_385235550_ref_NC_017387_1Acinetobacter_baumannii_TCDC-AB0715_chromosomecomplete_genome,3959,NC_017387_gi_385235550_ref_NC_017387_1Acinetobacter_baumannii_TCDC-AB0715_chromosomecompletegenome-gene1152-gene_1240,89,2,4,gene_1154-gene_1234:15.17897412683534,,gene_1152-gene_1240:39.85530235945591,,,

The fragment prediction is gene_1152_gene_1240, but the last gene in the gbk file is gene_2465.

What am I missing? Thank you.

simroux commented 8 years ago

Hi,

Earlier versions of VirSorter generated GenBank files that included the whole contig on which a prophage was detected, in order to provide the complete genomic context of each prophage. This was changed in the latest version because we realized it could be confusing and counter-intuitive, so the gbk files should now include only the prophage. Did you get this result with the most up-to-date VirSorter scripts? If not, could you try them on the same data and see whether the gbk files are indeed "prophage-only"?

Thanks

ovpop100 commented 8 years ago

Thank you very much for the quick reply. I cloned the version from GitHub one week ago. I'm using the scripts without Docker and running the prediction through the wrapper script. The input files are FASTA files of fully sequenced genomes. The NC_017387 file contains 3959 genes, as annotated in NC_017387_mga_final.predict. If the fragment coordinates in global-phage-signal.csv are correct, then I can extract the sequence of interest myself. Thank you.
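As a quick sanity check on the fragment coordinates, here is a minimal Python sketch that pulls the first and last gene index out of the row quoted above. It assumes, based only on that one row, that the third column holds the fragment name ending in a gene range and that the fourth column holds the number of genes in the fragment; the function name `fragment_gene_range` and the regular expression are made up for illustration and are not part of VirSorter.

```python
import csv
import re

# The row quoted earlier in this thread, copied verbatim from global-phage-signal.csv.
ROW = (
    "NC_017387_gi_385235550_ref_NC_017387_1Acinetobacter_baumannii_TCDC-AB0715_"
    "chromosomecomplete_genome,3959,"
    "NC_017387_gi_385235550_ref_NC_017387_1Acinetobacter_baumannii_TCDC-AB0715_"
    "chromosomecompletegenome-gene1152-gene_1240,89,2,4,"
    "gene_1154-gene_1234:15.17897412683534,,gene_1152-gene_1240:39.85530235945591,,,"
)

# Assumption: the fragment name ends with "<first gene>-<last gene>".
# The underscore is optional because the pasted row shows both "gene1152" and "gene_1240".
GENE_RANGE_RE = re.compile(r"gene_?(\d+)-gene_?(\d+)$")


def fragment_gene_range(row_text):
    """Return (contig_id, first_gene, last_gene, reported_gene_count) for one row.

    Column meanings are inferred from the single example row, not from
    VirSorter documentation.
    """
    fields = next(csv.reader([row_text]))
    contig_id, fragment, reported_count = fields[0], fields[2], int(fields[3])
    match = GENE_RANGE_RE.search(fragment)
    if match is None:
        raise ValueError("no gene range found in fragment name: " + fragment)
    first_gene, last_gene = int(match.group(1)), int(match.group(2))
    return contig_id, first_gene, last_gene, reported_count


contig_id, first_gene, last_gene, reported_count = fragment_gene_range(ROW)
span = last_gene - first_gene + 1
print(first_gene, last_gene, span, reported_count)
```

For this row the gene span (1240 - 1152 + 1) agrees with the count in the fourth column, which suggests the range is internally consistent. Mapping those gene indices to nucleotide positions would still require the gene coordinates from NC_017387_mga_final.predict, which this sketch does not read.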