simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Metric_files/VIRSorter_affi-contigs.tab "start-end-length" issues: #82

Closed ameisesophie closed 3 years ago

ameisesophie commented 3 years ago

Hello! Thanks for the nice code.

We are using VirSorter and have obtained the "VIRSorter_affi-contigs.tab" file, but the position annotation of some genes seems wrong. For example: VIRSorter_scaffold_90144-circular-gene_3|943|24|404|-|-|-|-|-|-|-|- Here "943|24|404" does not look like "start|end|length". We don't know how to interpret it, or how to get the corresponding sequences.

Thanks!

simroux commented 3 years ago

Hi !

This is a bit of an "oddity" in VirSorter: when identifying circular contigs (i.e. 5'-3' direct terminal repeats), it will try to see if any gene spans across the contig's "origin". This is because there are a number of small circular virus/phage genomes that only encode a few genes, and these can be truncated / split if the contig doesn't break right in the intergenic region.

That's what happens here: the contig is noted as "-circular", so the gene starts at position 943, continues all the way to the 3' end of the (clean) contig, then jumps to the 5' start of the contig and stops at position 24. I say "clean" contig because VirSorter also outputs a fasta file with the repeats of these circular contigs removed (in the Fasta_files/ folder, see the "_nett" file).
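
The wrap-around described above can be sketched in a few lines of Python. This is only an illustration, not VirSorter code: the function names `parse_gene_coords` and `extract_gene` are hypothetical, the coordinates are assumed to be 1-based and inclusive on the "clean" contig, and strand is ignored.

```python
def parse_gene_coords(line: str) -> tuple[str, int, int, int]:
    """Read gene id, start, end and length from one pipe-delimited gene line.

    Only the first four fields are used; the remaining fields are ignored.
    """
    gene_id, start, end, length = line.rstrip("\n").split("|")[:4]
    return gene_id, int(start), int(end), int(length)


def extract_gene(contig_seq: str, start: int, end: int) -> str:
    """Return the gene sequence, assuming 1-based inclusive coordinates.

    If start > end, the gene is assumed to cross the origin of a circular
    contig: it runs from `start` to the 3' end, then from the 5' start
    to `end`.
    """
    if start <= end:
        return contig_seq[start - 1:end]
    return contig_seq[start - 1:] + contig_seq[:end]


# Gene line from the question above.
example = "VIRSorter_scaffold_90144-circular-gene_3|943|24|404|-|-|-|-|-|-|-|-"
gene_id, start, end, length = parse_gene_coords(example)

# Toy 10-letter "contig" to show the wrap-around; a real contig would be
# read from the "_nett" fasta file instead.
toy_contig = "ABCDEFGHIJ"
wrapped = extract_gene(toy_contig, 8, 3)   # positions 8..10, then 1..3
linear = extract_gene(toy_contig, 2, 4)    # ordinary gene, no wrap
```

For real data, the sequence passed to `extract_gene` should be the repeat-free contig, since that is the coordinate system described above; reverse-complementing for genes on the minus strand is left out of this sketch.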

Let me know if it makes more sense, and if you have any further questions,

Best, Simon

ameisesophie commented 3 years ago

Dear Simon, Thanks! Your explanation helps me a lot!

Best, sophie

ameisesophie commented 3 years ago

Dear Simon, I also used VirSorter2, but got no predicted ORF results. Should I use VirSorter1 when I need both viral genome info and ORF info?

Best, Sophie

simroux commented 3 years ago

Hi ! When you say "no predicted ORF results", do you mean that no ORFs were predicted, or that you can't find the file that has this information in the same way as in VirSorter1? If the former, it may be an issue with VirSorter2. If the latter, there is now an option in VirSorter2 called "--prep-for-dramv" that generates output similar to that of VirSorter1. In both cases though, any question will likely be better addressed on the VirSorter2 GitHub :-) https://github.com/jiarong/VirSorter2

Thanks ! Best, Simon

ameisesophie commented 3 years ago

Hi!

Thanks for your reply!

Best, Sophie
