mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

Rotation to avoid ORFs overlapping with breakpoint does not seem to work? #20

Closed: ldj248 closed this issue 1 year ago

ldj248 commented 2 years ago

Hi, thank you for developing this amazing tool!

I have some circular phage genomes in which an ORF still overlaps the breakpoint of the sequence. If I understand correctly, the genome should be rotated when this happens (cenotetaker2.1.3.sh, starting at line 746). The genomes are rotated, but some still have ORFs overlapping the breakpoint.
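One way to check for this outside the pipeline is to scan the circular sequence directly for ORFs that straddle position 0. The sketch below assumes a simplified gene model (forward strand only, ATG starts, stop codons TAA/TAG/TGA) and reads across the breakpoint by scanning the sequence concatenated with itself. It is not how Cenote-Taker 2, getorf, prodigal, or PHANOTATE call genes, and `orfs_crossing_origin` is a hypothetical helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial/phage stop codons


def orfs_crossing_origin(seq: str, min_len: int = 90) -> list[tuple[int, int]]:
    """Find forward-strand ATG..stop ORFs that span the breakpoint of a circular contig.

    Returns (start, end) pairs: 0-based, half-open, in coordinates of seq + seq,
    so end > len(seq) means the ORF wraps from the end of the contig to its start.
    Only the longest ORF (most upstream ATG) for each stop codon is kept.
    """
    seq = seq.upper()
    n = len(seq)
    doubled = seq + seq  # lets us read across the breakpoint without modular indexing
    longest_by_end: dict[int, int] = {}
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        end = None
        # read in frame for at most one full turn of the circle
        for pos in range(start, start + n, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                end = pos + 3  # include the stop codon
                break
        if end is None or end <= n:
            continue  # no stop found, or the ORF ends before the breakpoint
        if end - start >= min_len and end not in longest_by_end:
            longest_by_end[end] = start  # starts are visited in increasing order
    return sorted((start, end) for end, start in longest_by_end.items())
```

For example, `orfs_crossing_origin("CCCTAAGGGATG", min_len=0)` reports one ORF that starts at the ATG near the end of the contig and stops at the TAA after wrapping around. The reverse strand could be checked the same way by running the scan on the reverse complement.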

I ran CenoteTaker2 with the --enforce_start_codon False option (I want partial ORFs on linear contigs to be included; I'm not sure whether this has anything to do with it), --min_circular_hallmark_genes 0, and -am True, and received no errors. As far as I can tell, none of the output files report the "missing" part of the ORF, not even as a second "partial" ORF at the end of the sequence (which would already be helpful).

I'm not sure whether something went wrong or whether there is simply no better position at which to break open the circular sequence.

Thanks in advance for your help!

mtisza1 commented 2 years ago

Hi,

Thanks for using the tool, and thank you for opening an issue.

Does it seem like all or most of your circular genomes are wrapping within a gene, or is this a relatively rare event? Also, is this happening primarily with Microviruses? I've seen a few that might have ORFs spanning the entire length of the genome.

One possible cause is the use of different ORF callers. I use getorf to quickly call ORFs before wrapping, but ORFs on phage genomes are ultimately called with PHANOTATE, so getorf might be missing some "tricky" ORFs that PHANOTATE later finds.
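This kind of mismatch can be illustrated with a small check: choose a cut point that avoids one caller's ORFs, then test it against another caller's ORFs. This is a hypothetical sketch; the helper names and coordinates are invented for illustration rather than taken from real getorf or PHANOTATE output, and ORFs are assumed to be (start, end) pairs, 0-based and half-open on a circle of length n, with end > n for an ORF that wraps past the current origin.

```python
def cut_splits_orf(cut: int, orf: tuple[int, int], n: int) -> bool:
    """True if opening the circle just before position `cut` would split this ORF."""
    start, end = orf
    offset = (cut - start) % n  # distance walked forward from the ORF start to the cut
    return 0 < offset < end - start


def conflicts_by_caller(
    cut: int, orfs_by_caller: dict[str, list[tuple[int, int]]], n: int
) -> dict[str, list[tuple[int, int]]]:
    """For each caller's ORF set, list the ORFs that the proposed cut would split."""
    return {
        caller: [orf for orf in orfs if cut_splits_orf(cut, orf, n)]
        for caller, orfs in orfs_by_caller.items()
    }


# Hypothetical ORF sets on a 1,000 bp circular contig.
orf_sets = {
    "caller_a": [(100, 400), (600, 900)],              # e.g. a fast pre-wrapping call
    "caller_b": [(100, 400), (450, 620), (600, 900)],  # e.g. the final annotation
}
# Position 500 lies in a gap between caller_a's ORFs, but caller_b has an ORF spanning it.
print(conflicts_by_caller(500, orf_sets, 1000))
# {'caller_a': [], 'caller_b': [(450, 620)]}
```

A cut that looks safe under the first caller's ORFs can therefore still split a gene in the final annotation.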

If it's a rare thing, would you mind sending me an example sequence before and after Cenote-Taker 2 wraps it?

You can send to my email michael.tisza@gmail.com if the file is too large to attach here.

Thanks,

Mike

ldj248 commented 2 years ago

Hi

Thanks for the quick reply.

Out of my 222 DTR contigs, 83 seem to still have an overlapping ORF. None of them are annotated as Microviridae (all Caudovirales). I will send you a few sequences via email. Thank you!

mtisza1 commented 2 years ago

I've removed getorf and the other EMBOSS tools in the updated version of Cenote-Taker 2 (v2.1.5), so ORFs are now called with prodigal instead. The longest ORF that prodigal finds is now used as the "beginning" of each circular contig. Check out the instructions on the main repo page to update to the new version.
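The "longest ORF as the beginning" rule can be sketched roughly as follows. This is an illustration under stated assumptions, not Cenote-Taker 2's actual code: ORFs are assumed to be (start, end, strand) tuples with 0-based, half-open coordinates on the circle (end may exceed the contig length for a wrapping ORF), as one might parse from an ORF caller's output, and `rotate_to_longest_orf` and `shift_orf` are hypothetical names. The rotation puts the leftmost coordinate of the longest ORF at position 0, so that ORF cannot straddle the new breakpoint; for a minus-strand ORF this means the sequence begins at the gene's stop codon rather than its start codon.

```python
def rotate_to_longest_orf(
    seq: str, orfs: list[tuple[int, int, str]]
) -> tuple[str, int]:
    """Rotate a circular contig so the longest ORF begins at position 0.

    Returns the rotated sequence and the offset that was applied.
    """
    if not orfs:
        return seq, 0  # nothing to anchor on; leave the contig as-is
    n = len(seq)
    start, _end, _strand = max(orfs, key=lambda orf: orf[1] - orf[0])
    offset = start % n
    return seq[offset:] + seq[:offset], offset


def shift_orf(orf: tuple[int, int, str], offset: int, n: int) -> tuple[int, int, str]:
    """Translate an ORF's coordinates into the rotated contig's frame of reference."""
    start, end, strand = orf
    new_start = (start - offset) % n
    return new_start, new_start + (end - start), strand


# Hypothetical 15 bp contig; the minus-strand ORF (12, 21) wraps past the origin.
rotated, offset = rotate_to_longest_orf("AAAAACCCCCGGGGG", [(2, 5, "+"), (12, 21, "-")])
print(rotated, offset)  # GGGAAAAACCCCCGG 12
```

Note that another ORF overlapping the start of the longest ORF would still cross the new breakpoint; after shifting coordinates with `shift_orf`, any ORF with end > n flags such a case.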

I hope this fixes this issue, but let me know if you find any new errors.