mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

Getting phage contigs even after "phage pruning" is set to true #30

Closed: eul2021 closed this issue 2 years ago.

eul2021 commented 2 years ago

Hi Mike, I just wanted to clarify whether I should expect to see some phage contigs even with phage pruning set to True. I read your notes (https://github.com/mtisza1/Cenote-Taker2/wiki#notes-on-virus-taxonomy) on Caudovirales not getting pruned, but I'm also getting Varsani Microviridae and some other phages. Could those be contigs where phages integrated into viral genomes, or is it something else?

Thank you very much for your help.

mtisza1 commented 2 years ago

Eul2021,

Thanks for the question. You are misunderstanding the function of the --prune_prophage argument. This module is actually designed to KEEP the phage sequence and to remove/prune away the "non-phage" chromosomal parts of contigs originating from bacterial chromosomes (many phages are integrated into bacterial chromosomes). Illustration:

Original contig:

^^^^BACTERIAL CHROM^^^^^\/\/\/\/\/\/PHAGE GENOME\/\/\/\/\/\/\^^^^BACTERIAL CHROM^^^^^

"Pruned" sequence after pruning module:

\/\/\/\/\/\/PHAGE GENOME\/\/\/\/\/\/\
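To make the illustration concrete, here is a minimal Python sketch of the trimming step alone. It assumes the phage boundaries are already known; the function name `prune_prophage_sketch` and its coordinate arguments are hypothetical, and the sketch does not show how Cenote-Taker2 actually locates those boundaries.

```python
def prune_prophage_sketch(contig: str, phage_start: int, phage_end: int) -> str:
    """Return only the phage region of a contig, dropping chromosomal flanks.

    phage_start/phage_end are 0-based, end-exclusive coordinates that are
    ASSUMED to be known already (hypothetical inputs; boundary detection
    is not shown here).
    """
    if not 0 <= phage_start < phage_end <= len(contig):
        raise ValueError("phage boundaries must lie within the contig")
    # Keep the integrated phage, discard the bacterial chromosome on both sides.
    return contig[phage_start:phage_end]


# Toy contig: 4 bp of "bacterial chromosome" on each side of a 6 bp "phage".
contig = "BBBB" + "PPPPPP" + "BBBB"
pruned = prune_prophage_sketch(contig, 4, 10)
print(pruned)  # PPPPPP
```

The output keeps the phage and nothing else, which is the opposite of removing phage contigs from the results.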

If you set --prune_prophage to true, it tries to prune EVERY linear sequence (that contains virus hallmark genes). Also, it only "tries" to prune sequences over 10 kb, as smaller sequences are trickier.
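The selection rule described above can be sketched as a single predicate. This is a hypothetical illustration, not Cenote-Taker2's code: the function name is made up, and treating "over 10kb" as strictly greater than 10,000 bp is an assumption.

```python
# "over 10kb" from the thread; the exact cutoff and comparison are ASSUMED.
MIN_PRUNE_LENGTH = 10_000


def is_pruning_candidate(length_bp: int, is_circular: bool, has_hallmark_gene: bool) -> bool:
    """Hypothetical rule: would the pruning module try to prune this contig?"""
    # Only linear contigs that carry virus hallmark genes and exceed the
    # length cutoff are examined; everything else passes through untouched.
    return (not is_circular) and has_hallmark_gene and length_bp > MIN_PRUNE_LENGTH


print(is_pruning_candidate(45_000, False, True))  # True
print(is_pruning_candidate(8_000, False, True))   # False: too short
print(is_pruning_candidate(45_000, True, True))   # False: circular
```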

This module probably works for eukaryotic viruses that are integrated into host chromosomes, but I haven't tested it extensively.

The scenario where a phage is integrated into a eukaryotic virus is pretty unlikely.

I think I didn't communicate this directly enough, so I'll have to explain this better in the README.

mtisza1 commented 2 years ago

Also, some/many contigs containing phage sequences will not have any bacterial chromosome on the flanks. These sequences will be looked at by the pruning module, but they won't get pruned or changed.
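A small sketch of the possible outcomes for a single contig may help separate "examined" from "changed". The labels and the `pruning_outcome` function are hypothetical, and the flank lengths are assumed to come from an upstream detection step that is not shown.

```python
def pruning_outcome(eligible: bool, left_flank_bp: int, right_flank_bp: int) -> str:
    """Classify what happens to one contig (hypothetical labels).

    left_flank_bp/right_flank_bp: length of bacterial chromosome ASSUMED to be
    detected on each side of the phage region by some upstream step.
    """
    if not eligible:
        # Circular, too short, or no hallmark genes: never looked at.
        return "not examined"
    if left_flank_bp == 0 and right_flank_bp == 0:
        # Looked at, but no chromosomal DNA on either side: kept as-is.
        return "examined, unchanged"
    return "pruned"


print(pruning_outcome(True, 0, 0))         # examined, unchanged
print(pruning_outcome(True, 3_000, 0))     # pruned
print(pruning_outcome(False, 5_000, 5_000))  # not examined
```

In this model, a phage contig that appears in the output is expected behavior in both the "examined, unchanged" and "pruned" cases.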

eul2021 commented 2 years ago

Hi Mike, Thanks so much for the clear explanation. I get it now:)