mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

any option to use continue mode #5

Closed: Thexiyang closed this issue 3 years ago

Thexiyang commented 3 years ago

Thanks for this nice tool. It looks like the whole process can be very slow. Is it possible to add a flag for a continue mode, so that we can restart the process where it stopped rather than starting from the beginning?

mtisza1 commented 3 years ago

Hi Thexiyang,

Thank you for another very useful suggestion. A "--continue" flag might be possible, and I can envision how I might implement such a system. As with your previous comment, it would take some work to implement and test this. I'll put it on my "To-do" list.
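A "--continue" mode like the one discussed here is commonly built on checkpoints: each pipeline step records that it finished, and a rerun skips the steps that are already recorded. The Python sketch below only illustrates that general idea. It is not Cenote-Taker 2 code, and the `run_pipeline` function, the step names and the `.done` marker-file convention are all hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# A step is a (name, function) pair; the name doubles as its checkpoint file stem.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], workdir: Path, resume: bool = True) -> List[str]:
    """Run steps in order, skipping any step already checkpointed in workdir.

    Returns the names of the steps that actually ran during this call.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    ran = []
    for name, func in steps:
        marker = workdir / f"{name}.done"  # hypothetical ".done" checkpoint convention
        if resume and marker.exists():
            continue  # completed in an earlier, possibly interrupted, run
        func()  # if this raises, no marker is written, so the step reruns next time
        marker.touch()
        ran.append(name)
    return ran
```

For example, with hypothetical steps such as `[("orf_calling", call_orfs), ("hhblits", run_hhblits)]`, a run that dies during the HHblits step would skip ORF calling on the next invocation and restart at HHblits, because a marker is only written after its step returns.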

In the meantime, if you let me know what your inputs are and what outputs you need, I can suggest ways to run your jobs in less time. Not every step (e.g. HHblits) may be necessary for every use case.

Best,

Mike

Thexiyang commented 3 years ago

Thanks for the swift answer. I will look at the HHblits step; it is indeed time-consuming. One more question, about interpreting the results: the wiki does not explain how to read the output. Do you have any suggestion for converting a .gbf file into a genome map like the ones in the preprint, e.g. Figure 2? I thought that was part of the output...

mtisza1 commented 3 years ago

Yes, please try "--hhsuite_tool none". I guess you have a large metagenome and you are hunting for virus contigs? Using "--hhsuite_tool none" can speed up this process a lot, and you can take any interesting genomes/contigs and put them back into cenote-taker 2 using "--hhsuite_tool hhblits" to improve their annotations.
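For the second pass described above, the interesting contigs first have to be pulled out of the original assembly into their own FASTA file. Here is a minimal standard-library Python sketch, assuming the contigs are in a plain FASTA file and that you pick them by the first word of each header line; the function names and the file and contig names in the usage note are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(lines: Iterable[str], wanted_ids: Iterable[str]) -> str:
    """Return FASTA text with only the records whose first header word is wanted."""
    wanted = set(wanted_ids)
    out = []
    for header, seq in read_fasta(lines):
        words = header.split()
        if words and words[0] in wanted:
            out.append(f">{header}\n{seq}\n")
    return "".join(out)
```

For example, `with open("contigs.fasta") as fh: Path("interesting_contigs.fasta").write_text(subset_fasta(fh, ["contig_12", "contig_40"]))` (with `from pathlib import Path`) writes a small file that can then be run through Cenote-Taker 2 again with `--hhsuite_tool hhblits`.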

(However, if you are inputting whole bacterial genomes with contigs of several megabases to find prophages, this will simply take 2-4 hours, as the parallelization is not (yet) optimized for these data. Results should be good, though.)

Regarding visualizing genome maps, you can open the .gbf files directly in any genome/plasmid viewer. Free options include SnapGene Viewer and UGENE; paid options include Geneious and MacVector. There are probably others. I'm not sure I have a particular preference. Let me know if that is helpful. I'll add this info to the wiki.
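If you just want a quick text overview of a .gbf file before opening it in a viewer, the feature table can be scanned directly, since .gbf files are GenBank flat files. The standard-library sketch below is a rough illustration, assuming the usual flat-file layout (feature keys indented 5 spaces, qualifiers indented 21 spaces); it ignores locations and `/product` values that wrap onto a second line. For real work, Biopython's `SeqIO.parse(path, "genbank")` is a more robust parser. The function names here are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed flat-file layout: feature key at column 6, qualifiers at column 22.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")
QUALIFIER_LINE = re.compile(r"^ {21}/(\w+)(?:=(.*))?$")


def list_features(lines: Iterable[str],
                  keys: Optional[Iterable[str]] = ("CDS",)) -> List[Dict[str, str]]:
    """Collect key, first-line location and /product of features in the FEATURES table."""
    wanted = None if keys is None else set(keys)  # None means keep every feature key
    features: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    in_table = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # ORIGIN, BASE COUNT, CONTIG or // ends the feature table
        m = FEATURE_LINE.match(line)
        if m:
            current = {"key": m.group(1), "location": m.group(2)}
            if wanted is None or current["key"] in wanted:
                features.append(current)
            continue
        q = QUALIFIER_LINE.match(line)
        if q and current is not None and q.group(1) == "product" and "product" not in current:
            current["product"] = (q.group(2) or "").strip('"')
    return features


def print_map(features: List[Dict[str, str]]) -> None:
    """Print one line per feature: key, location and product (if any)."""
    for f in features:
        print(f"{f['key']:<8} {f['location']:<24} {f.get('product', '')}")
```

For example, `with open("my_contig.gbf") as fh: print_map(list_features(fh))` lists the CDS features with their coordinates and annotated products; pass `keys=None` to include every feature type.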

Thexiyang commented 3 years ago

Very helpful! Thanks a lot!