theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[TheiaProk] Add detection of prophages via phastaf #460

Open cimendes opened 1 month ago

cimendes commented 1 month ago

:cool:

:pushpin: Explain the Request

For some bacterial species, such as Vibrio cholerae, it is important to know where prophages are located. Currently, there aren't many tools for this. The best known is PHASTEST (https://phastest.ca/).

:books: Context

Unfortunately, when I tested PHASTEST on a Vibrio genome, it returned no results (https://phastest.ca/submissions/ZZ_b98f647eea). I expected at least one hit, the CTX prophage, whose presence I confirmed with abricate and srst2.

@tseemann has developed https://github.com/tseemann/phastaf, a DIAMOND wrapper around the PHASTEST database. It is very fast, the database comes prepackaged with the tool, and it is available through bioconda. I ran it on the same genome as my previous test, and among the many hits it generated, the CTX prophage was identified in 3 contigs (contig00001, contig00037 and contig00049).
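
To make this kind of per-contig check easier to repeat, something like the following could summarize hits by contig. This is only a rough sketch: it assumes the hits are available as BED-style tab-separated rows (contig, start, end, name). The function name `hits_by_contig`, the keyword match, and the example rows are made up for illustration and are not real phastaf output.

```python
import csv
import io
from collections import defaultdict


def hits_by_contig(bed_text, keyword):
    """Group BED-style intervals whose name field contains `keyword` by contig."""
    grouped = defaultdict(list)
    reader = csv.reader(io.StringIO(bed_text), delimiter="\t")
    for row in reader:
        # Skip blank lines, comment lines, and rows without a name column.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        contig, start, end, name = row[0], int(row[1]), int(row[2]), row[3]
        # Case-insensitive substring match on the hit name (an assumption).
        if keyword.lower() in name.lower():
            grouped[contig].append((start, end, name))
    return dict(grouped)


# Made-up example rows for illustration only, NOT real phastaf output.
example = (
    "contig00001\t100\t900\tCTX_hypothetical_protein\n"
    "contig00001\t5000\t5600\tother_phage_protein\n"
    "contig00037\t10\t400\tCTX_hypothetical_protein\n"
)

result = hits_by_contig(example, "CTX")
for contig, hits in sorted(result.items()):
    print(contig, len(hits))
```

A plain substring match is the simplest option here; if the hit names turn out to follow a stricter convention, an exact match on a parsed field would probably be more robust.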


I don't fully understand why the two tools gave such different results, but integrating prophage detection may be worth some research. We might also be able to improve on phastaf to reduce the number of false-positive results.