replikation / What_the_Phage

WtP: Phage identification via nextflow and docker or singularity
https://mult1fractal.github.io/wtp-documentation/
GNU General Public License v3.0

pro phage detection #73

Open mult1fractal opened 4 years ago

mult1fractal commented 4 years ago

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0223364

replikation commented 4 years ago

see also #11

mshamash commented 4 years ago

Some suggestions for milestone v0.9 prophage prediction:

Re. VirSorter: it's important to include only category 4 and 5 predictions, not 6. This mirrors lytic phages, where categories 1 and 2 are the best "hits" while category 3 is likely to include many false positives (as per the original VirSorter publication).
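The category rule above can be expressed as a small filter. This is a minimal sketch, assuming VirSorter (v1) predictions have already been parsed into (contig ID, category) pairs; parsing VirSorter's own output files is not shown, and the function name and example contigs are hypothetical. In VirSorter v1, categories 1 to 3 are complete phage contigs and 4 to 6 are prophages, each group ordered from most to least confident.

```python
from typing import Iterable, List, Tuple

# VirSorter (v1) categories: 1-3 = complete phage contigs, 4-6 = prophages.
# Within each group the first category is the most confident.
PHAGE_KEEP = {1, 2}      # drop category 3 (many false positives)
PROPHAGE_KEEP = {4, 5}   # drop category 6 (many false positives)


def filter_virsorter_hits(
    hits: Iterable[Tuple[str, int]],
) -> Tuple[List[str], List[str]]:
    """Split (contig_id, category) pairs into confident phage and prophage hits."""
    phages: List[str] = []
    prophages: List[str] = []
    for contig_id, category in hits:
        if category in PHAGE_KEEP:
            phages.append(contig_id)
        elif category in PROPHAGE_KEEP:
            prophages.append(contig_id)
        # categories 3 and 6 (and anything unexpected) are silently dropped
    return phages, prophages


# Toy input, not real VirSorter output.
hits = [("ctg1", 1), ("ctg2", 3), ("ctg3", 4), ("ctg4", 6), ("ctg5", 5), ("ctg6", 2)]
phages, prophages = filter_virsorter_hits(hits)
print(phages)     # ['ctg1', 'ctg6']
print(prophages)  # ['ctg3', 'ctg5']
```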

Since some tools "die a bit" when given a full bacterial genome (2 Mb or more), perhaps a warning could be issued to the user if any input files/contigs are that large? WtP could then suggest rerunning with the corresponding flag(s) to deactivate those tools. I believe that allowing many bacterial genomes or bacterial metagenomic bins as input is very useful for predicting prophages en masse for a given dataset.
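A size check like the one suggested could look roughly like this. It is a minimal sketch, not part of WtP: the 2 Mb threshold comes from the comment, while the function names and the warning text are hypothetical, and it does not name any real WtP flags.

```python
import warnings
from typing import Dict, Iterable, List

# Threshold from the discussion: contigs of ~2 Mb or more are treated as
# (near-)complete bacterial genomes that may make some tools fail.
LARGE_CONTIG_BP = 2_000_000


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_id: length} from FASTA text given as an iterable of lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = (line[1:].split() or [""])[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def warn_large_contigs(lengths: Dict[str, int], threshold: int = LARGE_CONTIG_BP) -> List[str]:
    """Warn about contigs long enough to be whole bacterial genomes; return their IDs."""
    large = [cid for cid, n in lengths.items() if n >= threshold]
    if large:
        warnings.warn(
            f"{len(large)} contig(s) are >= {threshold:,} bp ({', '.join(large)}). "
            "Some phage identification tools may fail on inputs this large; "
            "consider rerunning with those tools deactivated."
        )
    return large


fasta = [">small_contig", "ACGT" * 10, ">genome chr1", "A" * 2_500_000]
lengths = contig_lengths(fasta)
print(lengths)                      # {'small_contig': 40, 'genome': 2500000}
print(warn_large_contigs(lengths))  # ['genome'] (and a UserWarning is emitted)
```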

I'll update this comment if I think of anything else, but this is my feedback for now.

hoelzer commented 4 years ago

Yes, category 4 and 5!

In this context, it might also be interesting to note that there was a bug in VirSorter predicting prophages that are actually longer than the input contig: https://github.com/simroux/VirSorter/issues/68

It is fixed (in the master branch of VS, I guess), but likely not in the current release/Docker/Bioconda build.
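While older VirSorter builds may still carry this bug, one cheap safeguard is to check predicted prophage coordinates against the input contig lengths. This is a minimal sketch, assuming predictions have already been parsed into (contig, start, end) records with 1-based inclusive coordinates; the record type and function name are hypothetical, and parsing VirSorter's output is not shown.

```python
from typing import Dict, List, NamedTuple


class ProphageRegion(NamedTuple):
    contig_id: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


def invalid_regions(
    regions: List[ProphageRegion], contig_lengths: Dict[str, int]
) -> List[ProphageRegion]:
    """Return predicted regions that cannot fit inside their source contig."""
    bad = []
    for r in regions:
        length = contig_lengths.get(r.contig_id)
        if (
            length is None      # contig not present in the input at all
            or r.start < 1      # starts before the contig
            or r.end < r.start  # inverted coordinates
            or r.end > length   # runs past the contig end, i.e. longer than the contig allows
        ):
            bad.append(r)
    return bad


lengths = {"ctg1": 50_000}
regions = [ProphageRegion("ctg1", 1_000, 40_000), ProphageRegion("ctg1", 10_000, 80_000)]
print(invalid_regions(regions, lengths))
# [ProphageRegion(contig_id='ctg1', start=10000, end=80000)]
```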

mshamash commented 4 years ago

Yeah, I always use the master branch of VS, since I think the Docker/Bioconda builds are a bit outdated. Not sure if that's something that can feasibly be done with Nextflow...

There's also VirSorter2, which may be coming out soon: https://github.com/jiarong/VirSorter2. Prophage prediction with it has been really hit or miss so far, but I imagine they're optimizing it for lytic phage detection first.

hoelzer commented 4 years ago

True, it's possible to build our own Docker image of the current VS master for WtP... but @replikation and @mult1fractal have to decide that ;)

Ah, and thanks for the link to VS2! I knew a version 2 was coming soonish but did not know about this code repository.

replikation commented 4 years ago

@mshamash identification tools that die on the input data will be terminated and automatically excluded from the results and figures. So it's more annoying than critical, as the workflow continues without that data.
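In Nextflow, this kind of tolerance is typically configured per process (for example with the `errorStrategy 'ignore'` directive). The Python sketch below only mirrors the idea described in the comment: each tool runs independently, a crash removes that tool from the results, and the rest continue. It is not how WtP itself is implemented, and the tool names and functions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a phage identification tool:
# takes contig IDs and returns the subset it predicts as phage.
Tool = Callable[[List[str]], List[str]]


def run_tools(
    tools: Dict[str, Tool], contigs: List[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Run every tool; keep results from those that succeed, skip those that fail."""
    results: Dict[str, List[str]] = {}
    failed: List[str] = []
    for name, tool in tools.items():
        try:
            results[name] = tool(contigs)
        except Exception as err:  # a crashed tool is dropped, not fatal
            failed.append(name)
            print(f"warning: {name} failed ({err}); excluded from results")
    return results, failed


def ok_tool(contigs: List[str]) -> List[str]:
    return [c for c in contigs if c.startswith("phage")]


def crashing_tool(contigs: List[str]) -> List[str]:
    raise RuntimeError("tool crashed on large input")


results, failed = run_tools({"toolA": ok_tool, "toolB": crashing_tool}, ["phage_1", "bact_1"])
print(results)  # {'toolA': ['phage_1']}
print(failed)   # ['toolB']
```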