replikation / What_the_Phage

WtP: Phage identification via nextflow and docker or singularity
https://mult1fractal.github.io/wtp-documentation/
GNU General Public License v3.0
103 stars 15 forks source link

Add Phigaro? #88

Closed chris-rands closed 4 years ago

chris-rands commented 4 years ago

Nice tool and idea. Perhaps our phigaro tool could also be included (although I appreciate that adding new tools is probably not trivial): https://academic.oup.com/bioinformatics/article/36/12/3882/5822875

https://github.com/bobeobibo/phigaro

chris-rands commented 4 years ago

Ah closing as a duplicate of #1 and #75

replikation commented 4 years ago

ah yep, you were faster, thanks for the interest and suggestion :)

replikation commented 4 years ago

@chris-rands the most time-consuming part is writing a parser for the tools and check/validate the tool output so we would need: 1) a working dockerfile / container (for both docker and singularity) 2) a parser which gets us a simple list of contigs(one contig per line) that are considered "phage positive"

PollyTikhonova commented 4 years ago

Hello, I am the Phigaro dev. I wanted to ask, what do you mean when you say contigs? Is it something that you transform initial data in? or is it a prophage region which phigaro detects? I was trying to understand it on my own, but now I am meeting with some issues with your tool, so maybe it would be faster if you could consult me)))

replikation commented 4 years ago

hi, thank you for the interest. the error you reported (before editing) was probably from the newest nextflow release, please use for Release v0.8.1 [NEXTFLOW version 20.01.0 or 20.04.1]. nextflow 20.07.x will be supported in our next release (currently testing that).

I mean fasta entries/names in a multifasta file. WtP reports which of these fasta entries are phages.

It would help if you could give us a good "best practice" command to identify which fasta contains phages/prophages. And of which result file we could extract easily the names of these positive phages (the fasta header, e.g. >fasta_1).

we could then include phigaro a bit more conveniently for us :)

PollyTikhonova commented 4 years ago
  1. About the error - it was my local mistake - I found it and now, it seems as it's working.
  2. The way Phigaro works: it finds the coordinates of prophages in a fasta sequence: so, theoretically, Phigaro can find several phages in one fasta or detect more narrow phage region than the whole fasta. Do I understand correctly that you do not need any of this information (neither coordinates nor the number of phages for each fasta seq)?
  3. I can produce any output you need, and I would be glad if you could provide the example of the file for a specific fasta file (maybe from one of those in your test example https://github.com/mult1fractal/WtP_test-data) - so you won't need to reparse it or transform anyhow. Because now, I have so many little questions, like do you need the whole header or only id\name, should the symbol > be removed, do you need a specidfic name/extension of the file or maybe you need stdout?
PollyTikhonova commented 4 years ago

Oh! WTP just finished the work (with a ruin((( ) but still, it managed to produce some data, so do I understand correctly that you need a file like those in a folder identified_contigs_by_tools ?

replikation commented 4 years ago

ad.1 glad to hear its working

ad.2 this information is currently not needed but as it seems useful, I think we will add that too at some point (i keep that in mind for now). the complete "raw tool output" is either way completely stored too so the User can check specifically the output from your tool if they want to also we are currently discussing how to best implement prophage detection for the user to be as convenient as possible, so the coordinates of prophages in a fasta sequence would be useful for that.

ad.3. no, we don't need a specific output files, can be any file e.g. output.txt ( we need to rename it either way). This needs to contain a list of header names, without ">", one fasta header name per line (unmodified from the original input)

e.g.:

fasta_1
fasta_2
fasta_5

yep you can use https://github.com/mult1fractal/WtP_test-data

Thank you very much

PollyTikhonova commented 4 years ago

Ok, I understand! Thank you very much for your responses)))

PollyTikhonova commented 4 years ago

Hello, I did everything you asked for:

  1. Dockerfile and Singularity file. 1.1 About Singularity: For Singularity a couple of paths (PATH and HOME) should be exported, you can find them in the Singularity file. So, you can built a Singularity image from the file or from docker repository and export the paths after. Or you can omit the exporting path step and use the full path Phigaro calling (with specifying a full path to a config file) - I represent the codes below.
  2. Parser
    • You can call Phigaro in a short way (if all paths (PATH and HOME) are installed correctly) \ phigaro -f [PATH_TO_INPUT_MULTIFASTA] -o [PATH_TO_OUTPUT_FOLDER] -t [THREADS (OPTIONAL)] --wtp
    • Or you can call Phigaro in a full path way \ /root/miniconda3/bin/phigaro -f [PATH_TO_INPUT_MULTIFASTA] -o [PATH_TO_OUTPUT_FOLDER] -t [THREADS (OPTIONAL)] --wtp --config /root/.phigaro/config.yml \ It will create the folder that you specify with -o (if it doesn't exist), and there you will find the phigaro.txt file with the names of the phage containing fasta sequences and the folder with standart phigaro outputs.

_By the way, I noticed that What_ThePhage changes the names of the phage positive sequences - at least it replaces the last dot with a dash. But Phigaro does not change anything, as you asked.

  1. Database download location All external files, Phigaro downloads during the setup process and stores in ~/.phigaro/pvog, are available by the link.\ But I am not sure if you really need this because Phigaro is fully setuped in a Docker container with all external files.

Please, let me know if you have any problems or you need something)

mult1fractal commented 4 years ago

Hej, many thanks for your great effort! We will take care of the rest šŸ‘ I will start integrating Phigaro on Monday

mult1fractal commented 4 years ago

@PollyTikhonova : Phigaro is basically fully integrated into WtP I just have to update some minor issues in my pull request #98 and we will release a new version of WtP with your tool šŸ‘ Thanks to your effort it was really easy!

considerations:

PollyTikhonova commented 4 years ago

@mult1fractal, Very happy to hear that! We'll wait for your release))) Sorry, I am not a very experienced Docker user, so I need to ask what do you mean "raw docker" ? How raw should it be? As far as I remember, the initial phigaro image was built based on somebody's hmmer docker container and I think that's not you are asking for, am I right?

mult1fractal commented 4 years ago

No Problem I can take care of it šŸ‘ For the moment, it's fine with your Dockerfile. When I updated my PR I think we will release the new Version The Dockerfile we will maintain is a long term goal when Dockerhub will update its policy

RUN pip install phigaro



Database:
* We download the database once. 
  * First execution of WtP - download all databases 
  * every other execution databases are available and you can tell WtP where the databases are located on your system

* eg. : the database for Phigaro will be downloaded once
  *  in nextflow we can basically "mount" the database into the Phigaro module which is the Dockerfile in general

* But this is something I have to test (Dockerfile with database or Dockerfile without database  and "mounting the database" )
* A  ```--wtp /path/to/database``` would be ideal for the Dockerfile without database in terms of speed (but this is sth. I have to discuss with @replikation )
PollyTikhonova commented 4 years ago

@mult1fractal

mult1fractal commented 3 years ago

@PollyTikhonova Hey I'm currently preparing our WtP preprint for submission. I was wondering how you filter the phage positive sequences for the --wtp output (above which (and what) cut-off it is a phage-sequence)

e.g. Tool Criteria Filter
Virsorter Category of detection (1, 2 or 3: intact, incomplete or questionable) Category 1 & 2
Phigaro ? ?
PollyTikhonova commented 3 years ago

While the wtp option Phigaro is working in its basic mode, so the thresolds for the final scoreĀ are ā€¢ Minimum score threshold: 45.39 ā€¢ Maximum score threshold: 46.0

Briefly, theĀ final score is computed as a product of phage concentration score and triangled GCĀ content score [each of the scores were filtered by indicator function].

mult1fractal commented 3 years ago

Thank you very much:)