linsalrob / PhiSpy

Prediction of prophages from bacterial genomes
MIT License
70 stars 20 forks source link

Minor changes and suguestions #28

Closed metageni closed 4 years ago

metageni commented 4 years ago

Hi PhiSpy team, I contacted @linsalrob directly and he told me to write my feedback here.

1) I noticed that when Phispy v3.4.5 is installed, it adds to the path Phipy.py, but it also adds Phispy and phispy (which don't work). It can be a little misleading even for someone who read the README.me. I think this is a simple fix in the setup.py file. Fixed by @linsalrob 2/24/20

2) prophage.tbl has two definitions in the README.md file: One states two columns without a header (the one that show in the output), and 2) it shows the file with a header and nicely split into three columns. I believe the second one if possible would be nice to have. Fixded by @linsalrob 5/20/20

3) The README states that If the column contains a 0 we believe that it is a bacterial gene and in another section it says: If this column is 1 then the gene is a phage like gene; otherwise it is a bacterial gene.

It is a little ambiguous to understand what it is when value > 1. I believe >=1 is a phage region - I saw some regions annotation with "phage" as substring and column 10th > 1. Maybe clarify the README.me. Fixded by @linsalrob 5/20/20

4) Add scikit-learn >=0.21.3 to Software Requirements. I believe numpy is not showing because it will be installed when installing Biopython ?! Fixed by @linsalrob 2/24/20

5) I believe that that prophage.tbl seems to contain the contig name, start and stop for the phage genome rather than the phage genes, right?

The tool takes a gbk file as input. I believe that most people that have a gbk would also have a FASTA file. I know that the tool does not take the FASTA file, but maybe the user could optionally pass it (if wanted) and a FASTA output file with be generated with the phages' genomes. I'm thinking about the end-user who does not know how to parse a FASTA file to get the phage regions. If the program does it, I'm sure people will love it.

I'm working in a blog post (http://onestopdataanalysis.com/) which will be posted a few months from now on how to go from bacterial FASTA to phage FASTA (Docker image with prokka and phispy, use the phispy output to and parse the phages FASTA file).

I will write a script to do the parsing and would be more than happy to add to the code base if given access. Otherwise, I'm also happy to share it with the team.

Thanks

linsalrob commented 4 years ago

Regarding the last point,

I believe that that prophage.tbl seems to contain the contig name, start and stop for the phage genome rather than the phage genes, right?

The tool takes a gbk file as input. I believe that most people that have a gbk would also have a FASTA file. I know that the tool does not take the FASTA file, but maybe the user could optionally pass it (if wanted) and a FASTA output file with be generated with the phages' genomes. I'm thinking about the end-user who does not know how to parse a FASTA file to get the phage regions. If the program does it, I'm sure people will love it.

We currently require a GenBank file as we base some of the decisions on the annotations, of course. There are lots of ways to convert a fasta file to an annotated GenBank file. We like the PATRIC service or PROKKA for standalone annotation. (Perhaps these are worthy of a http://onestopdataanalysis.com/ blog post!

We are also providing GenBank format output now.

All other suggestions have been implemented!

Thanks