jaumlrc / ProphET

Installation problems #3

Closed nbenzakour closed 7 years ago

nbenzakour commented 7 years ago

Hi Gustavo,

I recently read your article on ProphET, published on bioRxiv, with great interest. I am now trying to install it but am running into a few issues.

After cloning the repo, I first tried to run instaler_inicial.pl, which I guess has to be run first although the README doesn't mention it. It fails with the following error:

perl instaler_inicial.pl 
Downloading Phage sequences
--------------------------------------------------
../../UTILS.dir/./retrieve_proteins.sh: line 12: URL1:: command not found
../../UTILS.dir/./retrieve_proteins.sh: line 20: $report_file: ambiguous redirect
../../UTILS.dir/./retrieve_proteins.sh: line 42: $report_file: ambiguous redirect
Extract features from sequence(s)
Error: Unable to read sequence '10662.gb'
[...]

Line 12 in retrieve_proteins.sh points to another script:

../../UTILS.dir/./fetch_genomes_based_on_taxid.pl $txid`

I guess that whatever URL fetch_genomes_based_on_taxid.pl is trying to reach, it's not working. Obsolete or malformed URL?
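
For reference, both messages can be explained on the shell side. bash prints "URL1:: command not found" when text starting with "URL1:" is itself executed as a command, for example when a backtick substitution stands alone on a line. It prints "ambiguous redirect" when an unquoted variable in a redirection expands to nothing. A minimal sketch of a more defensive version of that step, assuming the fetch script prints progress lines such as "URL1: ..."; `fetch_stub` and `fetch_to_report` are hypothetical names, not part of ProphET:

```bash
# Hypothetical stand-in for fetch_genomes_based_on_taxid.pl, which prints progress lines.
fetch_stub() {
  echo "URL1: http://example.invalid/esearch?term=txid$1"
}

# A bare `fetch_stub 10662` line would run the printed text as a command
# ("URL1:: command not found"). Redirect the output to the report instead.
fetch_to_report() {
  local txid=$1 report_file=$2
  # An empty, unquoted $report_file in a redirection is what bash
  # reports as an "ambiguous redirect", so check it up front.
  if [ -z "$report_file" ]; then
    echo "report_file is not set" >&2
    return 1
  fi
  fetch_stub "$txid" >> "$report_file"
}
```

Quoting the redirection target and checking it before use makes the failure visible at the point where it happens.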

Out of curiosity, I also checked whether the main script ProphET_standalone.pl could be run. For this, I had to fix two syntax errors.

The script started fine (using the test data) but failed at the BLAST step, possibly because of the missing database.

perl ProphET_standalone.pl --fasta_in Arquivo_genoma --gff_in Arquivo_gff --out_in test > test.txt
NC_005362.1

Processing scaffold/chromosome: NC_005362.1 ...
Generating file containing protein and gene sequence...
Reading genes...
Reading transcripts...
Reading exons...
Reading UTRs...
Reading CDS...
Post-processing...
BLASting protein sequences against phage proteins db...
[blastall] FATAL ERROR: blast: Unable to open input file test-NC_005362.1.prot

Parsing results...
Collapsing results...
Extracting tRNA records...
Largest sequence in the FASTA files has 1992676
mv: cannot stat 'test-NC_005362.1.cds': No such file or directory
mv: cannot stat 'test-NC_005362.1.prot': No such file or directory
mv: cannot stat 'test-NC_005362.1.trans': No such file or directory

Could you check whether you see the same issues with the current release? Could you also add some details on the installation steps (parameters to set, dependencies, etc.)? Finally, may I ask if you could later provide an English translation of the non-English (Portuguese?) sections in the code?

Many thanks in advance. Nouri

nbenzakour commented 7 years ago

An update on some more troubleshooting:

URL1: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=txid10662[Organism:exp]&usehistory=y
URL2: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term==txid10662[Organism:exp]&retmax=542&usehistory=y
URL elink: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=genome&db=nuccore&cmd=neighbor_history&id=54586,54585,54584,54583,54578,54577,54576,54575,54572,54551,54167,54166,50484,50439,50425,50424,50423,50422,50403,46671,46486,46456,46451,46450,46433,46399,46398,46397,46388,46380,46379,46374,46368,46362,46358,46351,46298,46297,46293,46282,46259,46258,46257,46250,46248,46245,46244,46240,46239,46237,46233,46224,46223,46218,46216,46214,46209,46208,46207,46206,46205,46204,46203,46195,46194,46193,46189,46188,46187,46186,46185,46184,46169,46158,42970,42965,42964,42959,42954,42953,42945,42935,42928,42916,42840,42809,42775,42774,42603,42596,42595,42594,42590,42587,42570,42569,42568,42567,42564,42562,42559,42549,42548,42547,42546,42545,42544,42542,42540,42537,42534,42520,42516,42511,42503,42501,42492,42477,42474,42472,42447,42445,42444,42443,42441,42435,42427,42421,42414,42410,42409,42407,42405,42382,42374,42369,42366,42364,42358,42345,42343,42342,42341,42340,42337,42326,42321,42320,42309,42170,42164,42157,42156,42150,42147,42131,42130,42125,41641,41108,40434,40427,40420,40418,40412,38677,38651,38649,38648,38647,38645,38644,38637,38634,38627,38622,38621,38620,38619,38618,38617,38616,38610,38593,38592,38588,38354,38350,38345,38303,37168,37142,37141,37140,37138,37137,36461,36448,36439,36437,36433,36430,36426,36230,36219,35605,35352,35351,35196,35195,35189,35188,35186,35185,35184,35183,34505,34501,34500,34498,34497,34493,34490,34488,34483,34082,33844,33840,33838,33837,33836,33470,33463,33458,33457,33450,33446,33389,33388,33387,33386,33385,33336,33326,33325,33324,33323,33322,33321,31668,31667,31634,31623,24566,24564,24549,24541,24522,24521,24513,24510,24381,24379,24374,24371,24354,24347,24313,24273,24269,24258,23330,23327,23307,23306,23304,23302,23301,23296,23239,23238,23123,23117,23116,23114,23112,23097,23087,22957,22540,22525,22429,22423,22422,21515,21500,18411,18410,18277,18270,18236,18071,17879,17877,17871,17870,17869,17865,17295,17293,17289,17288,16918,16820,
16515,16502,16075,16048,16041,16040,16038,16037,15863,15860,15858,15854,15847,15755,15749,15542,15541,15523,15522,15518,15517,15514,15513,15512,15507,15506,15504,15501,15500,15498,15497,15496,15495,15493,15469,15467,15460,15459,15457,15456,15455,15454,15453,15452,15449,15447,15443,15441,15416,15415,15414,15411,15409,15408,15277,15272,15269,15247,15246,15245,14533,14207,14055,14048,14040,14037,14036,12292,12291,12275,12274,11935,11815,11799,11428,11192,11034,10618,10528,10371,10370,10369,10356,10354,10348,10346,10344,10343,10342,10341,10340,10339,10338,10337,10332,10331,10329,10328,10327,10296,6963,6643,6551,6547,6477,6475,6474,6455,6452,6439,6404,6401,6392,6389,6337,6332,6301,6299,6298,6297,6296,6293,6281,6280,6253,6221,6126,6081,6080,6035,6019,5999,5971,5945,5935,5920,5917,5857,5846,5810,5781,5768,5767,5741,5740,5720,5695,5674,5668,5531,5526,5525,5374,5361,5343,5338,5258,5253,4799,4794,4747,4699,4685,4682,4627,4622,4621,4618,4616,4612,4583,4553,4532,4530,4521,4518,4510,4491,4463,4446,4443,4419,4375,4359,4321,4296,4266,4258,4216,4215,4214,4212,4210,4209,4203,4202,4197,4195,4194,4178,4126,4103,4102,4101,4100,4091,4090,4080,4064,4061,4060,4059,4037,4004,3950,3936,3931,3918,3904,3884,3877,3872,3850,3846,3836,3824,3814,3733&term=srcdb+refseq[prop]&usehistory=y
URL efetch: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=&WebEnv=NCID_1_271245285_130.14.22.215_9001_1502854114_522473058_0MetA0_S_MegaStore_F_1&rettype=gb&retmode=text

No error is reported, but in the last line the query_key parameter is empty, and the created file 10662.gb is empty as well.
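
An empty query_key suggests that the elink step returned no usable history entry and the script called efetch anyway. A minimal sketch of a guard for that step, assuming the elink XML carries the history as `<QueryKey>` and `<WebEnv>` elements (as E-utilities history responses do); `extract_tag` and `build_efetch_url` are hypothetical helpers, not ProphET code:

```bash
# Print the text of a <TAG>...</TAG> element read from stdin (first matching line).
extract_tag() {
  sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p" | head -n 1
}

# Build the efetch URL, refusing to continue when the history values are missing.
build_efetch_url() {
  local query_key=$1 webenv=$2
  if [ -z "$query_key" ] || [ -z "$webenv" ]; then
    echo "elink returned no QueryKey/WebEnv; not calling efetch" >&2
    return 1
  fi
  printf 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=%s&WebEnv=%s&rettype=gb&retmode=text\n' \
    "$query_key" "$webenv"
}
```

With a check like this, the download would stop with a clear message instead of silently writing an empty 10662.gb.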

gustavo11 commented 7 years ago

Dear nbenzakour,

The bioRxiv article was released a little earlier than expected. I was planning to clean up the code completely before the release, but that didn't happen.

I have just uploaded the most recent version of ProphET. Please bear with me for a few hours while I perform some tests. I will post here when I'm done.

Thank you for thoroughly checking the code and for trying ProphET.

gustavo11 commented 7 years ago

Dear nbenzakour,

Thank you very much for trying ProphET and reporting the errors and attempts so thoroughly.

There was an issue in the module responsible for downloading genomes from GenBank (NCBI). It's fixed now. I can send you more information about that if you are interested.

I have also updated README.md with the required programs and libraries and with instructions on how to install and execute ProphET.

We will soon plug in a module that assigns names to prophages based on self-arranging MCL clustering of bacteriophage genomic sequences (edges based on MUMmer coverage % between genomes). We are also trying to find some time to improve the graphical visualization of results.

Please feel free to post bugs (as you did), questions, and requests for new functionality (enhancements) via the GitHub page.

My best

gustavo11 commented 7 years ago

I have also translated comments from Portuguese to English. Please tell me if you still find any comments in Portuguese.

I would appreciate it if you could also point out any part of the code that needs further clarification. I have already found some files and segments that need additional comments, and I will add those either on user demand or when I find time to revisit them.

gustavo11 commented 7 years ago

Resolved. Please let me know if this is not the case.

nbenzakour commented 7 years ago

Hi Gustavo,

Thanks for the update.

There is an error on line 119:

cp Phage_proteins_without_ABC-t.db ../database_dir;

This doesn't copy the database to $database_dir; instead it creates a file named database_dir with the contents of Phage_proteins_without_ABC-t.db. The script then continues without reporting an error for the subsequent step (formatdb), which fails. Once this is corrected, the installation works.
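
This is standard `cp` behaviour: when the destination is not an existing directory, `cp` writes a regular file with that name. A minimal shell sketch of a copy step that fails loudly instead, assuming the target directory is passed in as an argument; `install_db` is a hypothetical helper, and since the real installer is a Perl script this only illustrates the idea:

```bash
# Copy a database file into a directory, creating the directory if needed.
install_db() {
  local src=$1 dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  # The trailing slash makes cp treat the destination as a directory.
  cp "$src" "$dest_dir/" || return 1
  # Stop here if the copy is missing or empty, before formatdb is attempted.
  if [ ! -s "$dest_dir/$(basename "$src")" ]; then
    echo "database copy is missing or empty" >&2
    return 1
  fi
}
```

Returning a non-zero status lets the caller abort before the formatting step runs on a database that is not there.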

The test run also works; however, I couldn't get ProphET to run on any other files. It doesn't work on GFF files that contain a FASTA section at the end. Once I removed that section, I got an error about transcript entries not having parents. Switching to a different set of files, also straight out of GenBank (GCF_001922365.1_ASM192236v1_genomic.gff, GCF_001922365.1_ASM192236v1_genomic.fna, with the plasmid removed), I got this:

./ProphET_standalone.pl --fasta local.fasta --gff_in local.gff --outdir test3

Processing the following scaffolds/chromosomes:
#!genome-build
#!genome-build-accession
#!gff-spec-version
##gff-version
NZ_CP018814.1
#!processor
##sequence-region
##species

Processing scaffold/chromosome: #!genome-build ...
ERROR: The file test3/#!genome-build/#!genome-build.fasta has either more than one sequence or no sequence. at ./ProphET_standalone.pl line 201.

My FASTA file has only one entry, so I'm not sure what the problem is.
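
The scaffold list above contains GFF directive names (`##gff-version`, `#!processor`, and so on), which suggests that header lines are being read as feature records. A minimal sketch of how the sequence IDs could be collected while skipping those lines and any embedded `##FASTA` section, assuming a tab-separated GFF3 file; `list_seqids` is a hypothetical helper, not ProphET's code:

```bash
# Print each distinct sequence ID (column 1) found in the feature lines of a GFF3 file.
list_seqids() {
  awk -F '\t' '
    /^##FASTA/      { exit }        # stop before any embedded sequence section
    /^#/ || NF < 9  { next }        # skip directives, comments and malformed lines
    !seen[$1]++     { print $1 }    # report each sequence ID once
  ' "$1"
}
```

Run on a GenBank-style GFF such as the one above, this would be expected to print only NZ_CP018814.1.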

If I may make a couple of suggestions: if your tool could directly take output from widely used programs like Prokka, that would be a huge bonus. As for the results, you may want to provide a description of the files generated and a more extensive description of the prophages found (positions of targets identified, etc.). As it stands, having only the [start-end] coordinates of the prophages is not informative enough, IMHO.

Thanks for translating the text to English, but there are still two files in Portuguese: obtain_prot_with_annot_seq.pl and gff2graph-from-scratch.pl.

gustavo11 commented 7 years ago

Dear nbenzakour,

Thank you very much for the great comments and for reporting the errors. The error on line 119 has been fixed. I'm going to close this ticket and open a separate ticket for every issue you pointed out and every enhancement you requested. We will start to address those today.

One question: we are willing to include more information about each identified prophage, and thanks for that comment. Would you mind clarifying the meaning of "positions of targets identified"?

Besides that, is there any other feature that you think would be useful to report along with each prophage's coordinates?