samyeaman / orthagogue

Automatically exported from code.google.com/p/orthagogue

The program does not start the analysis for "non-sorted Blast input file" #8

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. running orthAgogue on 4 Gb blast output file
2. orthAgogue -i Results_blast.txt -s _ -u -o 50

What is the expected output? What do you see instead?
I expected to get the results as .abc or .mci files, but instead I got the following error message:

!!   Hash was not constructed, probably due to rececing a non-sorted Blast input file.
-    The execution-time will therefore be severly hampered;-     We would be utmost thankful if you would forward this message to the developer,
     either through orthAgogue's issue-page (at our home-page), or directly to the developer at [oekseth@gmail.som].
This message was printed at [buildHash]:/home/klatremus/Dokumenter/Work/code/orthAgogue/src/blast_common/prot_list.h:102

What version of the product are you using? On what operating system?
version = orthAgogue-1.0.2 (debian)
OS = ubuntu 12.04

Please provide any additional information below.

I have run orthAgogue without problems on other BLAST files of similar size, so
I do not really understand why it does not work now.

Original issue reported on code.google.com by mirossi...@gmail.com on 25 May 2015 at 7:45

GoogleCodeExporter commented 9 years ago
Hi,

Thanks for your error report. The error message states that the input file was
not sorted. If we assume, for simplicity, that this observation (made by
orthAgogue) is correct, would you be able to sort your file by the names (the
first two columns), for example with the following terminal command:
---
sort --ignore-nonprinting -k 1,1 -k 2,2 -u -t$'\t' --buffer-size 1000000000 -f <input-file> -o <output-file>
---

and then re-run orthAgogue, reporting any new error-message?
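
Before re-sorting a large file, it can help to confirm that it really is out of order. The following is a minimal sketch, not orthAgogue's own check: it assumes tab-separated rows and takes "sorted" to mean non-decreasing on (column 1, column 2), with lowercase folded to uppercase as `sort -f` does. The function name `first_unsorted_line` is hypothetical, and the comparison is by code point, which matches `sort` only in the C locale.

```python
import io
from typing import Iterable, Optional, Tuple


def first_unsorted_line(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, line) of the first row that breaks the order, or None.

    Assumption: rows are tab-separated and should be non-decreasing on
    (column 1, column 2), compared after folding to uppercase.
    """
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        key = (fields[0].upper(), fields[1].upper())
        if previous is not None and key < previous:
            return number, line.rstrip("\n")
        previous = key
    return None


# Tiny in-memory example; for a real file, pass an open file handle instead.
sample = io.StringIO("a_1\ta_1\t100\na_2\ta_1\t90\na_1\ta_2\t80\n")
print(first_unsorted_line(sample))
```

Because the function reads one line at a time, it should also cope with multi-gigabyte inputs when given a file handle.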

PS: in my experience of debugging, it sometimes takes a few
assumptions/approaches before a problem is solved, i.e., if this does not help,
I'll dive more deeply into the problem.

Looking forward to your reply!

Original comment by oeks...@gmail.com on 25 May 2015 at 9:41

GoogleCodeExporter commented 9 years ago
Hi,
thanks a lot for the tip. It works. Just one last question: I do not see many
differences from the BLAST results I produced earlier. Do you know why this
time the BLAST result file was "unsorted" from the point of view of orthAgogue?
Thanks a lot again.
Best regards
Mirko

Original comment by mirossi...@gmail.com on 26 May 2015 at 6:44

GoogleCodeExporter commented 9 years ago
Hi,

Good to hear that it is now working!

Regarding your question about the BLAST results: in my experience, BLAST output
contains a set of minor errors (when compared with its format specification).
These errors are rare, and therefore hard to spot unless you write a script
that searches for them. It is these errors (in BLAST) that caused your original
issue. Having earlier spent some time investigating this, my own assumption is
that BLAST parallelises its work and relies on a simplification (or
'assumption', which is the term more acceptable to users) during
result concatenation, an assumption that does not always hold.

I hope this brief explanation clarifies things, i.e., that it did not leave
you more puzzled about programs, parallelisation and 'pretty rare errors' (such
as those in BLAST, or in any other program handling complexity on a large scale).
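
A script of the kind mentioned above could look for query names whose rows are not contiguous, which is one symptom that imperfect result concatenation might produce. This is only a sketch under assumptions: the rows are tab-separated with the query name in column 1, and the function name `split_queries` is hypothetical. It does not reproduce orthAgogue's own validation.

```python
from typing import Dict, Iterable, List


def split_queries(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each query name that appears in more than one block of
    consecutive rows to the line numbers where its blocks start."""
    starts: Dict[str, List[int]] = {}
    current = None
    for number, line in enumerate(lines, start=1):
        query = line.rstrip("\n").split("\t", 1)[0]
        if not query:
            continue  # skip blank rows
        if query != current:
            # A new block begins here; remember where it started.
            starts.setdefault(query, []).append(number)
            current = query
    return {name: begin for name, begin in starts.items() if len(begin) > 1}


rows = ["a_1\ta_1\t100\n", "a_2\ta_1\t90\n", "a_1\ta_2\t80\n"]
print(split_queries(rows))
```

An empty result would suggest that every query's rows are already grouped together, at least by this criterion.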

Best,

Ole Kristian

Original comment by oeks...@gmail.com on 26 May 2015 at 12:50

GoogleCodeExporter commented 9 years ago
Dear Ole,
I am really sorry.
I simply trusted the files from my colleague. I found the problem: it was
just the replication of a few proteins.
Please ignore my previous email.
Thanks in any case for the previous help.
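
Replicated entries like these can be found with a short script. The sketch below assumes that "replication" shows up as repeated (column 1, column 2) pairs in a tab-separated file; the function name `duplicated_pairs` is hypothetical. It keeps every distinct pair in memory, so for a file of several gigabytes it may need a machine with plenty of RAM.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicated_pairs(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count how often each (column 1, column 2) pair occurs and
    return only the pairs seen more than once."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[(fields[0], fields[1])] += 1
    return {pair: n for pair, n in counts.items() if n > 1}


rows = ["a_1\ta_2\t100\n", "a_1\ta_3\t90\n", "a_1\ta_2\t100\n"]
print(duplicated_pairs(rows))
```

This would be consistent with the `sort` command having helped: with `-u` and the two `-k` keys, `sort` keeps only one line per (column 1, column 2) pair.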

Original comment by mirossi...@gmail.com on 31 Jul 2015 at 9:48