Hi,
Thanks for an interesting issue! ;)
----
3. I re-ran the same command and this time got a 590 MB all.abc file. I find
this very strange; I haven't met such a problem before. Is there a way to know
whether it worked properly this time? To check, I counted the unique protein
names in the 1st column of the proteins file, which gives 228877, while the
number of genes in the fasta file is 228879. So it looks OK now. I obtained the
gene count in the fasta file with: grep '>' all_genes.fasta|wc -l and the
number of genes in proteins.map with: cut -f1 proteins.map|sort|uniq|wc -l .
----
-- If I understand you correctly, the same call to orthAgogue produces
different results. (If this interpretation of your words is wrong, then please
let me know.) Assuming my understanding is correct, this indicates that there
are errors in the parallelisation (in orthAgogue). To help investigate the
issue (which I've now marked as critical), could you:
(1) try different numbers of CPUs, and
(2) if the runs yield different results, then
(a) download the source,
(b) install orthAgogue using the "install_debug.bash" script,
(c) try different numbers of CPUs, and then send me the results found in
report_orthAgogue/ ?
-- If you could, then I hope to identify where the bug is found (eg, in the
parsing or in the computation of putative orthologs).
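
A quick way to confirm whether two runs really disagree is to compare their output files while ignoring line order, since a parallel run may write the same relations in a different order. The following is only a minimal sketch: it assumes each run's all.abc was saved under its own path (the paths in the usage comment are hypothetical), and it uses only standard sort and cksum rather than anything specific to orthAgogue.

```shell
# Print a checksum (and byte count) of a file's lines in sorted order, so two
# files holding the same lines in a different order give the same value.
sorted_checksum() {
    LC_ALL=C sort "$1" | cksum
}

# Succeed (exit status 0) when both files give the same sorted checksum,
# i.e. they very likely contain the same lines; fail otherwise.
same_content() {
    [ "$(sorted_checksum "$1")" = "$(sorted_checksum "$2")" ]
}

# Example usage (hypothetical paths, one saved output per CPU setting):
# if same_content run_1cpu/all.abc run_4cpu/all.abc; then
#     echo "runs agree"
# else
#     echo "runs differ"
# fi
```

A matching checksum is a strong hint rather than a proof. When the check fails, running diff on the two sorted files shows which lines actually differ.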
-------------------------
I expected a somewhat large file after the 1st run, but then in the 2nd run I
got a larger file. I don't know if orthAgogue worked correctly now.
-------------------------
-- The results seem strange, though finding bugs in parallel applications is
always difficult, ie, I'd be thankful if you could help me solve this issue.
-----------
Please provide any additional information below.
How can I check if orthAgogue worked properly? Can I use the number of genes in
the proteins.map file and check that it equals the number of genes that were
used in the blastp alignment? It looks OK for this project, but in another
project where I ran orthAgogue earlier, the number of genes in the fasta file
was 460565 while the number of protein ids in the proteins.map file was 456393.
There I see about 4000 genes missing from proteins.map. For that project more
genomes were used, so the resulting blastp file was around 21 GB. Thank you for
all.
-----------
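
When the two counts disagree, as with the roughly 4000 missing identifiers above, it can help to list the missing identifiers instead of only counting them. The sketch below rests on assumptions not confirmed in this thread: that the identifier is the FASTA header text between '>' and the first whitespace, and that it appears unchanged in column 1 of the tab-separated proteins.map. The function names are made up for illustration.

```shell
# Unique sequence IDs from a FASTA file.
# Assumption: the ID is the header text between '>' and the first whitespace.
fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u
}

# Unique values of column 1 of a tab-separated map file.
map_ids() {
    cut -f1 "$1" | LC_ALL=C sort -u
}

# Print IDs that occur in the FASTA file ($1) but not in the map file ($2).
missing_from_map() {
    fasta_tmp=$(mktemp)
    map_tmp=$(mktemp)
    fasta_ids "$1" > "$fasta_tmp"
    map_ids "$2" > "$map_tmp"
    # comm -23 keeps only the lines unique to the first sorted input.
    LC_ALL=C comm -23 "$fasta_tmp" "$map_tmp"
    rm -f "$fasta_tmp" "$map_tmp"
}

# Example usage (file names as used in this thread):
# missing_from_map all_genes.fasta proteins.map > missing_ids.txt
```

If the headers carry extra decoration that the map file does not (for example a taxon prefix), the extraction in fasta_ids would need adjusting before the comparison means anything.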
-- Thanks for your question: regarding the causes and effects of the bug, we
will only know them once we've investigated the issue. My first assumption is
that the bug is found in the parsing, for which the result will look correct,
ie, inspecting the result file may be of no help if the bug is found in a
different part of the orthAgogue pipeline.
-- In order to get an idea of where the bug is found (and its effect): if you
run the software after having installed it using "./install_debug.sh", then the
file "report_orthAgogue/list_file_parse.log.*" will give you the number of
relations before filtering with respect to orthologs, while
"report_orthAgogue/taxa_list.log" gives a generalized summary of the result. My
hope is that a comparison of the one-CPU case against the multi-CPU cases
will help us to identify the location of the bug.
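
The comparison of the one-CPU report against the multi-CPU reports could be sketched as below. This assumes the report_orthAgogue/ directory was copied aside after each run (the directory names in the usage comment are hypothetical) and that matching log files carry the same name in both copies. Neither point is confirmed in this thread, and the log format is not inspected at all.

```shell
# Compare two saved copies of the debug report directory file by file.
# Prints one line per file that differs or is missing from the second copy,
# and returns 0 only if every file in the first copy has an identical twin.
# Files present only in the second directory are not reported.
diff_reports() {
    status=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ ! -f "$2/$name" ]; then
            echo "only in $1: $name"
            status=1
        elif ! cmp -s "$f" "$2/$name"; then
            echo "differs: $name"
            status=1
        fi
    done
    return "$status"
}

# Example usage (hypothetical directory names):
# diff_reports report_1cpu report_4cpu
```

For any file flagged as differing, diff on the two copies shows the exact lines, which is probably the part worth sending along with the report.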
Again many thanks for making me aware of a possible bug in the parallelisation.
Hope this answer at least clarified some points in your issue: looking forward
to your feedback, and again many thanks for your help in posting it!
PS: There might be other reasons for this error, so if you could regard my
assertions as an initial hypothesis, I'd be thankful ;)
Best,
Ole Kristian,
Developer of orthAgogue
Original comment by oeks...@gmail.com
on 18 Oct 2014 at 4:34
Dear Ole,
Thank you for all and especially for your very quick answer. Apologies for the
delay: I had a vacation and other tasks and, above all, I couldn't reproduce
this issue.
Yes, that was the correct interpretation of what I said. I think it was a
problem on my system: several other heavy-duty tasks were running at the same
time, so the run could have been killed by the system, or there were not
enough resources (memory, HDD, etc.). Anyway, I can't reproduce it, so I assume
this issue is closed.
I wanted to inform you that I am going to use the following criterion to decide
whether orthAgogue worked correctly or not. The criterion is: "the number of
proteins in the proteins.map file must be equal to or slightly (only a couple
of genes) less than the total number of proteins that are present in all
genomes that were used in orthology prediction".
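
The criterion above can be turned into a small check. This is a sketch under assumptions: the default tolerance of 2 is only one reading of "a couple of genes", the function names are invented for illustration, and the counts are taken the same way as earlier in this thread (FASTA header lines versus unique values in column 1 of proteins.map).

```shell
# Number of records in a FASTA file (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Number of unique values in column 1 of a tab-separated map file.
count_map_ids() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Print both counts and succeed only if the map file has at most as many
# IDs as the FASTA file and falls short by no more than the tolerance.
# $1 = FASTA file, $2 = map file, $3 = tolerance (assumed default: 2).
check_coverage() {
    fasta_n=$(count_fasta_records "$1")
    map_n=$(count_map_ids "$2")
    tolerance=${3:-2}
    missing=$((fasta_n - map_n))
    echo "fasta=$fasta_n map=$map_n missing=$missing"
    [ "$missing" -ge 0 ] && [ "$missing" -le "$tolerance" ]
}

# Example usage (file names as used in this thread):
# check_coverage all_genes.fasta proteins.map || echo "counts look suspicious"
```

As noted earlier in the thread, matching counts would not rule out a bug in another stage of the pipeline, so this works as a sanity check on the input side rather than as proof of a correct run.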
Thank you for this fast program.
Best regards,
Juma
Original comment by jbay...@gmail.com
on 11 Nov 2014 at 12:01
Original issue reported on code.google.com by
jbay...@gmail.com
on 18 Oct 2014 at 1:18