Open GoogleCodeExporter opened 9 years ago
Hi,
Thanks for both an interesting issue report and the kind words about our tool!
Regarding your question, I have requested some comments from one of my
co-authors, Prof. M. Kuiper, on which the answer below is based. Based on his
comments, our first impression (which may be wrong, as it is only a first
impression ;) ) is that the subtleties you have observed could stem from the
tools used either to construct the input or to analyse the result of
orthAgogue.
From your issue report, we infer that for one particular gene you have
identified a co-clustering which is not as it should be. If we understand your
report correctly, the gene does not join a larger set of similar genes from
eukaryotes (to which it has a higher homology), but rather joins a prokaryote
cluster.
Without knowing the actual levels of homology of these genes, it is difficult
to say what the problem is, i.e., whether there is one at all. If the
homologies themselves point to these cluster memberships, then one strategy
would be to investigate the results produced by BLAST. If the homologies
instead point to different cluster memberships, then it would be odd if
orthAgogue overruled that (and unclear how it could). In that latter case,
might the problem lie in MCL?
Given these thoughts, it would be interesting to get your feedback, both to
resolve the issue and (hopefully) to acquire some new knowledge (of
approaches in the field).
Best,
Ole Kristian Ekseth,
developer of orthAgogue
Original comment by oeks...@gmail.com
on 17 May 2015 at 12:23
Thank you for the quick response. Here I elaborate on the parameters used in my
analysis:
#BLASTp STEP
blastp -outfmt 6 -evalue 1e-5 -word_size 4 -threshold 18 -seg 'yes'
-max_target_seqs 100000 -dbsize 2543962
#HERE DBSIZE REFERS TO THE NUMBER OF PROTEINS IN MY FASTA FILE
#OrthAgogue STEP ON THE BLAST OUTPUT
orthAgogue --seperator '|' --cpu 16 --overlap 50 --use_scores
#MCL STEP ON THE ORTHAGOGUE OUTPUT
mcl --abc -I 1.5
The programs were run on a Linux machine (Ubuntu 12.04 LTS) with 4 Intel Xeon
1.2 GHz QuadCore processors and 256 GB RAM. OrthAgogue took approximately 22
minutes to finish computation, using a maximum of 130 GB of RAM and all
processing cores. Since the BLASTp output generated is huge (127 GB), I am not
able to share it with you. However, I would like to illustrate my problem with
the attached files.
I have a diatom protein (ID: 565099) and a bacterial protein (ID: 1055246)
which are reported to be in the same cluster (after post-processing the
orthAgogue output with MCL). The proteins remain in the same cluster even
with different MCL inflation parameters (1.2-1.5). Furthermore, when I look
into the raw all.abc orthAgogue output, I find that the bacterial protein is
assigned a weight against the diatom protein even though it has much
higher-scoring hits against proteins from other organisms, such as metazoans
(ID: 21263).
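The comparison described above can be sketched as a small script. This is only
an illustration under stated assumptions: the BLAST file uses the default 12
columns of -outfmt 6 (bit score in the last column), all.abc holds one
"label label weight" edge per line, and the taxon prefixes in the labels
("bact", "diatom", "meta") are hypothetical, since the real labels are not
shown in this thread. The function names are made up for this sketch.

```python
from collections import defaultdict


def best_blast_hits(blast_lines, query):
    """Return {subject: best bit score} for one query (BLAST -outfmt 6)."""
    best = defaultdict(float)
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        # Default outfmt 6 has 12 columns; bit score is the last one.
        if len(cols) < 12 or cols[0] != query:
            continue
        subject, bitscore = cols[1], float(cols[11])
        best[subject] = max(best[subject], bitscore)
    return dict(best)


def abc_partners(abc_lines, protein):
    """Return {partner: weight} for every abc edge touching `protein`."""
    partners = {}
    for line in abc_lines:
        cols = line.split()
        if len(cols) != 3:
            continue
        a, b, weight = cols[0], cols[1], float(cols[2])
        if a == protein:
            partners[b] = weight
        elif b == protein:
            partners[a] = weight
    return partners


def missing_stronger_hits(blast_lines, abc_lines, protein):
    """List BLAST subjects that outscore the weakest retained partner
    of `protein` but have no edge to it in the abc file."""
    hits = best_blast_hits(blast_lines, protein)
    partners = abc_partners(abc_lines, protein)
    if not partners:
        return []
    weakest_kept = min(hits.get(p, 0.0) for p in partners)
    missing = [s for s, score in hits.items()
               if s not in partners and s != protein and score > weakest_kept]
    return sorted(missing, key=lambda s: -hits[s])
```

A stronger hit that is missing from all.abc is not necessarily an error: as far
as I understand, the ortholog step keeps reciprocal best hits between taxa, so
a strong one-directional hit can be dropped legitimately. The list is only a
starting point for checking the reverse hits by hand.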
Please find attached the BLASTp output for the diatom and the bacterial
proteins, as well as their subset of results from the all.abc output file. I
hope it can help you. Ideally, if you can provide me with a way to send my
full BLAST results, that would be perfect.
Finally, I must say that such problems are present only for a small
fraction of my proteins; in fact, the majority of the protein clusters agree
well with known taxonomy. My suspicion at the moment is that the large
number of proteins from bacterial phyla is creating a problem, since, to reduce
complexity, I had clustered all bacterial proteomes within each phylum into a
single dataset with CD-HIT. Hence all proteins of a bacterial phylum, say
Firmicutes, are perceived as a single organism by orthAgogue.
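One way to see how unevenly the datasets are sized is to count the distinct
proteins per taxon label in all.abc. The sketch below assumes that labels have
the form taxon|protein (matching the '|' separator passed to orthAgogue) and
that each line is a "label label weight" edge; the function name is made up
for illustration.

```python
from collections import defaultdict


def proteins_per_taxon(abc_lines, separator="|"):
    """Count distinct protein IDs per taxon prefix in an abc edge list."""
    seen = defaultdict(set)
    for line in abc_lines:
        cols = line.split()
        if len(cols) != 3:
            continue
        for label in cols[:2]:
            taxon, sep, protein = label.partition(separator)
            if sep:  # skip labels that do not contain the separator
                seen[taxon].add(protein)
    return {taxon: len(ids) for taxon, ids in seen.items()}
```

If a phylum-level dataset turns out to hold far more proteins than any single
eukaryote, that would at least be consistent with the suspicion above, though
it would not prove it.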
Original comment by projectb...@gmail.com
on 19 May 2015 at 10:20
Attachments:
Original issue reported on code.google.com by
projectb...@gmail.com
on 15 May 2015 at 2:45