Closed: paoloczi closed this issue 5 years ago.
I am attaching a test program that shows the common prefixes/suffixes not being preserved, and also shows two dramatically different alignments depending on ordering. The program runs two separate global alignments in which the same 5 distinct input strings are used. Each of the 5 strings is used the same number of times in the two alignments, but in different orders. The 5 strings all have a common 10-base prefix and a common 10-base suffix. The prefix/suffix are not preserved in the alignment, and the alignments computed with the two orderings are very different. See the comments in the C++ code for more details.

I use the library, not the executable. Is the `-d <path>` option equivalent to `Graph::print_dot`?
I see now that `Graph::add_alignment` has the weight as an optional argument. In my usage case, many of the input strings occur multiple times, so I could call `add_alignment` once for each distinct string, specifying as the weight the number of times each string occurs. Do you expect that this will improve performance? Or is the compute cost dominated by the final pass over the graph in `generate_multiple_sequence_alignment`?
Based on what I know now, my inclination would be to call `add_alignment` for each of my distinct sequences, in order of decreasing frequency (without regard to length), specifying the weight as the frequency. By presenting the most frequent sequences first (as in the first ordering used in the test program), my hope is that those sequences get a better chance to determine the final alignment.
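The deduplicate-then-weight step described above could be sketched as follows; `DistinctByFrequency` is a hypothetical helper name for illustration, not part of the spoa API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Collapse raw reads into (sequence, frequency) pairs, ordered by
// decreasing frequency, so the most common sequences can be added
// to the partial order graph first.
std::vector<std::pair<std::string, std::uint32_t>>
DistinctByFrequency(const std::vector<std::string>& reads) {
  std::map<std::string, std::uint32_t> counts;
  for (const auto& r : reads) {
    ++counts[r];
  }
  std::vector<std::pair<std::string, std::uint32_t>> distinct(counts.begin(),
                                                              counts.end());
  std::stable_sort(distinct.begin(), distinct.end(),
                   [](const auto& a, const auto& b) {
                     return a.second > b.second;
                   });
  return distinct;
}
```

Each resulting pair would then feed one `add_alignment` call, with the frequency passed as the weight.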
Yes, the `-d` option invokes `Graph::print_dot`.
If you know beforehand which sequences are equal (as in this example), you should just give them weights equal to their frequencies, which will decrease the total execution time significantly (the time needed for consensus and MSA generation is negligible).
I noticed that you are using local alignment (`kSW`) in your code instead of global alignment (`kNW`), which should partially resolve your inquiry (see below). I used the latest version (v2.0.3), which produces slightly different alignments than the version you are using (in the backtrack, matches/mismatches are checked first).
Computed alignment for ordering 0:
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-------ACACACAGACAC
-------------------CACA-T-----ACACACACAGACAC
-------------------CACA-T-----ACACACACAGACAC
-------------------CACA-T-----ACACACACAGACAC
-------------------CACA-T---------ACACAGACAC
-------------------CACA-T---------ACACAGACAC
-------------------CACA-T---------ACACAGACAC
CACATACACA-GACATACACACACTGACACACACACACAGACAC
CACATACACACG-C--ACACAC------------ACACAGACAC
Computed alignment for ordering 1:
CACATACACAGACATACACACACTGACACAC--ACACACAGACAC
------------------CACA-T---------ACACACAGACAC
------------------CACA-T---------ACACACAGACAC
CACATACAC--AC--GCACAC------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT--------------------------ACACACACAGACAC
CACAT------------------------------ACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT--------------------------ACACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT------------------------------ACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT--------------------------ACACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT----------------------------ACACACAGACAC
CACAT------------------------------ACACAGACAC
Thank you so much for pointing out my error. So I will plan on doing the following:

- use `kNW` instead of `kSW`;
- call `add_alignment` once for each distinct input sequence, specifying the appropriate weight for each.

This was a valuable discussion for me - thank you for your help and attention!
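The plan above can be sketched against the spoa v2.x-era interface. This fragment requires the spoa library and is not compilable on its own; the scoring parameters are assumed values, and the name of the align call has varied between spoa releases (only `kNW`, `add_alignment`, and `generate_multiple_sequence_alignment` are confirmed in this thread):

```cpp
// Sketch only: requires the spoa library (v2.x-era interface).
auto engine = spoa::createAlignmentEngine(spoa::AlignmentType::kNW,
                                          5, -4, -8);  // match, mismatch, gap (assumed scores)
auto graph = spoa::createGraph();

// distinct_sequences: (sequence, frequency) pairs, most frequent first.
for (const auto& [sequence, frequency] : distinct_sequences) {
  auto alignment = engine->align(sequence, graph);
  graph->add_alignment(alignment, sequence, frequency);  // frequency as weight
}

std::vector<std::string> msa;
graph->generate_multiple_sequence_alignment(msa);
```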
No problem! :)
Hi rvaser,
I just wanted to clarify how the weights are used: is it true that the alignment itself will not be affected by the weights? From what I understand, weights are used for consensus and quality.
Hello Ryan, yes, the weights are only used to generate a consensus from the graph.
Best regards, Robert
Ok thanks for the clarification
This is not a description of an issue found with the repository - just some questions about the multiple alignment functionality.