smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2
Other
162 stars 36 forks source link

Difference between GapAffine(1,1,1) and EditDistance #51

Closed RolandFaure closed 1 year ago

RolandFaure commented 1 year ago

Hi Santiago,

Still working with WFA-lib, I noticed a behaviour that I do not understand on wfa::WFAlignerGapAffine. It seems to me that it did not find the optimal alignment in the example below (so I probably misunderstand the parametrization). I thought that wfa::WFAlignerGapAffine alignerAffine(1,1,1, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh); was equivalent to wfa::WFAlignerEdit alignerEdit(wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);, but it does not seem to be the case :

wfa::WFAlignerGapAffine alignerAffine(1,1,1, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);
wfa::WFAlignerEdit alignerEdit(wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);

string seq1 = "AAAAAAAAAATAGCGAAAATGCCCAGCTATTTTTTTTTTT";
string seq2 =   "AAAAAAAAAACAGCGAAATGCCCCAGCGATTTTTTTTTTT";

alignerAffine.alignEnd2End(seq1, seq2);
alignerEdit.alignEnd2End(seq1, seq2);

string cigarEdit = alignerEdit.getAlignmentCigar();
string cigarAffine = alignerAffine.getAlignmentCigar();

cout << "Alignment with edit distance: " << cigarEdit << " " << alignerEdit.getAlignmentScore() << endl;
print_alignment(seq1, seq2, cigarEdit, 0, seq1.size());
cout << "alignment with gap-affine (1,1,1): " << cigarAffine << " " << alignerAffine.getAlignmentScore() << endl;
print_alignment(seq1, seq2, cigarAffine, 0, seq1.size());

gives :

Alignment with edit distance: MMMMMMMMMMXMMMMMMMDMMMMMIMMMXMMMMMMMMMMMM 4
AAAAAAAAAATAGCGAAA-ATGCCCAGCTATTTTTTTTTTT
AAAAAAAAAACAGCGAAATGCCCC-AGCGATTTTTTTTTTT
alignment with gap-affine (1,1,1): MMMMMMMMMMXMMMMMMMXXXMMMMMMXMMMMMMMMMMMM -5
AAAAAAAAAATAGCGAAAATGCCCAGCTATTTTTTTTTTT
AAAAAAAAAACAGCGAAATGCCCCAGCGATTTTTTTTTTT

Apparently the gapAffine aligner favored 3 mismatches over one insertion and one deletion, which I do not understand. Do you know why it behaves like this ?

Thank you, Roland

smarco commented 1 year ago

Hi,

Using gap-affine (1,1,1) you have a gap-opening cost. Therefore, one insertion and one deletion score (1+1)+(1+1) = 4 whereas three mismatches have score 1+1+1=3. Under edit distance (i.e., (1,0,1)), both alignments score 3.

Let me know if that makes sense to you.

RolandFaure commented 1 year ago

Yes, thank you very much.