rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0
186 stars, 21 forks

2 nts are "cut" from the rest. #79

Closed: 0xorial closed this issue 1 month ago

0xorial commented 1 month ago

I am aligning a very simple example with an exact match (see below). For some reason, 2 nucleotides are separated from the rest and placed at the very end. Input and output are attached (image, Archive.zip).

rcedgar commented 1 month ago

I guess your question is "why, and is this a bug?"

Why: the choice of terminal gap penalties (or here, the HMM transition probability into the Exit state, which is essentially the same thing). In this case, muscle prefers the mismatch and internal gap over opening a terminal gap. For context, muscle is a global aligner, not a local aligner, and here you have a fragment which aligns locally.

Is this a bug: no. Of course this example looks "wrong", but for any choice of gap penalty scheme you can always find "obviously wrong" cases. Such is life when algorithms use highly simplified and unrealistic models of evolution.
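To make the trade-off concrete, here is a toy sketch of how a terminal gap penalty can tip a global aligner toward splitting a fragment. It is not muscle's scoring model (which, per the comment above, is an HMM): the sequences, the two candidate alignments, the function names, and every penalty value are invented for illustration. It scores two fixed alignments of a fragment against a longer reference, charging each gap run an open cost plus a per-column extension cost, with a separate open cost for runs that touch either end.

```python
def gap_runs(row):
    """Return (length, is_terminal) for each run of '-' in one aligned row."""
    runs = []
    i, n = 0, len(row)
    while i < n:
        if row[i] != "-":
            i += 1
            continue
        j = i
        while j < n and row[j] == "-":
            j += 1
        # A run is terminal if it touches the start or the end of the row.
        runs.append((j - i, i == 0 or j == n))
        i = j
    return runs


def score(row_a, row_b, terminal_open, internal_open=-6, extend=-1,
          match=2, mismatch=-2):
    """Score a fixed pairwise alignment under a toy scheme (all values assumed)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    # Substitution score for columns where both rows have a letter.
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            total += match if a == b else mismatch
    # Gap cost: open penalty depends on whether the run is terminal.
    for row in (row_a, row_b):
        for length, is_terminal in gap_runs(row):
            open_cost = terminal_open if is_terminal else internal_open
            total += open_cost + extend * length
    return total


# Hypothetical data: the fragment CGTCGA matches the reference exactly.
REF = "AAACGTCGATGG"
CONTIGUOUS = "---CGTCGA---"  # fragment kept whole, terminal gaps on both sides
SPLIT = "---CGTC---GA"       # last 2 nt moved to the end: internal gap + 1 mismatch

for terminal_open in (-2, -12):
    whole = score(REF, CONTIGUOUS, terminal_open)
    split = score(REF, SPLIT, terminal_open)
    winner = "contiguous" if whole > split else "split"
    print(f"terminal_open={terminal_open}: contiguous={whole}, split={split} -> {winner}")
```

With a cheap terminal gap the contiguous placement wins. Once opening a terminal gap costs enough more than an internal one, the split alignment scores higher even though it gives up a match and accepts a mismatch, which is the kind of outcome described above.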

0xorial commented 1 month ago

hehe, that was indeed my question. thanks for clarifying!