Open chchch opened 3 years ago
Hello - just had a quick look on https://web.stanford.edu/class/cs262/presentations/lecture2.pdf about affine gaps. It suggests that one needs to account the best score for a given (i,j)
pair separately for the cases if the gap is already open or not.
I can imagine that would mean adding a third, binary dimension to the coordinates, like (i, j, Open|NotOpen)
, and then when moving horizontally/verticall, you move from (i, j, NotOpen)
to either (i-1, j, Open)
. When moving diagonally, you move from (i, j, _)
to (i-1, j-1, NotOpen)
.
Just a hunch. Not sure about the reference, please excuse me. Also, the code indeed could have more type signatures / guidance.
Hi,
It seems that you actually need to save three scores at each step, rather than just the max score. Here's the algorithm from Gusfield 1997 Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, 244:
This is my (pretty ugly) attempt at implementing it... I'm not so good with monads and do-notation:
Trace a b `tappend'` Trace y (z:_) = Trace (a+y) (z:b)
data AlignConfig' a s = AlignConfig'
{ ac'PairScore :: a -> a -> s
, ac'_initial_gap_penalty :: s
, ac'_gap_penalty :: s
, ac'_gap_opening_penalty :: s
}
alignConfig' :: (a -> a -> s) -- ^ Scoring function.
-> s -- ^ Initial gap score.
-> s -- ^ Gap score.
-> s -- ^ Gap opening score.
-> AlignConfig' a s
alignConfig' = AlignConfig'
affineAlign :: (G.Vector v a, Num s, Ord s)
=> AlignConfig' a s
-> v a -- ^ Left sequence.
-> v a -- ^ Right sequence.
-> Trace a s
affineAlign AlignConfig'{..} as bs =
let p = (lastIndex as, lastIndex bs)
in revTrace . fst $ evalState (go p) M.empty
where
revTrace (Trace s t) = Trace s (reverse t)
lastIndex v = G.length v - 1
--
go p = do
res <- gets $ M.lookup p
case res of
Just r -> return r
Nothing -> do
newRes <- pgo p
modify (M.insert p newRes)
return newRes
--
pgo (i,j)
| i == (-1) || j == (-1) = return $
if i == j then (Trace 0 [],(Trace 0 [],Trace 0 []))
else if i == (-1)
then skipInit j stepRight bs
else skipInit i stepLeft as
| otherwise = do
let a = as G.! i
b = bs G.! j
diag <- go (i-1,j-1)
let diag_max = (fst diag) `tappend'` Trace (ac'PairScore a b) [stepBoth a b]
a_gaps <- go (i-1, j)
let a_gap1 = (fst a_gaps) `tappend'` Trace (ac'_gap_opening_penalty + ac'_gap_penalty) [stepLeft a]
let a_gap2 = (fst . snd) a_gaps `tappend'` Trace ac'_gap_penalty [stepLeft a]
let a_gap_max = L.maximumBy (comparing traceScore) [a_gap1, a_gap2]
b_gaps <- go ( i,j-1)
let b_gap1 = (fst b_gaps) `tappend'` Trace (ac'_gap_opening_penalty + ac'_gap_penalty) [stepRight b]
let b_gap2 = (snd . snd) b_gaps `tappend'` Trace ac'_gap_penalty [stepRight b]
let b_gap_max = L.maximumBy (comparing traceScore) [b_gap1, b_gap2]
let max = L.maximumBy (comparing traceScore) [diag_max, a_gap_max, b_gap_max]
return (max,(a_gap_max,b_gap_max))
--
skipInit idx stepFun xs =
let score = ac'_initial_gap_penalty * fromIntegral (idx+1)
tr = reverse [stepFun (xs G.! xi) | xi <- [0..idx]]
in (Trace score tr,(Trace score tr,Trace score tr))
Hi,
I've been trying to add support for an affine gap penalty, but I'm finding it difficult to reason through the code. Essentially, I need to remember
a_gap
andb_gap
for each subsequent step so that I can apply a "gap opening penalty".PS I also noticed that you cite
Chin, Ho, Lam, Wong and Chan (2003)
, but that paper introduces a constrained sequence which doesn't seem to be used in this algorithm — is that right?Thanks!