robinp / align

Sequence alignment in Haskell, not only for strings (local, global, multi-sequence).
Other
5 stars 3 forks source link

affine gap penalty #3

Open chchch opened 3 years ago

chchch commented 3 years ago

Hi,

I've been trying to add support for an affine gap penalty, but I'm finding it difficult to reason through the code. Essentially, I need to remember a_gap and b_gap for each subsequent step so that I can apply a "gap opening penalty".

PS I also noticed that you cite Chin, Ho, Lam, Wong and Chan (2003), but that paper introduces a constrained sequence which doesn't seem to be used in this algorithm — is that right?

Thanks!

robinp commented 3 years ago

Hello - just had a quick look on https://web.stanford.edu/class/cs262/presentations/lecture2.pdf about affine gaps. It suggests that one needs to account the best score for a given (i,j) pair separately for the cases if the gap is already open or not.

I can imagine that would mean adding a third, binary dimension to the coordinates, like (i, j, Open|NotOpen), and then when moving horizontally/verticall, you move from (i, j, NotOpen) to either (i-1, j, Open). When moving diagonally, you move from (i, j, _) to (i-1, j-1, NotOpen).

Just a hunch. Not sure about the reference, please excuse me. Also, the code indeed could have more type signatures / guidance.

chchch commented 3 years ago

Hi,

It seems that you actually need to save three scores at each step, rather than just the max score. Here's the algorithm from Gusfield 1997 Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, 244: image

This is my (pretty ugly) attempt at implementing it... I'm not so good with monads and do-notation:

Trace a b `tappend'` Trace  y (z:_) = Trace (a+y) (z:b)

data AlignConfig' a s = AlignConfig'
  { ac'PairScore :: a -> a -> s
  , ac'_initial_gap_penalty :: s
  , ac'_gap_penalty :: s
  , ac'_gap_opening_penalty :: s
  }

alignConfig' :: (a -> a -> s)  -- ^ Scoring function.
            -> s               -- ^ Initial gap score.
            -> s               -- ^ Gap score.
            -> s               -- ^ Gap opening score.
            -> AlignConfig' a s
alignConfig' = AlignConfig'

affineAlign :: (G.Vector v a, Num s, Ord s)
  => AlignConfig' a s
  -> v a  -- ^ Left sequence.
  -> v a  -- ^ Right sequence.
  -> Trace a s
affineAlign AlignConfig'{..} as bs =
  let p = (lastIndex as, lastIndex bs)
  in revTrace . fst $ evalState (go p) M.empty
  where
  revTrace (Trace s t) = Trace s (reverse t)
  lastIndex v = G.length v - 1
  --
  go p = do
    res <- gets $ M.lookup p
    case res of
        Just r -> return r
        Nothing -> do
            newRes <- pgo p
            modify (M.insert p newRes)
            return newRes
  --
  pgo (i,j)
    | i == (-1) || j == (-1) = return $
      if i == j then (Trace 0 [],(Trace 0 [],Trace 0 []))
      else if i == (-1)
           then skipInit j stepRight bs
           else skipInit i stepLeft as
    | otherwise = do
      let a = as G.! i
          b = bs G.! j
      diag  <- go (i-1,j-1)
      let diag_max = (fst diag) `tappend'` Trace (ac'PairScore a b) [stepBoth a b]

      a_gaps <- go (i-1,  j)
      let a_gap1 = (fst a_gaps) `tappend'` Trace (ac'_gap_opening_penalty + ac'_gap_penalty) [stepLeft a]
      let a_gap2 = (fst . snd) a_gaps `tappend'` Trace ac'_gap_penalty [stepLeft a]
      let a_gap_max = L.maximumBy (comparing traceScore) [a_gap1, a_gap2]

      b_gaps <- go (  i,j-1)
      let b_gap1 = (fst b_gaps) `tappend'` Trace (ac'_gap_opening_penalty + ac'_gap_penalty) [stepRight b]
      let b_gap2 = (snd . snd) b_gaps `tappend'` Trace ac'_gap_penalty [stepRight b]
      let b_gap_max = L.maximumBy (comparing traceScore) [b_gap1, b_gap2]

      let max = L.maximumBy (comparing traceScore) [diag_max, a_gap_max, b_gap_max]
      return (max,(a_gap_max,b_gap_max))
  --
  skipInit idx stepFun xs =
    let score = ac'_initial_gap_penalty * fromIntegral (idx+1)
        tr = reverse [stepFun (xs G.! xi) | xi <- [0..idx]]
    in (Trace score tr,(Trace score tr,Trace score tr))