waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap3
MIT License

scoring parameter updates for broader diversity and alignment mode optimizations #247

Closed ekg closed 3 months ago

ekg commented 4 months ago

The gist is that these are the simplest scoring parameters that actually make use of affine gap scoring in each place we align: the WFA that runs inside of wflign, wflign itself, and the biWFA-based patching. The parameters used for the patching are borrowed from minimap2's defaults and also appear to work well.

These are:

    --wfa-params=[mismatch,gap1,ext1] score parameters for the wfa alignment (affine); match
                                      score is fixed at 0 [default: 2,3,1]
    --wfa-patching-params=[mismatch,gap1,ext1,gap2,ext2]
                                      score parameters for the wfa patching alignment (convex);
                                      match score is fixed at 0 [default: 3,4,2,24,1]
    --wflign-params=[mismatch,gap1,ext1]
                                      score parameters for the wflign alignment (affine); match
                                      score is fixed at 0 [default: 2,3,1]
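
To make the difference between the affine and convex defaults concrete, here is a minimal sketch of the gap costs they imply. It assumes the usual convention that a gap of length `k` costs `gap + k * ext`, and that "convex" means the two-piece model (the cheaper of two affine pieces), as in minimap2. The function names are hypothetical and this is not wfmash's internal code.

```python
# Hypothetical helpers for illustration; not part of wfmash.
# Assumption: a gap of length k costs gap_open + k * gap_ext.

def affine_gap_cost(length, gap_open, gap_ext):
    """Cost of one gap of `length` bases under a gap-affine model."""
    if length <= 0:
        return 0
    return gap_open + gap_ext * length


def convex_gap_cost(length, gap1, ext1, gap2, ext2):
    """Cost of one gap under a two-piece model: the cheaper of two affine pieces."""
    if length <= 0:
        return 0
    return min(gap1 + ext1 * length, gap2 + ext2 * length)


def parse_params(text):
    """Parse a comma-separated parameter string such as '2,3,1' into a tuple of ints."""
    return tuple(int(field) for field in text.split(","))


# Defaults quoted in the help text above.
_, wfa_gap, wfa_ext = parse_params("2,3,1")
_, p_gap1, p_ext1, p_gap2, p_ext2 = parse_params("3,4,2,24,1")

if __name__ == "__main__":
    for length in (1, 10, 20, 50):
        affine = affine_gap_cost(length, wfa_gap, wfa_ext)
        convex = convex_gap_cost(length, p_gap1, p_ext1, p_gap2, p_ext2)
        print(f"gap length {length:>2}: affine={affine:>3} convex={convex:>3}")
```

Under these assumptions the two pieces of the patching penalty cross at a gap length of 20: shorter gaps are priced by the `4,2` piece, and longer gaps by the `24,1` piece, which keeps long indels from becoming prohibitively expensive.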

ekg commented 4 months ago

The problem being resolved here is that we were unable to align sequences at 20-30% divergence. The alignments we were getting were often extremely fragmentary, including only the very high identity matches and missing much of the more rapidly evolving intronic sequence. With this adjustment we at least align everywhere. However, more testing is probably required to see whether the resulting alignments tend to be correct, or whether we're forcing alignments through regions that don't necessarily support them.