smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2
Other
162 stars 36 forks source link

how to use extension #18

Open brentp opened 2 years ago

brentp commented 2 years ago

Hi, I'm using the code below to evaluate using wfa-lib to extend given a seed. Based on the docs, I though it would have to extend left and then right in 2 different calls, but it seems that's not the case.

In brief, given:


   //              0123456789012345678901234567890
   char *p =      "AAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAA";
   //         0123456788901234
   char *t = "AAAAAAAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";

             int pattern_begin_free = 7;
             int pattern_end_free = 0;
             int text_begin_free = 11;
             int text_end_free =0;

I get the full alignment:


      PATTERN    -----AAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAA---------------
                      |||||||||||||||||||||||||||||||||               
      TEXT       AAAAAAAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

So, for seed and extend, is it only needed to do something like this and it will indeed extend right as well?

Also, is there an efficient way for this to get the start, end of the alignment in the pattern and text? I don't need the full cigar or backtace, only the extents of the alignment.

thanks in advance.

here is the code I use for testing.

#include "wavefront/wavefront_align.h"

int wf_align(char *pattern, char *text, distance_metric_t distance_metric,
             int match, /* 0 or less */
             int mismatch, /* 1 or more */ 
             int gap_open, /* 1 or more */
             int gap_extend, /* 1 or more */
             int pattern_begin_free,
             int pattern_end_free,
             int text_begin_free,
             int text_end_free
   ) {

    wavefront_aligner_attr_t attr = wavefront_aligner_attr_default;
    attr.distance_metric = distance_metric;
    attr.distance_metric = gap_affine;
    attr.affine_penalties.match = match;
    attr.affine_penalties.mismatch = mismatch;
    attr.affine_penalties.gap_opening = gap_open;
    attr.affine_penalties.gap_extension = gap_extend;
    attr.alignment_scope = compute_alignment;

    attr.heuristic.strategy = wf_heuristic_wfadaptive;
    attr.heuristic.min_wavefront_length = 10;
    attr.heuristic.max_distance_threshold = 50;
    attr.heuristic.steps_between_cutoffs = 1;

    attr.alignment_form.span = alignment_endsfree;
    // right extension
    attr.alignment_form.pattern_begin_free = pattern_begin_free;
    attr.alignment_form.pattern_end_free = pattern_end_free;
    attr.alignment_form.text_begin_free = text_begin_free;
    attr.alignment_form.text_end_free = text_end_free;

    // left extension
    /*
    attr.alignment_form.pattern_begin_free = pattern_begin_free;
    attr.alignment_form.pattern_end_free = 0;
    attr.alignment_form.text_begin_free = text_begin_free;
    attr.alignment_form.text_end_free = 0;
    */

    wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attr);

    wavefront_align(wf_aligner, pattern, strlen(pattern), text, strlen(text));
    cigar_print_pretty(stderr,
      pattern,strlen(pattern),text,strlen(text),
      &wf_aligner->cigar,wf_aligner->mm_allocator);

    wavefront_aligner_delete(wf_aligner);
}

int main() {
   //              0123456789012345678901234567890
   char *p =      "AAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAA";
   //         0123456788901234
   char *t = "AAAAAAAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";

   wf_align(p, t, gap_affine, 0, 2, 2, 1, 7, 0, 11, 0);

}
smarco commented 2 years ago

Hi,

Hi, I'm using the code below to evaluate using wfa-lib to extend given a seed. Based on the docs, I though it would have to extend left and then right in 2 different calls, but it seems that's not the case.

No, I'm afraid it is not the same. If you specify the following:

             int pattern_begin_free = 7;
             int pattern_end_free = 0;
             int text_begin_free = 11;
             int text_end_free =0;

You are telling the WFA that it can start the pattern/query alignment at pos=7 (previous deletions are for free) or the text at pos=11 (previous insertions are for free). Note that, in ends-free mode, it cannot do both (it is not local alignment).

To do this properly, I'm afraid, you will have to run 2 alignment extensions: left and right from the seed ends. We had another request to offer this functionality in the API. We could probably offer this out-of-the-box (but don't wait for me).

Also, is there an efficient way for this to get the start, end of the alignment in the pattern and text? I don't need the full cigar or backtace, only the extents of the alignment.

Ah. You mean, score-only alignment-extension, but returning the position where it stopped. Interesting idea. You can always access the position where the alignment stopped the extension HERE. I can provide a friendly API to retrieve this info (if useful).

Let me know.

Cheers

brentp commented 2 years ago

Thanks for the reply. Re score-only positions, that's great! I can access the struct directly.

For extension, an API for that would be great. Meanwhile, I don't understand how to specify the values for both calls. When I set pattern_end_free and text_end_free values with the others set to 0, then I get unexpected alignment, but as you see from my example above, when I set pattern_begin_free and text_begin_free values it gives a weird alignment.

An example would be great, or if the readme section for this used real values that matched the text instead of e.g. pattern_end_free variable, then I could try to decipher.

Dimi99 commented 2 years ago

Hi brentp, I was also trying to figure out a way to use the wfa-lib to extend given a seed. I wanted to ask you If you figured out a solution, because I am also getting unexpected alignments while trying to extend my seeds.

Best Regards Dimi

h-2 commented 2 years ago

I am also not certain how extension works. The documentation says:

    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = pattern_begin_free;
    attributes.alignment_form.pattern_end_free = pattern_end_free;
    attributes.alignment_form.text_begin_free = text_begin_free;
    attributes.alignment_form.text_end_free = text_end_free;

But "text_end_free" is not defined. What do I set it to, if I want to do a right-extension? I initially thought that these were boolean values and just set pattern_end_free = 1 and text_end_free = 1 but that's apparently not correct.

smarco commented 2 years ago

Ok, let us see if I can clarify this (I will update the README). All these values are positive integers or zero (to disable), and each one regulates how much begin/end can be free of penalty in the alignment. See the following figure.

image

Does this help? Can it be better understood now?

h-2 commented 2 years ago

Ah, OK, so the value that one needs to set actually depends on the length of the sequences?

I have previously only worked with NW and SW algorithms, so that's why I am confused, maybe. There I can have a right extension, where the extension continues computing to the end, but keeps track of the maximum score (best) in the matrix, and then it backtracks from there to the beginning. As a result, the alignment itself can contain gaps (that are not free), but the gaps "behind" the (clipped) alignment are free.

It looks like this kind of "right-extension" is not possible here? Because gaps are always scored until a fixed position X and then never afterwards?

smarco commented 2 years ago

Ah, OK, so the value that one needs to set actually depends on the length of the sequences?

Right! You can increment or decrement these values as you deem appropriate.

I have previously only worked with NW and SW algorithms, so that's why I am confused, maybe.

NW is usually presented as global (end-to-end) or semi-global (ends-free). WFA implements these two modes.

SW is usually presented as local alignment, WFA cannot do local (not that I know).

There I can have a right extension, where the extension continues computing to the end, but keeps track of the maximum score (best) in the matrix, and then it backtracks from there to the beginning. As a result, the alignment itself can contain gaps (that are not free), but the gaps "behind" the (clipped) alignment are free. It looks like this kind of "right-extension" is not possible here? Because gaps are always scored until a fixed position X and then never afterwards?

From your description, it seems to me that you are looking for true local alignment; that cannot be done with WFA. What WFA can do is extend until the score drops below a certain threshold and return the alignment until that point back to the beginning allowing a trim/deletion at the beginning of the query or text for free.

EDIT: It just occurred to me that, probably, if you were using another library (or tool) to achieve this behaviour, you could let me know the details so I can narrow down the behaviour you are looking for. That would help.