Open feldroop opened 5 months ago
Hey, sorry for closing without a message. I was just about to write this explanation. The scope of my project shifted a bit and for now, I don't need to use this library anymore. But I'm still interested in its development and happy to discuss the above points.
Hi, @feldroop.
Sorry for not answering sooner, and thank you for your nice words. Your experimental long-read aligner sounds really cool. I am familiar with many of those ideas, and I really love the direction you are taking.
If you are still interested, let's go through your list:
Yes, I should document which ones are safe to be changed (TODO). But surely you can change the begin/end_free values of the `alignment_span`. However, you cannot switch from `score-only` to `alignment`, nor change the `memory-mode` (this requires a new `wavefront_aligner_t` object).
Ok, noted. Nobody asked before for BiWFA + ends-free. Conceptually, there is no problem with implementing it. I can put that on TODO (if there is no rush and you are still interested).
So my question is, what exactly are the begin/end_offset values and how do I (conveniently) get the value I want? Is there an easier option than looking at the cigar and doing the offset calculations?
`wf_aligner->cigar->begin_offset` and `end_offset` are offsets into the internal buffer that stores the CIGAR operations (i.e., MXID), so they have nothing to do with the positions where the glocal alignment starts/ends. In fact, the CIGAR that WFA produces is always described from the beginning of the query/`pattern` and `text` to their end. Glocal mode (aka ends-free or semi-global) just adjusts the algorithm's alignment-model penalties, so leading & trailing deletions/insertions come for free (according to the user's configuration).
but it would be nicer to have something like `cigar_sprint_SAM_CIGAR_endsfree`. I found the `cigar_maxtrim_gap_linear` function, but it only trims the gaps from the end.
(b) There is no such function implemented at the moment. Sorry. But it should be easy to implement such a function (whether on the WFA-lib side or in the user's code).
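For readers who land here with the same question, the trimming described above can be sketched in user code as follows. This is a minimal sketch, not part of WFA2-lib: the type `endsfree_span_t` and the functions `endsfree_trim` and `endsfree_sprint_cigar` are hypothetical names, the input is assumed to be a plain array of CIGAR operation characters (M, X, I, D), and it assumes that `I` consumes one `text` character and `D` consumes one `pattern` character. It strips every leading and trailing gap, which only matches the ends-free result when the configured free-end allowances cover those gaps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type (not part of WFA2-lib). All ranges are half-open. */
typedef struct {
  int pattern_begin, pattern_end; /* pattern range covered after trimming */
  int text_begin, text_end;       /* text range covered after trimming */
  int ops_begin, ops_end;         /* range of operations kept */
} endsfree_span_t;

static int is_gap(char op) { return op == 'I' || op == 'D'; }

/* Strip leading/trailing gaps from a CIGAR operation array of length n.
 * Assumption: 'I' consumes a text character, 'D' a pattern character. */
endsfree_span_t endsfree_trim(const char* ops, int n) {
  endsfree_span_t s = {0, 0, 0, 0, 0, n};
  int pattern_len = 0, text_len = 0;
  for (int i = 0; i < n; ++i) {
    assert(ops[i] == 'M' || ops[i] == 'X' || is_gap(ops[i]));
    if (ops[i] != 'I') ++pattern_len; /* M, X, D advance the pattern */
    if (ops[i] != 'D') ++text_len;    /* M, X, I advance the text */
  }
  /* Leading gaps shift the start positions. */
  while (s.ops_begin < n && is_gap(ops[s.ops_begin])) {
    if (ops[s.ops_begin] == 'I') ++s.text_begin; else ++s.pattern_begin;
    ++s.ops_begin;
  }
  /* Trailing gaps pull the end positions back. */
  s.pattern_end = pattern_len;
  s.text_end = text_len;
  while (s.ops_end > s.ops_begin && is_gap(ops[s.ops_end - 1])) {
    if (ops[s.ops_end - 1] == 'I') --s.text_end; else --s.pattern_end;
    --s.ops_end;
  }
  return s;
}

/* Run-length encode ops[begin, end) into out, keeping the operation letters.
 * Returns the string length, or -1 if the buffer is too small. */
int endsfree_sprint_cigar(const char* ops, int begin, int end,
                          char* out, size_t out_size) {
  assert(out_size > 0);
  size_t used = 0;
  out[0] = '\0';
  for (int i = begin; i < end;) {
    int j = i;
    while (j < end && ops[j] == ops[i]) ++j; /* extend the current run */
    int w = snprintf(out + used, out_size - used, "%d%c", j - i, ops[i]);
    if (w < 0 || (size_t)w >= out_size - used) return -1;
    used += (size_t)w;
    i = j;
  }
  return (int)used;
}
```

Whether `I` and `D` need to be swapped for SAM output depends on which sequence plays the role of the reference, so that mapping is left to the caller.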
Yes, I can do that. Using edit-distance, a fixed band (implemented) will comply with your restriction. However, with other heuristics, making such optimality guarantees is difficult (i.e., I have to think).
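To illustrate why a fixed band is safe for edit distance: any alignment with at most `max_score` edits never leaves the diagonals within `max_score` of the main one, so a band of that half-width preserves every optimal score up to `max_score`. The following is a plain dynamic-programming sketch of that idea, not WFA's implementation; `banded_edit_distance` is a hypothetical name, and the return conventions (-1 for "above `max_score`", -2 for allocation failure) are choices made for this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

static int min3(int a, int b, int c) {
  int m = a < b ? a : b;
  return m < c ? m : c;
}

/* Edit distance restricted to diagonals |i - j| <= max_score.
 * Returns the exact distance if it is <= max_score, -1 if it is larger,
 * and -2 if memory allocation fails. */
int banded_edit_distance(const char* pattern, int plen,
                         const char* text, int tlen, int max_score) {
  assert(plen >= 0 && tlen >= 0 && max_score >= 0);
  /* The end cell lies outside the band: the distance must exceed max_score. */
  if (abs(plen - tlen) > max_score) return -1;
  const int INF = INT_MAX / 2;
  int* prev = malloc((size_t)(tlen + 1) * sizeof *prev);
  int* curr = malloc((size_t)(tlen + 1) * sizeof *curr);
  if (!prev || !curr) { free(prev); free(curr); return -2; }
  /* Row 0: j insertions, limited to the band. */
  for (int j = 0; j <= tlen; ++j) prev[j] = (j <= max_score) ? j : INF;
  for (int i = 1; i <= plen; ++i) {
    int lo = i - max_score; if (lo < 0) lo = 0;
    int hi = i + max_score; if (hi > tlen) hi = tlen;
    /* Sentinels just outside the band so stale values are never read. */
    if (lo > 0) curr[lo - 1] = INF;
    if (hi < tlen) curr[hi + 1] = INF;
    for (int j = lo; j <= hi; ++j) {
      if (j == 0) { curr[0] = i; continue; } /* i deletions */
      int sub = prev[j - 1] + (pattern[i - 1] != text[j - 1]);
      int del = prev[j] + 1;
      int ins = curr[j - 1] + 1;
      int best = min3(sub, del, ins);
      curr[j] = best > INF ? INF : best; /* keep INF from creeping upward */
    }
    int* tmp = prev; prev = curr; curr = tmp;
  }
  int result = prev[tlen];
  free(prev);
  free(curr);
  return result <= max_score ? result : -1;
}
```

With gap-affine or other penalty models and with adaptive heuristics, the same guarantee does not follow automatically, which is the difficulty mentioned above.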
Hey, sorry for closing without a message. I was just about to write this explanation. The scope of my project shifted a bit and for now, I don't need to use this library anymore. But I'm still interested in its development and happy to discuss the above points.
It is ok. No problem at all :-)
Hi!
First of all, thank you a lot for providing this intriguing and well-documented piece of software.
I am working on an experimental longread aligner and currently testing WFA2 as the alignment backend. While doing so, I encountered a few issues.
As recommended, I want to reuse the `wavefront_aligner_t` objects. However, I need to change some of the attributes before each alignment invocation. Which, if any, of the attributes are safe to change on the `wavefront_aligner_t` object (i.e. after calling `wavefront_aligner_new`)? The most important ones for me are the _begin/end_free values of the `alignment_span`. Next would be the `alignment_scope`, but here I could also store two aligners instead of changing. Finally, I was thinking about varying the `memory_mode` depending on the size of the input sequences. From looking at the code of `wavefront_aligner_new`, it seems like the _begin/end_free values might be safe to change after construction, but the other mentioned attributes are not. Is this correct?
The longreads in my data sets are up to 600K bases long and I need ends free alignment. I saw in another issue that you recommended the `ultralow` memory mode for input sequences of this size. There is no urgency whatsoever, but it would be nice for me to have the `ultralow` mode available for ends free alignments.
What I want to do is a glocal alignment, like the below example from your README. For my application, I need two pieces of information.
`wf_aligner->cigar->begin_offset` to be exactly this, but it is 0 in the example (`end_offset` accordingly is 50). So my question is, what exactly are the `begin/end_offset` values and how do I (conveniently) get the value I want? Is there an easier option than looking at the cigar and doing the offset calculations?
but it would be nicer to have something like `cigar_sprint_SAM_CIGAR_endsfree`. I found the `cigar_maxtrim_gap_linear` function, but it only trims the gaps from the end.
`max_score`. It would automatically use all heuristics that preserve the optimal score below the configured `max_score`.
. Just an idea from lazy me :D