Closed marcelm closed 1 year ago
Hi,
This I can answer right away. WFA2lib follows the convention in which the CIGAR describes how to transform the Pattern/Query into the Text/Database/Reference (as in classic pattern matching papers). The SAM CIGAR standard works the other way around, since there the Reference is the important sequence. Leaving aside the discussion of which one is better (I think they are both ok), if you want SAM-style CIGAR alignments, just swap the pattern and text sequences when calling the WFA align function, and you will get all the Ds converted into Is (and vice versa).
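If swapping the inputs is inconvenient, the same conversion can probably be done on the CIGAR string after alignment. The following is only a minimal sketch, not part of WFA2lib: `swap_indels` is a hypothetical helper name, and it assumes the CIGAR is a plain string in which `I` and `D` appear only as operation letters (with or without run-length counts such as `1I`).

```cpp
#include <string>

// Hypothetical helper (not a WFA2lib function): convert a CIGAR between the
// pattern-to-text convention and the SAM convention by exchanging the
// insertion and deletion operation letters. Counts, matches (M), and
// mismatches (X) are copied through unchanged.
std::string swap_indels(const std::string& cigar) {
    std::string out = cigar;
    for (char& op : out) {
        if (op == 'I') {
            op = 'D';
        } else if (op == 'D') {
            op = 'I';
        }
    }
    return out;
}
```

Applying the function twice returns the original string, so it can be used in either direction. Note that re-running the alignment with swapped inputs may pick a different alignment among equally scored ones, whereas this relabels the alignment you already have.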
Let me know if that helps.
Thanks! I see. Would you consider adding a comment to the README to make this clear for others as well?
Swapping pattern and text is of course the simplest fix for this, and it is what I’m using at the moment.
Sure (sorry for the delay). Please have a look at the development branch and let me know if that feels clearer.
Thanks,
Thanks, that is clear enough!
Running wfademo.cpp, I noticed that the meaning of D and I in the CIGAR output seems to have been swapped from their usual meaning. Here’s an example taken from the README:

The README states that text is equivalent to reference and pattern equivalent to query (which makes sense). If I take the above pattern to be a sequencing read and the text to be a genome reference, then the two gaps would be considered to be deletions, but they are encoded as 1I and 3I, respectively. Or should I think about this differently?