Closed y9c closed 1 year ago
I ran the above command and reproduced what you got:
@HD VN:1.6 SO:unsorted
@SQ SN:ref LN:89
@PG ID:strobealign PN:strobealign VN:v0.9.0-63-g77e36e1 CL:./build/strobealign -k 8 -A 4 -O 3 -L 100 /Users/ksahlin/tmp/strobealign_bugfixes/ref.fa /Users/ksahlin/tmp/strobealign_bugfixes/query.fa
Running in single-end mode
qry 0 ref 31 0 3M1D13M1D5M * 0 0 CTAGGTATTGGTGGTGGTCTT NM:i:17 AS:i:239
So this is a bug from what I can see.
Besides the bug of 17 -> 18
: I must say I am a bit confused by strobealign's chosen alignment since the ref and query both contain the k-mer GGTATTGGTGGTG
which I assume would be matched in the best alignment, even with the customised alignment parameters. This shared 13nt segment is what finds a shared seed in the first place. If I only run with the parameters -k 8 -A 4 -L 100
(omitting -O
) we get 3X13=5X
which looks less degenerate. I cannot understand at the moment how lowering gap opening from 12 to 3 would cause this. In fact, I tried all -O
from 1 to 12 and degenerate alignment happen with -O
from 1 to 3, while an -O
set between 4 to 12 finds the shared 13-mer.
@marcelm, care to look deeper into this?
I can also reproduce the problem. It appears that the CIGAR string is incorrect: When running strobealign with --eqx
, the CIGAR string that is output is 1X1=1X1D1X1=3X1=1X1=2X1=2X1D4X1=
. The last operation is a match (=
), but that is not correct since A
is aligned with T
, so the operation should be an X
. The value of the NM
tag is derived from the CIGAR, which explains the discrepancy.
I’ve checked which seed is used, and that is indeed GGTATTGGTGG
, so the alignment seems to be computed incorrectly.
This turns out to be a bug in SSW that has been fixed already upstream, but we were still using the old version in strobealign, see #296. With the fix, the following alignment is found:
GTGATGGTATTGGTGGTGAGTGAT ref
X=D=D=============D==XX=
CT A-GGTATTGGTGGTG-GTCTT query
CIGAR string is 1X1=1D1=1D13=1D2=2X1=
and NM is 6.
Thanks a lot for reporting!
Thank you very much for the quick fix.
I noticed that strobealign might not calculate NM tags correctly for some special cases. I generated an example as follow:
The reference file:
The query sequence:
And tested it use this command:
[bam_fillmd1] different NM for read 'qry': 17 -> 18
The error message from samtools indicates that the
NM
tag is not correct, at least not in a stadard way.I count the mismatch manually, and the number is 18 rather than 17
|
for match and.
for mismatch