pezmaster31 / bamtools

C++ API & command-line toolkit for working with BAM data
MIT License

Conversion to SAM reports inaccurate tags #206

Closed morispi closed 3 years ago

morispi commented 3 years ago

Hello,

I'm running into an issue with bamtools where some tags are sometimes reported inaccurately when converting to SAM. Please see this example:

Using samtools: SRR10584146.41433 81 NC_000913.3 93 60 146M = 4641575 4641338 GAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAG FFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6 NM:i:0 MD:Z:146 MC:Z:78M68S AS:i:146 XS:i:0 BX:Z:TTTCGTGCCTCTACCCTT

Using bamtools: SRR10584146.41433 81 NC_000913.3 93 60 146M = 4641575 4641338 GAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAG FFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6 NM:i:0 MD:Z:146 MC:Z:78M68S AS:i:65426 XS:i:0 BX:Z:TTTCGTGCCTCTACCCTT

Most of the line seems okay, but the "AS:i" tag appears to be wrong, and I'm not exactly sure why. I tried to take a look at the source code, and managed to find that the error seems to come from the following lines of the src/toolkit/bamtools_convert.cpp file:

case (Constants::BAM_TAG_TYPE_UINT8):
    // force value into integer-type (instead of char value)
    m_out << "i:" << static_cast<uint16_t>(tagData[index]);
    ++index;
    break;

I tried modifying the casts to fix the issue, but I didn't manage to come up with something that actually works. However, the tag reporting 65426 instead of 146 makes me think of some sort of overflow issue?

Would you maybe have an idea as to why I'm encountering such a behaviour, and what I should do to fix it?

Thanks, Pierre
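
A note on the 65426 value: it is consistent with sign extension rather than a true overflow. 146 is 0x92, which is -110 when read as a signed 8-bit value, and converting -110 to uint16_t gives 65536 - 110 = 65426. The sketch below reproduces that arithmetic under the assumption that tagData holds signed char bytes, which this thread does not confirm. The names widen_direct and widen_via_uint8 are hypothetical and not part of bamtools, and this is not necessarily the fix that was later committed.

```cpp
#include <cstdint>

// Hypothetical helper: widens a signed byte straight to uint16_t.
// The conversion is modulo 2^16, so a negative byte wraps to a
// large value (the sign bit is effectively extended).
constexpr std::uint16_t widen_direct(signed char raw) {
    return static_cast<std::uint16_t>(raw);
}

// Hypothetical helper: reinterprets the byte as unsigned first,
// then widens, so the original 0..255 value is preserved.
constexpr std::uint16_t widen_via_uint8(signed char raw) {
    return static_cast<std::uint16_t>(static_cast<std::uint8_t>(raw));
}

// The byte 0x92 (146 unsigned) is -110 when held in a signed char.
constexpr signed char kRawByte = -110;

// Compile-time checks matching the values seen in the issue.
static_assert(widen_direct(kRawByte) == 65426, "sign-extended value");
static_assert(widen_via_uint8(kRawByte) == 146, "intended value");
```

This would also explain why the AS:i:68 alignment is unaffected: 68 is below 128, so the byte is non-negative either way and both conversions give the same result.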

SoapZA commented 3 years ago

@morispi I've fixed the bug, but please verify with HEAD from git that it's truly fixed. I'll cut a new release soon.

morispi commented 3 years ago

@SoapZA Thanks for the update! Unfortunately this does not seem to fix it for me. I'm still getting the same result I mentioned in the issue.

Other alignments work fine though, for instance:

SRR10584146.41433 2209 NC_000913.3 1 60 78H68M = 93 238 AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCA FFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFA=FF=F6F=FFFFFFFFFFFF/ NM:i:0 MD:Z:68 MC:Z:146M AS:i:68 XS:i:0 SA:Z:NC_000913.3,4641575,+,78M68S,60,0; BX:Z:TTTCGTGCCTCTACCCTT

gets reported correctly both by bamtools and samtools. However, it was probably already the case with the previous version.

SoapZA commented 3 years ago

Could you upload the BAM file you're testing on?

morispi commented 3 years ago

Oh, my bad, it seems like it actually works; I just hadn't integrated it into my project correctly. Sorry for the inconvenience, I wanted to quickly test it out and give you feedback, but it seems like I was too quick.

Thanks for the fix! Pierre