Open grenaud opened 10 years ago
Don't forget that BAM is not "simply@ binary, but compressed as well. If you're just using cmd line tools, run it through (samtools/tabix) 'bgzip -dc' and pipe into xxd.
Hi Derek ! Thanks for your reply. Yes the message above is decommpressed and piped into xxd.
Ok, just wanted to check the easy stuff first. :)
Looked a bit into this, and it's basically due to the fact that '*' is a special char in SAM, indicating 'no data' for that field.
So for compatibility's sake: when BT is writing BAM records and sees there's a single quality char that's an asterisk, it sets the qualities string to all 1's (BAM way of indicating no data).
I guess I can ignore the SAM-compatible behavior, and write the actual value, if the bases field is length 1 .
That would be great ! I understand when we read SAM, it is impossible to tell whether the quality field is empty but this is via C++ code and we can disentangle both cases. I noticed when another program started complaining that my other program was producing empty fields.
Sorry for the inconvenience. Guess the idea of a 1-base alignment never occurred to me when I wrote that little workaround. :)
yeah we deal with ancient DNA and we can end up with really tiny sequences.
I have a bam record with a single base "A" and a quality score of 9 which is a "*" in the ASCII table (33+9). However, I would expect to see a 42 but xxd -b tells me I have: 00010000 11111111 however, I would expect this: 00010000 00001001
Can this be fixed ?