mozack / abra2

ABRA2
MIT License

Realigned output bam TLEN field plus/minus sign when FLAG == 147 #40

Open Julie-Zhongyun-Huang opened 4 years ago

Julie-Zhongyun-Huang commented 4 years ago

Hi there! We have recently become very interested in abra2 for fast and accurate reassembly/realignment of InDels. When using other tools on the realigned BAM from abra2, we discovered the following potential issue. Please see this example read pair:

A00337:46:HHGVNDMXX:1:1441:31946:25316:CTGCAGTA:CTGCAGTA:GA:AA  147     chr16   3727646 60      139M    =       3727648 139     TTCCTAGATGCCTGGATTTTCAGTACAAAAGGTCCAAGAACATGAAAGGGGAAAGGTGATGCTCTCACAATGCTACAAGCCCTCCACAAACTTCTCTAGCGTGTCCCCCGTGGTGTCCCCGACCAGGGACAGTTCGCTG     :FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF     YA:Z:chr16:3727129:964M MD:Z:48A88      RG:Z:4  NM:i:3  YM:i:2  YO:Z:chr16:3727648:-:2S137M     AS:i:132        XS:i:23 YX:i:3
A00337:46:HHGVNDMXX:1:1441:31946:25316:CTGCAGTA:CTGCAGTA:GA:AA  99      chr16   3727648 60      8S130M  =       3727646 -139    TTTTTATTCCTAGATGCCTGGATTTTCAGTACAAAAGGTCCAAGAACATGAAAGGGGAAAGGTGATGCTCTCACAATGCTACAAGCCCTCCACAAACTTCTCTAGCGTGTCCCCCGTGGTGTCCCCGACCAGGGACAG      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      YA:Z:chr16:3727129:964M MD:Z:48A81      RG:Z:4  NM:i:1  AS:i:125        XS:i:23

In this example, for the read with FLAG == 147, POS (column 4, here 3727646) is less than PNEXT (column 8, here 3727648), and TLEN (column 9, here 139) has a plus sign.
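To make the comparison concrete, here is a small Python sketch that decodes the two FLAG values using the bit meanings from the SAM specification and prints each read's POS/PNEXT relation alongside its TLEN. The field values are copied by hand from the two records above; `decode_flag` and `FLAG_BITS` are illustrative names, not part of abra2.

```python
# SAM FLAG bits and their meanings (SAM specification, FLAG field).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# (FLAG, POS, PNEXT, TLEN) copied from the two records in this issue.
records = [(147, 3727646, 3727648, 139), (99, 3727648, 3727646, -139)]

for flag, pos, pnext, tlen in records:
    relation = "POS < PNEXT" if pos < pnext else "POS >= PNEXT"
    print(flag, decode_flag(flag), relation, "TLEN", tlen)
```

FLAG 147 decodes to paired, proper pair, reverse strand, second in pair; FLAG 99 decodes to paired, proper pair, mate reverse strand, first in pair.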

However, in other BAM files that have not been realigned/reassembled, TLEN always has a minus sign in this situation (FLAG == 147 and POS < PNEXT).

According to the SAM format specification, for TLEN, the leftmost segment has a plus sign and the rightmost has a minus sign. For FLAG==147 (second of a pair / reverse-complemented), when POS < PNEXT, the segment should still be the rightmost.
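For reference, the SAM specification defines the magnitude of TLEN as the inclusive span from the leftmost mapped base to the rightmost mapped base of the template (end - start + 1), with soft-clipped bases excluded. The Python sketch below applies that definition, together with the leftmost/rightmost sign rule quoted above, to the two records in this issue. It assumes "leftmost" is judged by each segment's mapped start (POS) and breaks ties in favour of the first argument. The helper names `ref_end` and `tlen_pair` are hypothetical.

```python
import re

# CIGAR operations that consume reference bases (M, D, N, =, X).
REF_CONSUMING = set("MDN=X")


def ref_end(pos: int, cigar: str) -> int:
    """1-based inclusive reference coordinate of the last mapped base."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ref_len = sum(int(n) for n, op in ops if op in REF_CONSUMING)
    return pos + ref_len - 1


def tlen_pair(pos1: int, cigar1: str, pos2: int, cigar2: str) -> tuple:
    """Signed TLEN for two segments mapped to the same reference.

    Magnitude: rightmost mapped end - leftmost mapped start + 1.
    Sign (assumption): + for the segment with the smaller POS, - for the
    other; on a tie segment 1 is treated as leftmost.
    """
    start = min(pos1, pos2)
    end = max(ref_end(pos1, cigar1), ref_end(pos2, cigar2))
    size = end - start + 1
    return (size, -size) if pos1 <= pos2 else (-size, size)


# FLAG 147 read (POS 3727646, 139M) and FLAG 99 read (POS 3727648, 8S130M).
print(tlen_pair(3727646, "139M", 3727648, "8S130M"))
```

Under this reading, the sketch returns (139, -139), the same values abra2 wrote to the two records. On its own, it does not show which convention a given downstream tool assumes.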

Please don't hesitate to let me know if the TLEN sign should be modified. Thanks a lot!!

Julie

mozack commented 4 years ago

I'm not sure I have a grasp on the issue here.

Based on a quick reading of the SAM spec, I could not find anything to support the following statement:

"For FLAG==147 (second of a pair / reverse-complemented), when POS < PNEXT, the segment should still be the rightmost."

Feel free to correct me if I am missing something and point me to where this is defined.

mozack commented 4 years ago

Also, it may be helpful to hear how this is impacting the downstream tools you are using with the realigned BAM. Thanks.