itmat / rum

RNA-Seq Unified Mapper
http://cbil.upenn.edu/RUM
MIT License

sam format - TLEN field #147

khayer closed this issue 11 years ago

khayer commented 11 years ago

From http://samtools.sourceforge.net/SAM1.pdf :

TLEN: signed observed Template LENgth. If all segments are mapped to the same reference, the unsigned observed template length equals the number of bases from the leftmost mapped base to the rightmost mapped base. The leftmost segment has a plus sign and the rightmost has a minus sign. The sign of segments in the middle is undefined. It is set as 0 for single-segment template or when the information is unavailable.
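The quoted definition can be sketched in Python as follows. This is a minimal illustration, not RUM's code: the helper names `ref_end` and `template_length` are hypothetical, and the tie-break for mates that start at the same position (first-in-pair gets the plus sign) is an assumption, since the quoted text does not cover that case. The reference span of each mate is taken from the CIGAR operations that consume reference bases (M, D, N, =, X).

```python
import re

# CIGAR operations that consume reference bases: M, D, N, =, X.
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_end(pos: int, cigar: str) -> int:
    """1-based, inclusive position of the last reference base an alignment covers."""
    span = sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_CONSUMING)
    return pos + span - 1


def template_length(pos: int, cigar: str, mate_pos: int, mate_cigar: str,
                    is_first: bool) -> int:
    """Signed TLEN for one mate of a pair mapped to the same reference.

    Magnitude: leftmost mapped base to rightmost mapped base, counted inclusively.
    Sign: plus for the leftmost segment, minus for the rightmost.
    Assumption (hypothetical tie-break): if both mates start at the same
    position, the first-in-pair read gets the plus sign.
    """
    left = min(pos, mate_pos)
    right = max(ref_end(pos, cigar), ref_end(mate_pos, mate_cigar))
    length = right - left + 1
    if pos < mate_pos or (pos == mate_pos and is_first):
        return length
    return -length
```

For the pair in the example that follows, `template_length(25038645, "88M490N12M", 25038381, "94M75N6M", True)` returns -854 and the mate gets +854. Counting inclusively, as the quoted definition reads, gives 854 rather than the 853 RUM reports; this sketch is about the sign, so treat that one-base difference as something to check rather than a settled bug.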

Here is an example of the current output (TLEN is the 9th column):

seq.1   83  chr17   25038645    25  88M490N12M  =   25038381    853 GCCAGCACACCATCAGCACCTGAATCTTCAGGGTTCACATCACTGTCTAGGAACATCTCCCCAGGGGGATAGTCACTGTCACTGGCCGCAGGAATGCTGG    *   XO:A:F  MD:Z:100    NM:i:   IH:i:1  HI:i:1  XS:A:-
seq.1   163 chr17   25038381    25  94M75N6M    =   25038645    -853    CTGGGACCAGGGTCTGGCACCTCCGTGGCTTCTGTGGCTTCTTCTGTGGATTGGGACGGGTTGACCTTCCCATTGGCAGTGGTCGCCACATCCCCCTGCC    *   XO:A:F  MD:Z:100    NM:i:   IH:i:1  HI:i:1  XS:A:+

should be:

seq.1   83  chr17   25038645    25  88M490N12M  =   25038381    -853    CCAGCATTCCTGCGGCCAGTGACAGTGACTATCCCCCTGGGGAGATGTTCCTAGACAGTGATGTGAACCCTGAAGATTCAGGTGCTGATGGTGTGCTGGC    *   XO:A:F  MD:Z:100    NM:i:0  IH:i:1  HI:i:1  XS:A:-
seq.1   163 chr17   25038381    25  94M75N6M    =   25038645    853 CTGGGACCAGGGTCTGGCACCTCCGTGGCTTCTGTGGCTTCTTCTGTGGATTGGGACGGGTTGACCTTCCCATTGGCAGTGGTCGCCACATCCCCCTGCC    *   XO:A:F  MD:Z:100    NM:i:0  IH:i:1  HI:i:1  XS:A:+

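In the TLEN field the leftmost segment has the plus sign, and 25038381 is to the left of 25038645, so the flag-163 read should get 853 and the flag-83 read -853. The corrected lines also differ in two other fields: SEQ of the flag-83 read is reverse-complemented (its 0x10 bit is set), and the empty NM tag is filled in as NM:i:0.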
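The sign fix amounts to re-signing column 9 by comparing POS with PNEXT. The sketch below does that for a single record. It assumes tab-delimited SAM input (the example above is rendered with spaces), keeps RUM's magnitude unchanged, and leaves alone records with TLEN 0, mates on another reference, or ties where POS equals PNEXT. `fix_tlen_sign` is a hypothetical name, not part of RUM.

```python
def fix_tlen_sign(sam_line: str) -> str:
    """Re-sign TLEN (column 9) so the leftmost mate is positive and the rightmost negative.

    Assumes a tab-delimited SAM record. Only records whose mate is on the same
    reference (RNEXT "=" or equal to RNAME) and whose TLEN is non-zero are changed;
    the magnitude is kept as-is and ties (POS == PNEXT) are left untouched.
    """
    line = sam_line.rstrip("\n")
    fields = line.split("\t")
    rname, rnext = fields[2], fields[6]
    pos, pnext, tlen = int(fields[3]), int(fields[7]), int(fields[8])
    if tlen == 0 or rnext not in ("=", rname):
        return line
    magnitude = abs(tlen)
    if pos < pnext:
        fields[8] = str(magnitude)      # leftmost segment: plus sign
    elif pos > pnext:
        fields[8] = str(-magnitude)     # rightmost segment: minus sign
    return "\t".join(fields)
```

Applied to the two records above, this turns 853 into -853 for the flag-83 read and -853 into 853 for the flag-163 read. It does not touch SEQ or the NM tag.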