lixin6135 / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

str(AlignedRead) does not format RNEXT aka MRNM correctly (SAM column 6) #75

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

1. Download the following BAM file (at any of these URLs in case the branches 
get deleted in future) which is a conversion of ex1.sam shipped with pysam as a 
unit test:

https://github.com/peterjc/biopython/raw/seqio-sam-bam/Tests/SamBam/ex1.bam
https://github.com/peterjc/biopython/raw/seqio-sam-bam-index/Tests/SamBam/ex1.ba
m
https://github.com/peterjc/biopython/raw/SamBam2011/Tests/SamBam/ex1.bam

2. Run this script:

import pysam
for read in pysam.Samfile("ex1.bam", "rb"):
    if "\tNone\t" in str(read):
        print read

What is the expected output? What do you see instead?

Expected output from the script is nothing, actual output shows problem with 
column 6, mate read name (MRNM) aka reference sequence of next segment in 
template (RNEXT), being "None" instead of "*":

EAS56_57:6:190:289:82   69  0   99  0   None    0   99  35  CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA
    <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; [('MF', 192)]
...

Ignore the tag problem reported as Issue 74, focus on column 6. Here is the SAM 
version for this read (and its partner) via samtools:

$ samtools view ex1.bam | grep "EAS56_57:6:190:289:82"
EAS56_57:6:190:289:82   69  chr1    100 0   *   =   100 0   CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAA
A   <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; MF:i:192
EAS56_57:6:190:289:82   137 chr1    100 73  35M =   100 0   AGGGGTGCAGAGCCGAGTCACGGGGTTGCC
AGCAC   <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2; MF:i:64 Aq:i:0  NM:i:0  UQ:i:0  H0:i:1  H1
:i:0

Or, using the SAM file as shipped with pysam's unit tests:

$ grep "EAS56_57:6:190:289:82" ex1.sam
EAS56_57:6:190:289:82   69  chr1    100 0   *   =   100 0   CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAA
A   <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; MF:i:192
EAS56_57:6:190:289:82   137 chr1    100 73  35M =   100 0   AGGGGTGCAGAGCCGAGTCACGGGGTTGCC
AGCAC   <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2; MF:i:64 Aq:i:0  NM:i:0  UQ:i:0  H0:i:1  H1
:i:0

Notice here the read has "=" in column 6, meaning same as this fragment (i.e. 
"chr1").

What version of the product are you using? On what operating system?

pysam 0.5, Mac OS X Snow Leopard.

Please provide any additional information below.

See also Issue 25 on how this field is exposed via the object.

Original issue reported on code.google.com by p.j.a.c...@googlemail.com on 13 Oct 2011 at 3:46

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
I had by column numbers off by one (6 versus 7), but there are actually several 
bugs here:

Column 3, RNAME. Observe "0" rather than "chr1", see also Issue 25.

Column 4, POS. Observe "99" rather than "100" (BAM coordinates need +1 for SAM)

Column 6, CIGAR. Observe "None" rather than "*"

Column 7, PNEXT. Observe "0" rather than "=" (meaning "chr1" from column 3), 
see also Issue 25.

Column 8, MPOS aka PNEXT. Observe "99" rather than "100" (as in column 4)

Column 9, ISIZE aka TLEN. Observe 35 rather than 0.

Column 12+, Tags wrong reported as Issue 74.

Original comment by p.j.a.c...@googlemail.com on 13 Oct 2011 at 4:12

GoogleCodeExporter commented 8 years ago
str(AlignedRead) is not supposed to print a valid SAM-formatted entry.

Original comment by andreas....@gmail.com on 19 Oct 2011 at 7:53