nilesh-iiita / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

editing reads in place #135

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have a BAM files with mapped reads (from BWA or stampy). I want to trim them, 
i.e. modify the reads in place and write them into a new file. In-place editing 
of reads is undocumented and I run into weird problems setting read.seq and 
read.qual.

What steps will reproduce the problem?
1. Get a read from a BAM file.
2. Trim the sequence: read.seq = read.seq[3:50]
3. Trim the quality string: read.qual = read.qual[3:50]
4. Python complains that read.qual is None!

What is the expected output? What do you see instead?
It should be possible to just reset the read quality as above. Instead, 
read.qual is reset to None upon resetting read.seq.

Note: a workaround is to retrieve the sequence and the qual first, then set 
both together:
seq = read.seq
qual = read.qual
read.seq = seq[3: 50]
read.qual = qual[3: 50]
but this procedure is undocumented and quite unintuitive.

What version of the product are you using? On what operating system?
pysam: 0.7.5
Python: 2.7.5 (Anaconda 1.7.0)
OS: Linux x86_64 kernel 3.10.7

Original issue reported on code.google.com by iosonofa...@gmail.com on 13 Sep 2013 at 12:24

GoogleCodeExporter commented 9 years ago
Thanks for pointing this out - it is indeed not intuitive.
Setting seq erases quality scores as these are managed together
within samtools. Changing the length of a sequence will automatically
change the length of the quality score string. As a safety, the quality
scores are set to 0. I can't see a solution other than to provide
an explicit update method. However, there is a workaround (see below).

The best way to do this is to (as you found out):

q = read.qual
read.seq = read.seq[5:10]
read.qual = q[5:10]

I have added some text to the documentation.

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 17 Sep 2013 at 7:11