zhaoming159753 / bedtools

Automatically exported from code.google.com/p/bedtools

bamToBed is not handling paired-end bam reads #139

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Download the attached queryname-sorted BAM file.
2. Run: bamToBed -i test.bam -bedpe > out.bed
   This fails with:
   *****ERROR: -bedpe requires BAM to be sorted/grouped by query name.
3. Run: samtools view -h test.bam | less
   Note that the mate reads DO in fact occur right next to each other in the
   BAM file, contrary to what bedtools reports.

The problem may be that the new Illumina FASTQ format produces BAM read names 
that differ from what bedtools expects.

Original issue reported on code.google.com by cooke...@gmail.com on 12 Sep 2012 at 12:33
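
As a quicker check than paging through the output in step 3, a minimal Python sketch along these lines could scan the SAM text for adjacent records whose names are not identical. It assumes exactly two records per fragment (no secondary or supplementary alignments), and `mismatched_pairs` is a hypothetical helper, not part of bedtools or samtools.

```python
from typing import Iterable, List, Tuple


def mismatched_pairs(sam_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Walk SAM records two at a time and collect name pairs that differ.

    Assumption: the input is SAM text (e.g. from `samtools view -h`) in which
    each fragment contributes exactly two consecutive records.
    """
    # Skip header lines (they start with "@") and keep only QNAME, column 1.
    names = [
        line.split("\t", 1)[0]
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    ]
    # Compare records 0/1, 2/3, ... and report the pairs that do not match.
    return [(a, b) for a, b in zip(names[0::2], names[1::2]) if a != b]
```

Feeding it the lines produced by `samtools view -h test.bam` (for example via `sys.stdin`) would list every adjacent pair whose names are not character-for-character identical, which is one plausible reason for the `-bedpe` error even when mates are adjacent.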

GoogleCodeExporter commented 8 years ago
Here are two examples of paired read names from two different BAM files, the 
first from old-style Illumina FASTQ and the second from the new style. I think 
your program identifies paired-end reads by checking whether the read names 
are identical, but this will not work for the new Illumina format, because 
read 1 is denoted by "_1" and read 2 by "_2". Please update your BAM parser. 
Thanks!

HWI-ST287:8:1101:1161:127785#CGATGT
HWI-ST287:8:1101:1161:127785#CGATGT

DG7PMJN1:308:C0YP8ACXX:1:1101:1201:163956_1:N:0:ACGTAC
DG7PMJN1:308:C0YP8ACXX:1:1101:1201:163956_2:N:0:ACGTAC

Original comment by cooke...@gmail.com on 12 Sep 2012 at 5:09
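
To illustrate the mismatch described in the comment above, here is a minimal Python sketch of one way the two new-style names could be reduced to a common key before comparison. It assumes the `_1` / `_2` mate number is always followed by the filter flag, control number, and index fields, as in the names quoted above. `pair_key` is a hypothetical helper, not how bedtools handles names.

```python
import re

# Assumed suffix layout: "_1" or "_2" followed by ":<Y|N>:<digits>:<index>"
# at the end of the name, e.g. "_1:N:0:ACGTAC".
_MATE_NUMBER = re.compile(r"_[12](?=:[YN]:\d+:[^:]*$)")


def pair_key(read_name: str) -> str:
    """Return the read name with the embedded mate number removed."""
    return _MATE_NUMBER.sub("", read_name, count=1)


old_style = "HWI-ST287:8:1101:1161:127785#CGATGT"
new_r1 = "DG7PMJN1:308:C0YP8ACXX:1:1101:1201:163956_1:N:0:ACGTAC"
new_r2 = "DG7PMJN1:308:C0YP8ACXX:1:1101:1201:163956_2:N:0:ACGTAC"

# Mates that differ only in the mate number map to the same key;
# old-style names pass through unchanged.
same_key = pair_key(new_r1) == pair_key(new_r2)
unchanged = pair_key(old_style) == old_style
```

Renaming reads with such a key before running `bamToBed -bedpe` might serve as a workaround, although whether that is appropriate depends on how the names were produced upstream.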

GoogleCodeExporter commented 8 years ago
Similarly, bamToBed -bedpe does not properly recognize paired-end SOLiD reads 
with names such as those shown below:

853_26_330_F3/1
853_26_330_F5-BC/2

Original comment by jerpars...@gmail.com on 17 Sep 2012 at 2:06
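
The SOLiD names in the comment above differ in both the tag (`F3` versus `F5-BC`) and the trailing `/1` or `/2`, so an exact string comparison cannot match them. A minimal Python sketch of stripping both parts follows. The tag list (`F3`, `R3`, `F5-BC`, `F5-P2`) is an assumption about which tags may appear, and `solid_pair_key` is a hypothetical name.

```python
import re

# Assumed SOLiD tag suffixes, optionally followed by "/1" or "/2".
_SOLID_TAG = re.compile(r"_(?:F3|R3|F5-BC|F5-P2)(?:/[12])?$")


def solid_pair_key(read_name: str) -> str:
    """Return the bead identifier shared by both mates of a SOLiD pair."""
    return _SOLID_TAG.sub("", read_name)


mate1 = "853_26_330_F3/1"
mate2 = "853_26_330_F5-BC/2"
# Both mates reduce to the same bead identifier.
keys_match = solid_pair_key(mate1) == solid_pair_key(mate2)
```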

GoogleCodeExporter commented 8 years ago
Thanks for reporting this.  I have not run into this before and will have to 
think about how to handle this more generally.

Original comment by aaronqui...@gmail.com on 17 Sep 2012 at 9:00