rogerjms / bedtools

Automatically exported from code.google.com/p/bedtools
0 stars 0 forks source link

coverageBed: all exons with null read coverage #112

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I run coverageBed on a .gff file (Ensembl h. sapiens) together with a .bam file 
generated with TopHat, and I got null coverage on all exons.

A reduced version of the two input files is attached.

What steps will reproduce the problem?
1. coverageBed -counts -abam chr10align.bam -b Ensembl_chr10 > 
exons.bed.coverage

What is the expected output? What do you see instead?
$ head exons.bed.coverage 
chr10   EnsEMBL exon    54525140        54528270        .       -       .       
gene_id "ENSG00000165471"; transcript_id "ENST00000373968"; exon_id 
"ENSE00001462026"   0
chr10   EnsEMBL exon    89127091        89130452        .       +       .       
gene_id "ENSG00000214562"; transcript_id "ENST00000412718"; exon_id 
"ENSE00001623832"   0
chr10   EnsEMBL exon    102759233       102762703       .       +       .       
gene_id "ENSG00000107816"; transcript_id "ENST00000370220"; exon_id 
"ENSE00001452101"   0
chr10   EnsEMBL exon    116391661       116392100       .       -       .       
gene_id "ENSG00000099204"; transcript_id "ENST00000369260"; exon_id 
"ENSE00002272601"   0
chr10   EnsEMBL exon    7601232 7605347 .       -       .       gene_id 
"ENSG00000123243"; transcript_id "ENST00000256861"; exon_id "ENSE00001482074"   
0
chr10   EnsEMBL exon    15858834        15860007        .       -       .       
gene_id "ENSG00000148481"; transcript_id "ENST00000378036"; exon_id 
"ENSE00001475978"   0
chr10   EnsEMBL exon    17432530        17432619        .       -       .       
gene_id "ENSG00000148488"; transcript_id "ENST00000377602"; exon_id 
"ENSE00001763915"   0
chr10   EnsEMBL exon    17432530        17432619        .       -       .       
gene_id "ENSG00000148488"; transcript_id "ENST00000377610"; exon_id 
"ENSE00001774857"   0
chr10   EnsEMBL exon    18084907        18089855        .       +       .       
gene_id "ENSG00000184040"; transcript_id "ENST00000480516"; exon_id 
"ENSE00001856822"   0
chr10   EnsEMBL exon    46923604        46923810        .       +       .       
gene_id "ENSG00000165874"; transcript_id "ENST00000301038"; exon_id 
"ENSE00002475566"   0

What version of the product are you using? On what operating system?
bedtools 2.15.0

Thank you in advance.

Francesca

Original issue reported on code.google.com by francesc...@gmail.com on 22 Feb 2012 at 8:41

Attachments:

GoogleCodeExporter commented 9 years ago
Strange.  This works for me:

coverageBed -counts -abam ~/Downloads/chr10align.bam -b 
~/Downloads/Ensembl_chr10_red | awk '$15 > 0'
chr10   EnsEMBL exon    116391661   116392100   .   -   .   gene_id "ENSG00000099204"; 
transcript_id "ENST00000369260"; exon_id "ENSE00002272601"  11
chr10   EnsEMBL exon    7601232 7605347 .   -   .   gene_id "ENSG00000123243"; 
transcript_id "ENST00000256861"; exon_id "ENSE00001482074"  55
chr10   EnsEMBL exon    50723151    50725167    .   -   .   gene_id "ENSG00000243251"; 
transcript_id "ENST00000374127"; exon_id "ENSE00002297141"  1
chr10   EnsEMBL exon    50723379    50725160    .   -   .   gene_id "ENSG00000243251"; 
transcript_id "ENST00000508005"; exon_id "ENSE00002211078"  1
chr10   EnsEMBL exon    50723247    50725167    .   -   .   gene_id "ENSG00000258838"; 
transcript_id "ENST00000447839"; exon_id "ENSE00002025095"  1
chr10   EnsEMBL exon    50723247    50725167    .   -   .   gene_id "ENSG00000258838"; 
transcript_id "ENST00000515869"; exon_id "ENSE00002025095"  1
chr10   EnsEMBL exon    65928720    65930611    .   -   .   gene_id "ENSG00000235489"; 
transcript_id "ENST00000412976"; exon_id "ENSE00001699624"  1
chr10   EnsEMBL exon    76414714    76415889    .   +   .   gene_id "ENSG00000214626"; 
transcript_id "ENST00000398689"; exon_id "ENSE00001534352"  2
chr10   EnsEMBL exon    76936146    76941881    .   +   .   gene_id "ENSG00000156671"; 
transcript_id "ENST00000542569"; exon_id "ENSE00002206063"  161
chr10   EnsEMBL exon    91094774    91095291    .   -   .   gene_id "ENSG00000107798"; 
transcript_id "ENST00000371829"; exon_id "ENSE00001411773"  10

Original comment by aaronqui...@gmail.com on 22 Feb 2012 at 10:48

GoogleCodeExporter commented 9 years ago
Hi, I'm the manager of the system Francesca is using. This is a IBM POWER7 
machine: BIG endian. Me and my colleague made some debugging and found the 
culprit in src/utils/BamTools/src/api/BamAux.h  The routine involved is 
UnpackUnsignedShort: given the 2 bytes read from bam header, it puts them in 
fixed order (LITTLE endian) inside a (unsigned)short int. In this architecture 
this is wrong, since it's a BIG endian machine. The same problem, we think, may 
arise in all the other conversion routines.

Original comment by paoloemi...@gmail.com on 23 Feb 2012 at 3:41

GoogleCodeExporter commented 9 years ago
Hi all, I'm the developer of BamTools.

Do you know which call to UnpackSignedShort causes the error? Most of the 
'unpack' methods are wrapped by endian-swapping calls where necessary, but I 
may have missed some.

I don't see any 2-byte reads from the BAM header - do you mean those in the 
BgzfStream::CheckBlockHeader() method? Looks like there are two such reads 
there that are not swapped and then directly compared to an integer constant, 
so that's likely the problem. Just want to double-check that this is where the 
debugging you guys did pointed to.

Thanks. 
  - Derek

Original comment by DerekWBa...@gmail.com on 28 Feb 2012 at 3:12

GoogleCodeExporter commented 9 years ago
Hi Derek,

the calls to UnpackUnsignedShort are those in 

src/utils/BamTools/src/api/internal/io/BgzfStream_p.cpp 

lines 54 and 57. I think these are the ones you are referring to.
Thanks for your help.

Paolo

Original comment by paoloemi...@gmail.com on 29 Feb 2012 at 4:31