gt1 / biobambam2

Tools for early stage alignment file processing

bamsormadup bin size errors with SAM input from bwa #35

Closed chapmanb closed 7 years ago

chapmanb commented 7 years ago

German, we're running into an issue when using bamsormadup directly on SAM output from bwa. Running any htsjdk-based tool on the resulting BAM produces errors or warnings about BAM records:

ERROR: Record 80117, Read name HWI-ST1124:106:C15APACXX:1:1306:21079:31074, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned

The BAMs appear to work, but they generate a large number of these messages and may have slower lookups.
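For context on what the validator compares against: the BAM bin is derived from the alignment's start and end on the reference. The sketch below is a Python port of the `reg2bin` routine given in the SAM/BAM specification, taking a zero-based, half-open interval. The example coordinates are hypothetical and are not taken from the failing record.

```python
def reg2bin(beg: int, end: int) -> int:
    """Return the BAM bin for the zero-based, half-open interval [beg, end)."""
    end -= 1  # switch to an inclusive end coordinate
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kbp bins, offset 4681
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kbp bins, offset 585
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mbp bins, offset 73
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mbp bins, offset 9
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mbp bins, offset 1
    return 0  # the interval only fits in the top-level bin


# Hypothetical 100 bp alignment starting at zero-based position 100000.
print(reg2bin(100000, 100100))  # 4687
```

A record whose stored bin differs from this value for its own start and end is what triggers the message above.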

This appears to happen only with SAM input. This command triggers the issue:

bwa mem hg19/bwa/hg19.fa 1.fq 2.fq | bamsormadup inputformat=sam SO=coordinate > piped.bam

while piping through samtools first does not:

bwa mem hg19/bwa/hg19.fa 1.fq 2.fq | samtools view -b | bamsormadup inputformat=bam SO=coordinate > intermediate.bam

This is a self-contained test case that demonstrates the issue using picard ValidateSamFile:

wget https://s3.amazonaws.com/chapmanb/testcases/bamsormadup_bin_field.tar.gz

Let me know if I can provide any other details to help debug. Thanks as always for the awesome tools and all the help.

gt1 commented 7 years ago

Hello Brad,

Apologies for this. Could you please try version 2.0.62? It should fix this issue for bamsormadup. The bin field was computed incorrectly for unmapped reads.
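The thread does not say what the incorrect computation was. As a rough illustration of the value a validator would expect, the sketch below assumes the SAM/BAM specification's rule that a read with no alignment length, such as an unmapped read, is treated as having length one. The helper names `reference_length` and `expected_bin` are hypothetical and are not part of biobambam2, libmaus2, or io_lib.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reg2bin(beg: int, end: int) -> int:
    """BAM bin for the zero-based, half-open interval [beg, end)."""
    end -= 1
    for shift, offset in ((14, 4681), (17, 585), (20, 73), (23, 9), (26, 1)):
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0


def reference_length(cigar: str) -> int:
    """Reference bases consumed by a CIGAR string; '*' consumes none."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def expected_bin(pos_1based: int, cigar: str, unmapped: bool) -> int:
    """Expected bin for a SAM record, given its POS, CIGAR, and unmapped flag."""
    beg = pos_1based - 1  # SAM POS is 1-based; BAM stores it 0-based
    length = 0 if unmapped else reference_length(cigar)
    if length == 0:
        length = 1  # assumption from the spec: treat as length one
    return reg2bin(beg, beg + length)


print(expected_bin(100001, "100M", False))  # mapped read: 4687
print(expected_bin(100001, "*", True))      # unmapped read placed at its mate: 4687
print(expected_bin(0, "*", True))           # unmapped read with no position: 4680
```

An unmapped read placed at its mate's position therefore lands in the bin covering that single base, while one with no position at all (POS 0 in SAM, -1 in BAM) gets bin 4680.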

Please note that while fixing this I noticed that io_lib has similar issues, which means bin fields are still miscomputed by all the other tools when inputformat=sam is set (bamsormadup is not affected, as it does not use io_lib for SAM input). I have notified the author of io_lib about this. In the meantime, setting inputformat=maussam (for all tools except bamsormadup) may help; this uses libmaus2 SAM parsing instead of io_lib.

Best, German

chapmanb commented 7 years ago

German, thanks so much for looking at this so quickly. The new version does resolve the issue. I updated the bioconda version we use in bcbio, so the problems we were having there are fixed. Thanks also for the heads-up on the more general io_lib issue. Thank you again.