tecangenomics / nudup

NuDup -- Marks/removes duplicate molecules based on the molecular tagging technology used in Tecan products.
http://www.tecangenomics.com
GNU Lesser General Public License v3.0

Enhancement request: better error reporting when BAM file contains supplementary alignments #12

Open lparsons opened 6 years ago

lparsons commented 6 years ago

Using NuDup 2.3.2 (installed via conda) with samtools 1.5, I get the following error after deduping:

2017-09-01 16:39:09,002 [     INFO] - Deduplicating NuGEN paired end reads...
2017-09-01 16:39:09,460 [     INFO] - Using molecular tag sequence from Index FASTQ read
2017-09-01 16:39:09,461 [     INFO] - Appending molecular tag sequence to SAM/BAM read name
2017-09-01 16:42:01,169 [     INFO] - Processing sorted SAM/BAM with molecular tag sequence in read name (assumes sorted)
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
2017-09-01 16:42:01,329 [    ERROR] -

lparsons commented 6 years ago

It seems that this error was due to "supplementary" alignments in the BAM file. An exception was thrown (nudup.py line 646) but was not properly reported to the user. I'm updating this ticket to reflect the desire for clearer error reporting.
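To illustrate the kind of reporting requested here, the following is a minimal sketch, not NuDup's actual code. It assumes the failure can be detected from the SAM FLAG field (bit 0x800 marks a supplementary alignment in the SAM specification). The names `SupplementaryAlignmentError`, `check_record`, and `run` are hypothetical and exist only for this example.

```python
import logging

# Log format chosen to resemble the log excerpt above (an assumption).
logging.basicConfig(format="%(asctime)s [%(levelname)8s] - %(message)s")
log = logging.getLogger("dedup_sketch")

SUPPLEMENTARY = 0x800  # SAM FLAG bit for a supplementary alignment


class SupplementaryAlignmentError(ValueError):
    """Hypothetical error raised when a supplementary alignment is found."""


def check_record(sam_line):
    """Raise a descriptive error if a SAM record is a supplementary alignment."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    if flag & SUPPLEMENTARY:
        raise SupplementaryAlignmentError(
            "read %r is a supplementary alignment (FLAG %d); "
            "filter these out before deduplication" % (name, flag)
        )


def run(sam_lines):
    """Check every alignment record; return 0 on success, 1 on failure."""
    try:
        for line in sam_lines:
            if not line.startswith("@"):  # skip SAM header lines
                check_record(line)
    except SupplementaryAlignmentError as exc:
        # Log the exception type and message so the ERROR line is never empty.
        log.error("%s: %s", type(exc).__name__, exc)
        return 1
    return 0
```

With a check like this, the final log line would name the offending read and flag instead of ending in a bare `ERROR] -`.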

peterwc commented 6 years ago

Hello lparsons,

I have the exact same error using BWA for my alignments. What did you do to "solve" the issue for now?

shuelga commented 6 years ago

Hi @peterwc,

You can use BWA with -c 1 to prevent the aligner from reporting multiple alignments. You can also filter them out with samtools before feeding the SAM/BAM into NuDup.
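For the filtering route, here is a minimal Python sketch of the idea, assuming SAM text input and that both secondary (0x100) and supplementary (0x800) records should be dropped. The function name `keep_primary` is hypothetical. On the command line, the same filter is typically written as `samtools view -b -F 0x900 in.bam > filtered.bam`, where `in.bam` and `filtered.bam` are placeholder file names.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment
EXCLUDE = SECONDARY | SUPPLEMENTARY  # 0x900, i.e. 2304


def keep_primary(sam_lines):
    """Yield header lines and primary alignment records, in input order."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        # FLAG is the second tab-separated column of a SAM record.
        flag = int(line.split("\t", 2)[1])
        if not flag & EXCLUDE:
            yield line
```

Because records are only dropped and never reordered, a sorted input stays sorted, which matters since the log above notes that NuDup assumes sorted input.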