hall-lab / svtyper

Bayesian genotyper for structural variants
MIT License

ValueError: mate not found #7

Closed: MinocheAE closed this issue 8 years ago

MinocheAE commented 9 years ago

Hi, this sounds like an interesting tool. Is there a detailed description of what it actually does? Does it compare the coverage within and outside of the event? How many flanking bases are considered?

This is the error message I get:

    File "pysam/calignmentfile.pyx", line 836, in pysam.calignmentfile.AlignmentFile.mate (pysam/calignmentfile.c:10945)
    ValueError: mate not found

I used BWA MEM for the alignment and called multiple samples with lumpy-sv. From looking at the svtyper script, it seems to support multiple samples if the BAMs are provided as a comma-separated list. Whether I split the multi-sample VCF per sample and run svtyper in single-sample mode, or run svtyper in multi-sample mode, the error message stays the same.

Do you have any idea what went wrong?

Some lines actually got processed. The last output line is:

    1 823747 12 N <DEL> 733.59 . SVTYPE=DEL;SVLEN=-206;END=823953;STRANDS=+-:2;IMPRECISE;CIPOS=-7,8;CIEND=-1,9;CIPOS95=-2,4;CIEND95=0,9;SU=2;PE=0;SR=2 GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO 0/1:0:0:0:200:733.59:-83,-9,-106:333:252:81 ./.:0:0:0:.:.:.:.:.:. ./.:0:0:0:.:.:.:.:.:. ./.:0:0:0:.:.:.:.:.:. ./.:2:0:2:.:.:.:.:.:.

Strange that only one sample got genotyped. The corresponding lines in the input file:

    1 823747 12 N <DEL> . . SVTYPE=DEL;STRANDS=+-:2;SVLEN=-206;END=823953;CIPOS=-7,8;CIEND=-1,9;CIPOS95=-2,4;CIEND95=0,9;IMPRECISE;SU=2;PE=0;SR=2 GT:SU:PE:SR ./.:0:0:0 ./.:0:0:0 ./.:0:0:0 ./.:0:0:0 ./.:2:0:2
    1 829170 13 N <DEL> . . SVTYPE=DEL;STRANDS=+-:9;SVLEN=-35;END=829205;CIPOS=-9,8;CIEND=-5,3;CIPOS95=0,0;CIEND95=0,0;SU=9;PE=1;SR=8 GT:SU:PE:SR ./.:4:0:4 ./.:2:0:2 ./.:1:1:0 ./.:2:0:2 ./.:0:0:0
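Reading the FORMAT column against each sample column makes it easy to see which samples actually received a genotype. This is a minimal sketch, assuming a well-formed VCF data line (8 fixed columns, then FORMAT, then one column per sample) with no spaces inside any field; the helper names are hypothetical and not part of svtyper.

```python
def per_sample_fields(vcf_line: str) -> list:
    """Return one {FORMAT key: value} dict per sample column of a VCF data line."""
    # Real VCFs are tab-separated; split() also copes with the space-separated
    # lines pasted in this thread, assuming no field contains whitespace.
    cols = vcf_line.split()
    keys = cols[8].split(":")  # index 8 is the FORMAT column
    return [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]


def genotyped_samples(vcf_line: str) -> list:
    """Indices of samples whose GT is not missing ("./." or ".")."""
    return [
        i
        for i, fields in enumerate(per_sample_fields(vcf_line))
        if fields.get("GT") not in ("./.", ".")
    ]
```

Applied to a record, `genotyped_samples` lists the sample columns (0-based) that carry a called genotype, which is a quick way to compare input and output records sample by sample.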

Thanks

cc2qe commented 9 years ago

SVTyper compares the number of non-reference to reference reads at each breakpoint. A detailed description is in the Methods section of the SpeedSeq pre-print: http://biorxiv.org/content/early/2014/12/05/012179
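Comparing non-reference to reference read counts can be illustrated with a per-genotype binomial likelihood. This is a minimal sketch under loudly stated assumptions: the expected non-reference fractions (0.01, 0.5, 0.99) are hypothetical placeholders, no genotype prior is applied (so the call is maximum likelihood under a flat prior rather than a full Bayesian model), and the function names are made up. The SpeedSeq Methods section describes SVTyper's actual model.

```python
import math

# Hypothetical expected fraction of non-reference reads for each genotype.
# These are illustrative placeholders, not SVTyper's actual parameters.
ALT_FRACTION = {"0/0": 0.01, "0/1": 0.5, "1/1": 0.99}


def log10_binomial(k: int, n: int, p: float) -> float:
    """log10 P(k non-reference reads out of n | non-reference fraction p)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_choose + k * math.log(p) + (n - k) * math.log(1 - p)) / math.log(10)


def genotype_likelihoods(ref_reads: int, alt_reads: int) -> dict:
    """log10 likelihood of the observed read counts under each genotype."""
    n = ref_reads + alt_reads
    return {gt: log10_binomial(alt_reads, n, p) for gt, p in ALT_FRACTION.items()}


def call_genotype(ref_reads: int, alt_reads: int) -> str:
    """Most likely genotype under a flat prior (maximum likelihood)."""
    gls = genotype_likelihoods(ref_reads, alt_reads)
    return max(gls, key=gls.get)
```

For example, `call_genotype(252, 81)`, using the RO/AO counts from the output record earlier in this thread, returns `"0/1"`. The likelihood values themselves will not match SVTyper's GL field, since the model parameters here are invented.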

The flank distance is a command-line parameter; it's 20 by default.

The error you're seeing is because SVTyper cannot find the mate of a paired-end read. Is it possible that some mates are missing from your BAM file? Or are there single-end reads in it?
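One way to check for missing mates is to scan the whole BAM and collect read names whose primary read-1 or read-2 record is absent. The sketch below works on plain (query name, FLAG) pairs so it runs without pysam; with pysam you could feed it `((r.query_name, r.flag) for r in bam.fetch(until_eof=True))`. The function name is hypothetical, and it keeps every read name in memory, so it is only meant for spot checks on small BAMs.

```python
from collections import defaultdict

# SAM FLAG bits (from the SAM specification)
PAIRED = 0x1
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_missing_mates(records):
    """Return sorted query names whose primary read-1 or read-2 record is absent.

    `records` is any iterable of (query_name, flag) pairs covering the whole file.
    Single-end records are skipped, as are secondary and supplementary
    alignments, since only primary records count as mates here.
    """
    ends_seen = defaultdict(set)
    for qname, flag in records:
        if not flag & PAIRED:
            continue  # single-end read: no mate expected
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read, not its mate
        if flag & READ1:
            ends_seen[qname].add(1)
        elif flag & READ2:
            ends_seen[qname].add(2)
    return sorted(q for q, ends in ends_seen.items() if ends != {1, 2})
```

Scanning only a region instead of the whole file would report mates that lie outside the region as missing, so the full-file iteration matters here.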

In general, SVTyper will be much faster if mate tags (MC, MQ) are present in your BAM, since without them it must seek to the corresponding mate position in the BAM. You may want to consider running SAMBLASTER with --addMateTags on your BAMs.
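The speed difference comes from where the mate's information is read. A minimal sketch of a tags-first lookup, assuming pysam's `AlignedSegment.has_tag`/`get_tag`, `mapping_quality`, `cigarstring` and `AlignmentFile.mate`; the function name is hypothetical and this is not SVTyper's code. Unlike the error in this thread, it returns None instead of propagating pysam's ValueError.

```python
from typing import Optional, Tuple


def mate_mapq_and_cigar(read, bam) -> Optional[Tuple[int, str]]:
    """Return (mate MAPQ, mate CIGAR) for a paired read, or None if unavailable.

    `read` is expected to behave like a pysam AlignedSegment and `bam` like a
    pysam AlignmentFile.
    """
    # Fast path: mate tags already stored on the read (e.g. by samblaster --addMateTags).
    if read.has_tag("MQ") and read.has_tag("MC"):
        return read.get_tag("MQ"), read.get_tag("MC")
    # Slow path: pysam has to seek to the mate's position in the BAM.
    try:
        mate = bam.mate(read)
    except ValueError:  # pysam raises ValueError("mate not found")
        return None
    return mate.mapping_quality, mate.cigarstring
```

With MC and MQ present, the function never touches the BAM index, which is why tagged BAMs both run faster and sidestep the `bam.mate()` call that fails here.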

I also just pushed some of the recent dev branch changes to the master branch, so you might try downloading the updated version. But I doubt that it will affect the error you're seeing.

iranmdl commented 9 years ago

I'm having the same problem as MinocheAE, with the same error. I've checked the BAM file and the mates are not missing, which is weird. If you have any news about this issue, I would be most grateful.

cc2qe commented 9 years ago

It sounds like this may be a bug in pysam. What version of pysam are you using? All SVTyper is doing is calling bam.mate(read) from the pysam library. Also, can you post your full error message? (MinocheAE's was missing the traceback line for the svtyper call.)

MinocheAE commented 9 years ago

My BAM files don't have the MC and MQ tags, so I gave SpeedSeq a try; it uses SAMBLASTER to add these tags to the BAM file. This worked.

cc2qe commented 9 years ago

We are investigating these issues, but they appear to be related to BWA-MEM alignment without the "-M" flag and the way that Samblaster and SVTyper handle the resulting alignments.

zeeev commented 9 years ago

Colby, would this modification cause problems?

for read in sample.bam.fetch(chrom, max(pos - fetch_flank, 0), pos):
    lib = sample.get_lib(read.opt('RG'))  # get the read's library
    if (read.is_reverse
            or not read.mate_is_reverse
            or read.is_secondary
            or read.flag & 0x800  # proposed addition: skip supplementary alignments
            or read.is_unmapped
            or read.mate_is_unmapped
            or read.is_duplicate
            or read.pos + discflank > pos
            or read.pnext + lib.read_length - discflank < pos
            or read.tid != read.rnext):
        continue

cc2qe commented 9 years ago

Nope, that modification should work just fine for now. Since we are transitioning to supplementary (rather than secondary) alignments, I'm soon going to default SVTyper to assume alignments did not use the "-M" flag, with an optional "-M" parameter if they did.

zeeev commented 9 years ago

Thanks Colby,

Unfortunately, that modification does not seem to work. If you have any other quick suggestions, I would be grateful. I have terabytes of data that I would like to avoid realigning.

cc2qe commented 9 years ago

I just modified SVTyper to work with supplementary read alignments by default. If you have BAMs aligned using "bwa mem -M" (secondary split reads), then passing the "-M" flag to svtyper will accommodate this.
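The "-M" switch matters because it changes which FLAG bit marks the extra pieces of a split read. A minimal sketch, relying only on the SAM FLAG definitions (0x100 secondary, 0x800 supplementary) and on BWA-MEM's documented "-M" behavior; the function names and the `bwa_mem_M` parameter are hypothetical and not svtyper's internal API.

```python
# SAM FLAG bits (from the SAM specification)
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_extra_split_piece(flag: int, bwa_mem_M: bool) -> bool:
    """True if a record is a non-primary piece of a split read.

    With `bwa mem -M`, shorter split hits are flagged secondary (0x100);
    without -M, BWA-MEM flags them supplementary (0x800).
    """
    bit = SECONDARY if bwa_mem_M else SUPPLEMENTARY
    return bool(flag & bit)


def is_primary_record(flag: int) -> bool:
    """True for a read's single primary record, whichever convention was used."""
    return not flag & (SECONDARY | SUPPLEMENTARY)
```

`is_primary_record` mirrors the idea behind zeeev's filter (skipping both `is_secondary` and `flag & 0x800`). With "-M", the 0x100 bit alone cannot separate split-read pieces from other secondary hits if BWA was also run with options that emit them, so the sketch is only a starting point.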

Does this solve the problem?

cc2qe commented 8 years ago

Adding mate tags to the BAM file with Samblaster solves this problem, although the underlying issue seems to be a version-specific problem with pysam that prevents it from accessing the mate read.

The following commands can add mate tags to a BWA-MEM aligned BAM file using Samblaster, which will solve the problem and also greatly increase speed. The input must be a name-sorted BAM file.

samtools view -h my.namesorted.bam \
    | samblaster --acceptDupMarks --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 --splitterFile splitters.sam --discordantFile discordants.sam \
    | samtools view -Sb - > my.tagged.namesorted.bam

# samtools 1.x syntax: -o names the output file
samtools sort -o my.tagged.coordsorted.bam my.tagged.namesorted.bam

samtools view -S -u splitters.sam | samtools sort -o my.splitters.coordsorted.bam -

Then my.splitters.coordsorted.bam and my.tagged.coordsorted.bam will be the inputs to SVTyper.
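The name-sorted input requirement follows from how mate tags are produced: a read's mate has to be adjacent so its CIGAR and mapping quality can be copied into the MC and MQ tags. The sketch below illustrates that idea on in-memory records. It is not Samblaster's implementation; the dict-based record layout and the function name are hypothetical, and unmapped mates and other edge cases are ignored.

```python
from itertools import groupby

# SAM FLAG bits (from the SAM specification)
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def add_mate_tags(records):
    """Yield records with MC/MQ filled in from the mate's primary record.

    Assumes records sharing a query name are adjacent (name-sorted or
    name-grouped input); otherwise mates land in different groups and
    no tags are added.
    """
    for _, group in groupby(records, key=lambda r: r["qname"]):
        group = list(group)
        primary = {}
        for rec in group:
            if rec["flag"] & (SECONDARY | SUPPLEMENTARY):
                continue  # only primary records act as mates
            if rec["flag"] & READ1:
                primary[1] = rec
            elif rec["flag"] & READ2:
                primary[2] = rec
        if 1 in primary and 2 in primary:
            r1, r2 = primary[1], primary[2]
            r1["tags"].update(MC=r2["cigar"], MQ=r2["mapq"])
            r2["tags"].update(MC=r1["cigar"], MQ=r1["mapq"])
        yield from group
```

Feeding it coordinate-sorted records, where mates are usually far apart, leaves the tags empty, which is why the pipeline tags first and sorts by coordinate afterwards.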

ATpoint commented 6 years ago

Is there any update on that issue? I used the standard command to genotype lumpy outputs, with the -M flag, but the described error keeps popping up. As I have terabytes of BAM files to process, I really would like to avoid the samblaster step.