zeeev / wham

Structural variant detection and association testing

Buffer overflow for WHAM-GRAPHENING #33

Closed nvolkova closed 8 years ago

nvolkova commented 8 years ago

I was trying to run WHAM-GRAPHENING on the same samples where WHAM-BAM was working quite well, and I keep getting the same error message regardless of the memory size I allocate for the run:

INFO: Reads with mapping quality below 5 will be filtered.
INFO: fasta file: Caenorhabditis_elegans.WBcel235.dna.toplevel.fa
INFO: target bars: CD0001b.bam
INFO: graphs will be written to: CD0001b.graphening.raw.txt
INFO: gathering stats (may take some time) for bam: CD0001b.bam
INFO: processed 0 reads for: CD0001b.bam
INFO: processed 0 reads for: CD0001b.bam
...
INFO: processed 0 reads for: CD0001b.bam
INFO: for file: CD0001b.bam
CD0001b.bam: mean depth: ......... 45.3391
CD0001b.bam: sd depth: ........... 22.445
CD0001b.bam: mean insert length: . 380.299
CD0001b.bam: median insert length. 368
CD0001b.bam: sd insert length .... 85.0974
CD0001b.bam: lower insert length . 167.556
CD0001b.bam: upper insert length . 593.043
CD0001b.bam: average base quality: 38.4901
CD0001b.bam: number of reads used: 100144

INFO: Loading discordant reads into forest.
INFO: Reading: CD0001b.bam
...
INFO: Gathering alleles.
INFO: Refined and genotyped 0/1 breakpoints
* buffer overflow detected *: WHAM-GRAPHENING terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x3917902567]
/lib64/libc.so.6[0x3917900450]
WHAM-GRAPHENING[0x41ba7d]
WHAM-GRAPHENING[0x42670a]
WHAM-GRAPHENING[0x4052a0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x391781ed5d]
WHAM-GRAPHENING[0x408c61]
======= Memory map: ========
00400000-0049d000 r-xp 00000000 00:12 29251231906 WHAM-GRAPHENING
0069d000-0069e000 r--p 0009d000 00:12 29251231906 WHAM-GRAPHENING
0069e000-0069f000 rw-p 0009e000 00:12 29251231906 WHAM-GRAPHENING
0069f000-006a0000 rw-p 00000000 00:00 0
...
2ba6e40c3000-2ba6e40ca000 rw-p 00000000 00:00 0
7fff0fdfb000-7fff0fe11000 rw-p 00000000 00:00 0 [stack]
7fff0fef2000-7fff0fef3000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
error: 23287 Aborted (core dumped) WHAM-GRAPHENING -m 5 -a Caenorhabditis_elegans.WBcel235.dna.toplevel.fa -f CD0001b.bam -g CD0001b.graphening.raw.txt > wham.CD0001b.out

Thank you in advance for any help.

zeeev commented 8 years ago

@nvolkova Thank you for reporting this bug. Can you try running the command with -k? It'd help me track down if it is a genotyping bug.

nvolkova commented 8 years ago

@zeeev With the -k flag it finishes successfully but does not report any breakpoints; the .out file contains only the ##INFO part.

INFO: gathering stats (may take some time) for bam: CD0001b.bam
INFO: processed 0 reads for: CD0001b.bam
...
INFO: processed 0 reads for: CD0001b.bam
INFO: for file:CD0001b.bam
...
INFO: Loading discordant reads into forest.
INFO: Reading: CD0001b.bam
...
INFO: joining deletion breakpoints: 10947058 10947062
INFO: Gathering alleles.
INFO: Skipping genotyping: -k set
INFO: WHAM finished normally, goodbye!
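A quick way to confirm that an output file really holds only header lines is to count the records that do not start with "#". The following is a minimal sketch, not part of WHAM: the function name count_vcf_records and the sample header lines (including the SVTYPE field) are hypothetical and only illustrate the generic VCF layout.

```python
def count_vcf_records(lines):
    """Count data lines: non-empty lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Hypothetical header-only content, standing in for a run that reported
# no breakpoints. The field name below is an illustration, not WHAM output.
header_only = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="hypothetical field">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

print(count_vcf_records(header_only))  # 0
```

To check a real file, pass an open file handle instead of the list, for example count_vcf_records(open("wham.CD0001b.out")).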

zeeev commented 8 years ago

@nvolkova Would you be willing to share your bam file privately? Is there anything special about your data? I'd like to get to the bottom of this.

--Zev

nvolkova commented 8 years ago

@zeeev Well, the only special thing is that it is not human, but, as I understood, that is not an issue, or is it? I am trying to extract structural variants from C. elegans sequences.

I have taken another .bam file from my list (the first one may simply have contained no variants), and it again ran into a buffer overflow when run with genotyping. However, when run without genotyping, it produced some variants, which classify_WHAM_vcf.py failed to classify. Does that mean that WHAM-GRAPHENING output differs in some way from that of WHAM-BAM?

And I am ready to share my .bam files, if that helps, although they are quite standard, as far as I can tell.

Thanks again in advance for any help.

zeeev commented 8 years ago

@nvolkova,

There is no issue running either program on non-human data. I spent a good portion of my PhD working on pigeons.

Ah, yes, WHAM-GRAPHENING is a different algorithm (not pileup based), so there will be different info fields. WHAM-GRAPHENING classifies the SV type internally, removing the need for the downstream Python program.

To troubleshoot the buffer overflow I would just need the bam file that was giving you trouble and the matched reference genome.

Thank you for your help, and for using WHAM.

--Zev
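Since the two programs write different info fields, one way to see exactly which ones differ is to compare the ##INFO declarations in the two VCF headers. This is a minimal sketch under stated assumptions: the helper names info_ids and compare_info_fields are hypothetical, and the field IDs in the example (AAA, BBB, CCC) are placeholders, not real WHAM fields.

```python
import re

# Standard VCF header declaration: ##INFO=<ID=NAME,Number=...,...>
INFO_ID_RE = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_ids(lines):
    """Collect the INFO field IDs declared in a VCF header."""
    ids = set()
    for line in lines:
        match = INFO_ID_RE.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_info_fields(lines_a, lines_b):
    """Return (only in a, only in b, shared) INFO IDs, each sorted."""
    a, b = info_ids(lines_a), info_ids(lines_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Placeholder headers; the IDs are made up for illustration.
vcf_a = ['##INFO=<ID=AAA,Number=1,Type=Integer,Description="x">',
         '##INFO=<ID=BBB,Number=1,Type=String,Description="y">']
vcf_b = ['##INFO=<ID=BBB,Number=1,Type=String,Description="y">',
         '##INFO=<ID=CCC,Number=1,Type=Float,Description="z">']

print(compare_info_fields(vcf_a, vcf_b))  # (['AAA'], ['CCC'], ['BBB'])
```

A downstream script that expects fields from the first list will not find them in output that only declares the second, which is consistent with the classifier failing on WHAM-GRAPHENING output.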

zeeev commented 8 years ago

@nvolkova any update on this issue?

nvolkova commented 8 years ago

@zeeev sorry for the delay. As you suggested, I have realigned the reads with BWA MEM, sorted and indexed them, yet I am still getting the buffer overflow:
WHAM-GRAPHENING -a ref.fa -f sort.test.bam -g test.graph.txt > test.wham.vcf

...
INFO: Trying to merge deletion breakpoints: 6
INFO: joining deletion breakpoints: 15747937 16357705
INFO: Gathering alleles.
INFO: generated 100 alleles / 214
INFO: generated 200 alleles / 214
INFO: Refined and genotyped 0/214 breakpoints
* buffer overflow detected *: WHAM-GRAPHENING terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x37ecd02567]
/lib64/libc.so.6[0x37ecd00450]
WHAM-GRAPHENING[0x41ba7d]
WHAM-GRAPHENING[0x42670a]
WHAM-GRAPHENING[0x4052a0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x37ecc1ed5d]
WHAM-GRAPHENING[0x408c61]
======= Memory map: ========
...
7fff60025000-7fff6003b000 rw-p 00000000 00:00 0 [stack]
7fff6008c000-7fff6008d000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)

Although now it works quite well when run without genotyping!

zeeev commented 8 years ago

Shoot. I've been working on recapitulating the error. Can you try this:

Run it without genotyping (-k). Then take that output file and genotype it using -b. If that doesn't fail, it will help me find the bug.

nvolkova commented 8 years ago

It is still failing:
WHAM-GRAPHENING -k -a ref.fa -f sort.test.bam -g test.graph.txt > test.wham.vcf
WHAM-GRAPHENING -b test.wham.vcf -a ref.fa -f sort.test.bam -g test.graph.txt > test.2.wham.vcf

...
INFO: loading external SV calls
INFO: Gathering alleles.
INFO: generated 100 alleles / 212
INFO: generated 200 alleles / 212
INFO: Refined and genotyped 0/212 breakpoints
* buffer overflow detected *: WHAM-GRAPHENING terminated

Backtrace and memory map seem to be exactly the same.
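The WHAM-GRAPHENING[0x...] frames in the backtrace are raw addresses inside the binary. A minimal sketch of how they could be turned into an addr2line invocation follows, assuming binutils' addr2line is available and the binary was built with debug symbols (an assumption about the build, not something stated in this thread). The helper names frame_addresses and addr2line_command are hypothetical.

```python
import re

# Matches glibc backtrace frames such as "WHAM-GRAPHENING[0x41ba7d]" or
# "/lib64/libc.so.6(__fortify_fail+0x37)[0x3917902567]".
FRAME_RE = re.compile(
    r"(?P<module>\S+?)(?:\([^)]*\))?\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def frame_addresses(backtrace, module="WHAM-GRAPHENING"):
    """Return the addresses of frames belonging to `module`, in order."""
    return [m.group("addr") for m in FRAME_RE.finditer(backtrace)
            if m.group("module") == module]


def addr2line_command(backtrace, binary="WHAM-GRAPHENING"):
    """Build an addr2line argument list (-f: function names, -C: demangle).

    Assumes the module name printed in the backtrace equals `binary`;
    adjust if the program was started via a path.
    """
    return ["addr2line", "-e", binary, "-f", "-C"] + frame_addresses(
        backtrace, module=binary)


# Backtrace frames copied from the first report in this thread.
backtrace = (
    "/lib64/libc.so.6(__fortify_fail+0x37)[0x3917902567] "
    "/lib64/libc.so.6[0x3917900450] "
    "WHAM-GRAPHENING[0x41ba7d] WHAM-GRAPHENING[0x42670a] "
    "WHAM-GRAPHENING[0x4052a0] "
    "/lib64/libc.so.6(__libc_start_main+0xfd)[0x391781ed5d] "
    "WHAM-GRAPHENING[0x408c61]"
)

print(" ".join(addr2line_command(backtrace)))
# addr2line -e WHAM-GRAPHENING -f -C 0x41ba7d 0x42670a 0x4052a0 0x408c61
```

Without debug symbols addr2line typically prints "??" for each address, so the result is only useful against a binary compiled with -g from the same source revision.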

zeeev commented 8 years ago

Can you send me that VCF?

zeeev commented 8 years ago

In the Makefile can you add -fstack-protector-all to the CFLAGS?

It will look like:

CFLAGS= -Wall -fstack-protector-all -DVERSION=\"$(GIT_VERSION)\" -DFAST -std=c++0x

Re-make and try again? I really appreciate your help. I'm still not able to generate the error.

nvolkova commented 8 years ago

@zeeev I added the flag - but nothing changed. Do you think it may have something to do with the cluster where I'm trying to run it? Wrong libraries or specification?

zeeev commented 8 years ago

I've merged the development branch into master and disabled genotyping until I finish the new genotyping algorithm. Thanks for your help. The new version of the caller shouldn't give you any trouble, but it will not produce genotypes.