mskcc / htstools


snp-pileup realloc error #6

Open veseshan opened 5 years ago

veseshan commented 5 years ago

Alex

I installed htstools on a new Ubuntu (bionic) server with htslib 1.9 and got the following error:

xxx@yyy:~$ snp-pileup -v -A -q15 -Q20 -P100 -r10 -g dbsnp_137.vcf.gz  out.gz sample.bam
Detected format for file 1: BAM version 1 compressed sequence data
Max per-file depth set to 4000.
realloc(): invalid old size
Aborted

After some false leads, I was able to compile the code against htslib 1.6, and it works fine. It also fails with htslib 1.7. I can't see anywhere in the code where realloc is used. I turned on all the printf statements and saw that it gets through the bam and vcf and then crashes, but I can't figure out exactly why. Any thoughts on how it could be fixed?

Thanks, Venkat

thatoddmailbox commented 5 years ago

Sorry Venkat, I just saw this!

Does this happen with all bam files? If not, could you share a sample that crashes? Also, do you know which version of GCC was used to compile the code (gcc -v)? This seems like it could be an htslib issue, but I have to investigate more.

veseshan commented 5 years ago

Thanks Alex. No worries. That error happened with every bam file I tried. Here is the gcc info:

xxx@yyy:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04) 

Since then, I have come across a new error: the program (compiled with htslib 1.6) stops on some bam files with a "corrupted size vs. prev_size" message and no other useful information. The same program worked for all bam files on an older server with htslib 1.6 and gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10).

thatoddmailbox commented 5 years ago

So I tried running snp-pileup with some sample BAM and VCF files I found online, and it does complete, even with the same options you used in your example. I'm using Ubuntu 18.04 and htslib 1.9. Would it be possible for you to share sample files that crash? Also, I noticed that the htslib 1.7 changelog says it now supports "BAMs which include CIGARs with more than 65535 operations as per HTS-Specs 18th November". Does this apply to your BAM files?
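For context on why that changelog entry mentions 65535: in the BAM format, the fixed-length part of each record stores the number of CIGAR operations (n_cigar_op) as a 16-bit unsigned integer, so larger counts cannot be represented there; the spec update referenced in the changelog instead keeps the full CIGAR in a CG tag and leaves a placeholder CIGAR in the record. The sketch below only illustrates the arithmetic of that 16-bit limit. fits_in_bam_cigar_field and truncate_to_16_bits are hypothetical helper names for this example, not htslib functions.

```cpp
#include <cassert>
#include <cstdint>

// Largest CIGAR operation count that fits in BAM's 16-bit n_cigar_op field.
constexpr std::uint32_t kMaxBamCigarOps = 65535;

// Hypothetical helper: can this many operations be stored in n_cigar_op directly?
bool fits_in_bam_cigar_field(std::uint32_t n_ops) {
    return n_ops <= kMaxBamCigarOps;
}

// Hypothetical helper: what a count looks like if it is squeezed into 16 bits.
// Unsigned conversion keeps only the low 16 bits (value modulo 65536).
std::uint16_t truncate_to_16_bits(std::uint32_t n_ops) {
    return static_cast<std::uint16_t>(n_ops);
}

// Compile-time check of the wrap-around: 70000 - 65536 = 4464.
static_assert(static_cast<std::uint16_t>(70000u) == 4464u,
              "70000 wraps to 4464 in 16 bits");
```

A count of 70000 would silently become 4464 if forced into the 16-bit field, which is why such records need the CG-tag representation rather than the plain CIGAR field.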

veseshan commented 5 years ago

I downloaded the following bam

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam

and snp-pileup worked fine when compiled against htslib 1.7 (provided by libhts-dev in bionic). So now I am totally confused. Unfortunately, I can't share the bams that are giving me trouble. Let me see if I can find any more information about them. Thanks,

Venkat

veseshan commented 5 years ago

I have now tried several versions of htslib on an Ubuntu xenial server. I get a segmentation fault when snp-pileup is compiled against version 1.4 or later (I tried 1.4, 1.6 and 1.9). The program works fine when compiled against 1.2 and 1.3.1. The release notes at https://github.com/samtools/htslib/releases/tag/1.4 say:

Incompatible changes: several functions and data types have been changed
in this release, and the shared library soversion has been bumped to 2.

Could this be causing the issue? I haven't tried htslib 1.3.2 on bionic yet. Thanks,

thatoddmailbox commented 5 years ago

A question: when you try a different version of htslib, are you cleaning the directory (removing any .o files) and recompiling, or just swapping the shared library file? I think you will have to clean and recompile because, as you said, some data structures changed in htslib 1.4+.
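To illustrate why a stale build can misbehave after a data-structure change (a hedged sketch, not htslib code): if a program is compiled against headers where an array element has one size, but the library fills that array using a larger element, the program's element 0 still lines up while every later element is read from the wrong offset. EntryOld, EntryNew and read_with_stale_layout are hypothetical names invented for this example, and the extra field does not correspond to any specific htslib change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical layouts, NOT the real htslib types.
// EntryOld: what a caller built against "older" headers thinks an entry is.
struct EntryOld {
    int qpos;
    int indel;
};

// EntryNew: what a "newer" library actually writes (one field appended).
struct EntryNew {
    int qpos;
    int indel;
    double extra;
};

// Simulates the library writing n entries in the new layout, then the
// caller reading them back while stepping through memory by sizeof(EntryOld).
std::vector<EntryOld> read_with_stale_layout(std::size_t n) {
    std::vector<EntryNew> written(n);
    for (std::size_t i = 0; i < n; ++i) {
        written[i] = EntryNew{100 + static_cast<int>(i), 0, 0.0};
    }

    // The old stride is smaller, so every read below stays inside the buffer.
    assert(sizeof(EntryOld) <= sizeof(EntryNew));

    const unsigned char *base =
        reinterpret_cast<const unsigned char *>(written.data());
    std::vector<EntryOld> seen(n);
    for (std::size_t i = 0; i < n; ++i) {
        // The caller's idea of "entry i" starts i * sizeof(EntryOld) bytes in.
        std::memcpy(&seen[i], base + i * sizeof(EntryOld), sizeof(EntryOld));
    }
    return seen;
}
```

With n = 10 on a typical x86_64 build, entry 0 reads back correctly (qpos 100), entry 1 reads the zero bytes of the first entry's extra field, and entry 2 actually shows the library's second entry (qpos 101). Only the first element is trustworthy, which is why headers and shared library have to come from the same htslib version.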

veseshan commented 5 years ago

Every version is compiled fresh in its own directory.

I am getting a segmentation fault when I run snp-pileup, compiled with htslib-1.3.2 on bionic, on wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam:

admin@zzz:~$ uname -a
Linux zzz 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

admin@zzz:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04) 

admin@zzz:~$ ldd htstools/snp-pileup132
    linux-vdso.so.1 (0x00007ffd441c3000)
    libhts.so.1 => /opt/htslib-1.3.2/lib/libhts.so.1 (0x00007f411b821000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f411b498000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f411b280000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f411ae8f000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f411ac72000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f411a8d4000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f411a6b5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f411bcc0000)

admin@zzz:~$ htstools/snp-pileup132 -v -q15 -Q20 -P100 -r10 -g /usr/local/share/VCF/dbsnp_137.b37__RmDupsClean.vcf.gz blah.gz wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam 
Detected format for file 1: BAM version 1 compressed sequence data
Max per-file depth set to 4000.
Segmentation fault (core dumped)

thatoddmailbox commented 5 years ago

So I tried running it with htslib 1.3.2 under bionic on that bam file, and it still seems to work. Here is my command output:

alex@alex-VirtualBox:~/Documents/bam$ uname -a
Linux alex-VirtualBox 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

alex@alex-VirtualBox:~/Documents/bam$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-16ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --with-as=/usr/bin/x86_64-linux-gnu-as --with-ld=/usr/bin/x86_64-linux-gnu-ld --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3) 

alex@alex-VirtualBox:~/Documents/bam$ ldd ./htstools/snp-pileup
    linux-vdso.so.1 (0x00007fffd634f000)
    libhts.so.1 => /home/alex/Documents/bam/hts132/lib/libhts.so.1 (0x00007fa39ea8a000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa39e6fc000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa39e4e4000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa39e0f3000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fa39ded6000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa39db38000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa39d919000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa39ef29000)

alex@alex-VirtualBox:~/Documents/bam$ ./htstools/snp-pileup -v -A -q15 -Q20 -P100 -r10 -g All_20180531_papu.vcf.gz out.gz wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam 
Detected format for file 1: BAM version 1 compressed sequence data
Max per-file depth set to 4000.
Finished in 60.435205 seconds.

I wonder if a difference in the VCF file could be causing the issue. I'm using this file: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180531_papu.vcf.gz. I'm not sure whether this is the correct file or whether I should be using something else.

veseshan commented 5 years ago

Thanks Alex. I was away last week. I tried common_all_20180418.vcf.gz as well as the older common_all_20160601.vcf.gz and got a segmentation fault with both. I noticed one difference between our setups: your gcc package is 7.3.0-16ubuntu3, whereas mine is 7.3.0-27ubuntu1~18.04 (yours is the release version; mine is the update version). I don't know if that is the source of the difference.

thatoddmailbox commented 5 years ago

I'm still not able to reproduce the crash, even with the latest G++ and common_all_20180418.vcf.gz. Would it be possible for you to create a core dump of the program? Run ulimit -c unlimited to enable core dumps (see this page for more information), then run the program to trigger the segfault. (Afterwards you can run ulimit -c 0 to disable core dumps again, which is the Ubuntu default.) There should then be a new core file in the current working directory; if you could send that to me, that would be great!
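The ulimit -c step can also be done from inside a program with the POSIX getrlimit/setrlimit calls, which can help when the program is launched by something other than an interactive shell. This is a minimal sketch, assuming hypothetical helpers enable_core_dumps() and core_soft_limit() that you would call early in main; neither is part of snp-pileup. An unprivileged process may raise its soft limit only up to the hard limit, so the sketch sets the soft limit to the hard limit rather than forcing RLIM_INFINITY.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: raise this process's soft core-file size limit up to
// its hard limit. If the hard limit is RLIM_INFINITY, this matches the effect
// of `ulimit -c unlimited` on the soft limit. Returns true on success.
bool enable_core_dumps() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return false;
    }
    rl.rlim_cur = rl.rlim_max;  // allowed without privileges
    return setrlimit(RLIMIT_CORE, &rl) == 0;
}

// Hypothetical helper: current soft core-file size limit, or 0 if it cannot
// be read. RLIM_INFINITY means "unlimited".
rlim_t core_soft_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return 0;
    }
    return rl.rlim_cur;
}
```

Note that where the kernel writes the core file is governed by /proc/sys/kernel/core_pattern, so on some systems it may not land in the current working directory.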

veseshan commented 5 years ago

Will send by direct message. Thanks.

thatoddmailbox commented 5 years ago

So I looked at the core dumps you shared with me, and the segfault seems to happen on line 385 of snp-pileup.cpp, where it reads the pileup results. htslib reports 10 results for the current position, but only the first one is valid; all the others contain corrupt data. I'm not sure whether this corruption comes from htslib or from my program, but I do think it is the underlying cause of all these issues. The crashing read is at position 2258900 (2258899 if zero-indexed) on chr1 of the common_all file.

Just to confirm: this segfault was happening with the NCBI common_all_20180418.vcf.gz, wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam, and the parameters -v -A -q15 -Q20 -P100 -r10 -g? I've tried running snp-pileup on my local machine with those parameters and files, and I still can't reproduce the segfault; the program just runs to completion.

Another question, does the crash always happen immediately, or does it take a few seconds of processing before it happens?

veseshan commented 5 years ago

Crash happens immediately.

admin@z800:~$ time /usr/local/share/pileup/snp-pileup131g -v -A -q15 -Q20 -P100 -r10 -g common_all_20180418.vcf.gz outfile131.gz /usr/local/share/pileup/wgEncodeUwRepliSeqBg02esG1bAlnRep1.bam
Detected format for file 1: BAM version 1 compressed sequence data
Max per-file depth set to 4000.
Segmentation fault (core dumped)

real    0m0.117s
user    0m0.103s
sys 0m0.004s

Thanks