vcftools / vcftools

A set of tools written in Perl and C++ for working with VCF files, such as those generated by the 1000 Genomes Project.
https://vcftools.github.io/
GNU Lesser General Public License v3.0

Memory error with TsTv-by-count #196

Open DrMcStrange opened 1 year ago

DrMcStrange commented 1 year ago

Hi, I've hit what appears to be a memory error when running the Sarek pipeline. I've tracked down the command it's running and tested it outside the pipeline.

The command I've tested with is vcftools --gzvcf work/13/3f62fb62cec65897841222bd2304d5/joint_germline.vcf.gz --out tstv-test --TsTv-by-count

and the output is:

VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --gzvcf work/13/3f62fb62cec65897841222bd2304d5/joint_germline.vcf.gz
        --out tstv-test
        --TsTv-by-count

Using zlib version: 1.2.7
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 5 out of 5 Individuals
Outputting Ts/Tv by Alternative Allele Count
After filtering, kept 9141763 out of a possible 9141763 Sites
Run Time = 45.00 seconds
*** Error in `vcftools': free(): invalid size: 0x00000000015a7e10 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x2b7051439329]
vcftools[0x41172c]
vcftools[0x4115cc]
vcftools[0x45f464]
vcftools[0x4bfed9]
vcftools[0x406bc0]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b70513da555]
vcftools[0x407550]
======= Memory map: ========
00400000-004de000 r-xp 00000000 00:2d 1099518427774                      /share/apps/rosalind/gcc_6.4.0_apps/vcftools/0.1.16/bin/vcftools
004df000-004e0000 r--p 000de000 00:2d 1099518427774                      /share/apps/rosalind/gcc_6.4.0_apps/vcftools/0.1.16/bin/vcftools
004e0000-004e1000 rw-p 000df000 00:2d 1099518427774                      /share/apps/rosalind/gcc_6.4.0_apps/vcftools/0.1.16/bin/vcftools
004e1000-004e2000 rw-p 00000000 00:00 0
0133b000-017f4000 rw-p 00000000 00:00 0                                  [heap]
2b7050c7c000-2b7050c9e000 r-xp 00000000 00:13 55056                      /usr/lib64/ld-2.17.so
2b7050c9e000-2b7050ca0000 rw-p 00000000 00:00 0
2b7050cab000-2b7050cac000 rw-p 00000000 00:00 0
2b7050cac000-2b7050e27000 r-xp 00000000 00:2d 1099519690908              /share/apps/software/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
2b7050e27000-2b7050e31000 r--p 0017a000 00:2d 1099519690908              /share/apps/software/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
2b7050e31000-2b7050e35000 rw-p 00184000 00:2d 1099519690908              /share/apps/software/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
2b7050e35000-2b7050e38000 rw-p 00000000 00:00 0
2b7050e38000-2b7050e4e000 r-xp 00000000 00:2d 1099519690874              /share/apps/software/GCCcore/6.4.0/lib64/libgcc_s.so.1
2b7050e4e000-2b7050e4f000 r--p 00015000 00:2d 1099519690874              /share/apps/software/GCCcore/6.4.0/lib64/libgcc_s.so.1
2b7050e4f000-2b7050e50000 rw-p 00016000 00:2d 1099519690874              /share/apps/software/GCCcore/6.4.0/lib64/libgcc_s.so.1
2b7050e50000-2b7050e55000 rw-p 00000000 00:00 0
2b7050e9d000-2b7050e9e000 r--p 00021000 00:13 55056                      /usr/lib64/ld-2.17.so
2b7050e9e000-2b7050e9f000 rw-p 00022000 00:13 55056                      /usr/lib64/ld-2.17.so
2b7050e9f000-2b7050ea0000 rw-p 00000000 00:00 0
2b7050ea0000-2b7050eb5000 r-xp 00000000 00:13 58361                      /usr/lib64/libz.so.1.2.7
2b7050eb5000-2b70510b4000 ---p 00015000 00:13 58361                      /usr/lib64/libz.so.1.2.7
2b70510b4000-2b70510b5000 r--p 00014000 00:13 58361                      /usr/lib64/libz.so.1.2.7
2b70510b5000-2b70510b6000 rw-p 00015000 00:13 58361                      /usr/lib64/libz.so.1.2.7
2b70510b6000-2b70511b7000 r-xp 00000000 00:13 55088                      /usr/lib64/libm-2.17.so
2b70511b7000-2b70513b6000 ---p 00101000 00:13 55088                      /usr/lib64/libm-2.17.so
2b70513b6000-2b70513b7000 r--p 00100000 00:13 55088                      /usr/lib64/libm-2.17.so
2b70513b7000-2b70513b8000 rw-p 00101000 00:13 55088                      /usr/lib64/libm-2.17.so
2b70513b8000-2b705157c000 r-xp 00000000 00:13 55072                      /usr/lib64/libc-2.17.so
2b705157c000-2b705177b000 ---p 001c4000 00:13 55072                      /usr/lib64/libc-2.17.so
2b705177b000-2b705177f000 r--p 001c3000 00:13 55072                      /usr/lib64/libc-2.17.so
2b705177f000-2b7051781000 rw-p 001c7000 00:13 55072                      /usr/lib64/libc-2.17.so
2b7051781000-2b7051786000 rw-p 00000000 00:00 0
2b7051c87000-2b7051dd8000 rw-p 00000000 00:00 0
2b7054000000-2b7054021000 rw-p 00000000 00:00 0
2b7054021000-2b7058000000 ---p 00000000 00:00 0
7ffeee225000-7ffeee249000 rw-p 00000000 00:00 0                          [stack]
7ffeee3c2000-7ffeee3c4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted

Given that Sarek usually runs fine, I'm guessing it's not likely to be a problem with the input VCF. Any ideas what could be going wrong here?

auton1 commented 1 year ago

This looks to be an error that is raised right at the end of execution (on cleanup after analysis), and may not impact the results. Is the program producing output?
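To tell a late abort from an ordinary failure, it can help to look at how the process ended. The `free(): invalid size` message followed by `Aborted` is what glibc prints when it detects heap corruption and calls `abort()`, which ends the process with SIGABRT. A minimal sketch in Python, assuming the command is launched with `subprocess`; the helper name `describe_exit` is hypothetical, and treating statuses above 128 as "128 + signal number" is only the usual shell convention, not a guarantee:

```python
import signal


def describe_exit(returncode):
    """Turn a process return code into a short human-readable description.

    Assumptions (not from the thread): negative codes follow the Python
    subprocess convention (-N means "killed by signal N"), and codes above
    128 follow the common shell convention (128 + N).
    """
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform.
        name = f"signal {signum}"
    return f"killed by {name}"


# Example use (the vcftools arguments are the ones from the report above):
#   import subprocess
#   result = subprocess.run(["vcftools", "--gzvcf", "joint_germline.vcf.gz",
#                            "--out", "tstv-test", "--TsTv-by-count"])
#   print(describe_exit(result.returncode))
```

If this reports SIGABRT and the output file is complete, that is consistent with a crash during cleanup rather than during the analysis itself.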


-- Adam Auton

DrMcStrange commented 1 year ago

Yes, it is, so the only real problem is that the non-zero exit code is killing the Sarek run. I'll look for a way to override that on the Sarek side.

Thanks!
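One possible way to override the exit code is a small wrapper that accepts a signal-terminated run only when the expected output file was written. This is a sketch under stated assumptions, not something Sarek or vcftools provides: the function name `run_tolerating_late_abort` is hypothetical, and the output file name `tstv-test.TsTv.count` (the `--out` prefix plus the `.TsTv.count` suffix) should be checked against what your vcftools version actually writes. A non-empty file does not prove the results are complete, so this hides the symptom rather than fixing the underlying bug.

```python
import subprocess
import sys
from pathlib import Path


def run_tolerating_late_abort(cmd, expected_output):
    """Run cmd and return an exit code suitable for a pipeline step.

    Returns 0 if the command succeeded, or if it was killed by a signal
    (negative return code from subprocess) but still produced a non-empty
    expected_output. Any other failure is passed through unchanged.
    """
    out = Path(expected_output)
    # Remove stale output so a file left by an earlier run cannot mask
    # a genuine failure of this run.
    if out.exists():
        out.unlink()

    result = subprocess.run(cmd)
    if result.returncode == 0:
        return 0

    killed_by_signal = result.returncode < 0
    has_output = out.is_file() and out.stat().st_size > 0
    if killed_by_signal and has_output:
        print(
            f"warning: command ended with signal {-result.returncode} "
            f"after writing {out}; treating as success",
            file=sys.stderr,
        )
        return 0
    return result.returncode


# Example use (assumed output name, see the note above):
#   code = run_tolerating_late_abort(
#       ["vcftools", "--gzvcf", "joint_germline.vcf.gz",
#        "--out", "tstv-test", "--TsTv-by-count"],
#       "tstv-test.TsTv.count",
#   )
#   sys.exit(code)
```

Ordinary non-zero exits (for example a missing input file) still propagate, so only the abort-after-output case is masked.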