samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

memory issue when sum(AD) > DP in bcftools stats -s #2102

Closed 23andme-jaredo closed 5 months ago

23andme-jaredo commented 5 months ago

This is a weird edge case I found with the output of GLNexus joint calling. I have created an extreme toy example here, since I found the issue in non-public data and the memory error was sporadic there. I guess that in practice there is some interaction between low-resolution reference intervals and the joining of indels across samples that creates a slight inconsistency between AD and DP.

Obviously the input is invalid (although within spec?), but it would be nice not to segfault here:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr4,length=190214555>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE1 SAMPLE2
chr4    5745188 .   TCTTC   TTTC,T  416472  PASS    .   GT:AD:DP    0/1:9,17,0:26   1/2:10,15,1000000:25

The problem is here:

https://github.com/samtools/bcftools/blob/develop/vcfstats.c#L938-L944

I guess you could use #define vaf2bin(vaf) min(20,((int)nearbyintf((vaf)/0.05))) as a crude fix.
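The clamping idea can also be written as a small function instead of a macro. This is only a sketch, not the actual bcftools code: the names vaf2bin_clamped and N_VAF_BINS are hypothetical, and the 21-bin layout (indices 0 to 20 in steps of 0.05) is inferred from the macro. It rounds half up rather than calling nearbyintf, so results can differ at exact ties.

```c
/* Hypothetical constant: 21 bins for VAF 0.00, 0.05, ..., 1.00 (inferred from the macro). */
#define N_VAF_BINS 21

/* Hypothetical helper: map a VAF to a bin index that is always in [0, N_VAF_BINS-1]. */
static int vaf2bin_clamped(float vaf)
{
    if (!(vaf > 0.0f)) return 0;              /* negative, zero, or NaN */
    if (vaf >= 1.0f) return N_VAF_BINS - 1;   /* clamp before the cast, e.g. when AD > DP */
    return (int)(vaf / 0.05f + 0.5f);         /* round half up; the macro uses nearbyintf */
}
```

Clamping before the float-to-int cast also avoids converting a very large quotient to int. With the toy record, SAMPLE2 has AD=1000000 for the second alt allele and DP=25, so a VAF computed as AD/DP would be 40000 and would land in the last bin instead of indexing past the end of the array.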

pd3 commented 5 months ago

Can you also show the command and the version of the program? I was not able to reproduce the problem with the latest version.

23andme-jaredo commented 5 months ago

Sorry, this was bcftools 1.19 on CentOS 7:

$ valgrind bcftools stats -s - test.vcf > /dev/null
==31004== Memcheck, a memory error detector
==31004== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31004== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==31004== Command: bcftools stats -s - test.vcf
==31004==
==31004== Invalid read of size 4
==31004==    at 0x4129DB: update_vaf (vcfstats.c:943)
==31004==    by 0x4129DB: do_sample_stats (vcfstats.c:1140)
==31004==    by 0x414D2F: do_vcf_stats (vcfstats.c:1293)
==31004==    by 0x414D2F: main_vcfstats (vcfstats.c:2019)
==31004==    by 0x5A97554: (below main) (in /usr/lib64/libc-2.17.so)
==31004==  Address 0x9914e4c is 1,371,500 bytes inside an unallocated block of size 2,281,216 in arena "client"
==31004==
==31004==
==31004== HEAP SUMMARY:
==31004==     in use at exit: 0 bytes in 0 blocks
==31004==   total heap usage: 252 allocs, 252 frees, 1,895,324 bytes allocated
==31004==
==31004== All heap blocks were freed -- no leaks are possible
==31004==
==31004== For lists of detected and suppressed errors, rerun with: -s
==31004== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)

The value of idx is >20, which is causing the problem.

Would it be better to calculate vaf = AD[i]/sum(AD)?
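A minimal sketch of that alternative, assuming the current code divides AD[i] by DP (as the issue title suggests). The function name vaf_from_ad and its signature are hypothetical and not taken from vcfstats.c. Negative AD entries are treated as missing, which is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: return AD[ial] / sum(AD), or -1.0 if the ratio is undefined.
 * Because the denominator is the sum of the same vector, the result never exceeds 1. */
static double vaf_from_ad(const int32_t *ad, int n_ad, int ial)
{
    int64_t sum = 0;
    for (int i = 0; i < n_ad; i++) {
        if (ad[i] < 0) continue;   /* assumption: negative values mark missing entries */
        sum += ad[i];
    }
    if (ial < 0 || ial >= n_ad || ad[ial] < 0 || sum == 0) return -1.0;
    return (double)ad[ial] / (double)sum;
}
```

For SAMPLE2 in the toy record (AD=10,15,1000000), the second alt allele would get 1000000/1000025, just under 1, regardless of the inconsistent DP=25.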

pd3 commented 5 months ago

Thank you for the bug report; this is now fixed.

23andme-jaredo commented 5 months ago

thanks!