samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools merge memory map error #2144

Closed MarcElosua closed 3 months ago

MarcElosua commented 3 months ago

Hi,

First of all, thank you so much for putting together and maintaining such an amazing package!!

When I try to merge 75 VCF files for human chromosome X, I keep getting the error shown below.

Sample of the file list:

> ls -lhGgo

total 174M

-rw-r--r-- 1 2.1M Mar 27 15:04 840001002_D1401896.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001002_D1401896.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 2.0M Mar 27 15:04 840001004_D1401897.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001004_D1401897.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 1.7M Mar 27 15:04 840001006_D1401898.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001006_D1401898.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 2.5M Mar 27 15:04 840001009_D1401899.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001009_D1401899.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 2.6M Mar 27 15:04 840001010_D1401900.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001010_D1401900.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 2.4M Mar 27 15:04 840001011_D1401901.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001011_D1401901.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 1.8M Mar 27 15:04 840001014_D1401902.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001014_D1401902.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 1.8M Mar 27 15:04 840001015_D1401903.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001015_D1401903.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 2.6M Mar 27 15:04 840001016_D1401904.sorted_chrX.vcf.gz
-rw-r--r-- 1  16K Mar 29 12:09 840001016_D1401904.sorted_chrX.vcf.gz.csi
-rw-r--r-- 1 2.0M Mar 27 15:04 840001017_D1401905.sorted_chrX.vcf.gz
-rw-r--r-- 1  15K Mar 29 12:09 840001017_D1401905.sorted_chrX.vcf.gz.csi

Command I am running:

chr=X
bcftools merge *sorted_chr${chr}.vcf.gz -Oz -o merged_chr${chr}.vcf.gz

Error:

*** Error in `/----/x86_64-linux/bcftools/1.19/bin/bcftools': free(): invalid next size (fast): 0x00005632b4b29510 ***

======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x7f7d5184b329]
/----/x86_64-linux/bcftools/1.19/bin/bcftools(merge_chrom2qual+0x6c3)[0x5632b0ee5123]
/----/x86_64-linux/bcftools/1.19/bin/bcftools(merge_line+0x21)[0x5632b0ef2ff1]
/----/x86_64-linux/bcftools/1.19/bin/bcftools(merge_vcf+0x85f)[0x5632b0ef434f]
/----/x86_64-linux/bcftools/1.19/bin/bcftools(main_vcfmerge+0x646)[0x5632b0ef5dc6]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7d517ec555]
/----/x86_64-linux/bcftools/1.19/bin/bcftools(+0x11321)[0x5632b0ed6321]
======= Memory map: ========
5632b0ec5000-5632b0ed6000 r--p 00000000 00:2d 12684312793                /----/x86_64-linux/bcftools/1.19/bin/bcftools
5632b0ed6000-5632b0faa000 r-xp 00011000 00:2d 12684312793                /----/x86_64-linux/bcftools/1.19/bin/bcftools
5632b0faa000-5632b0fe7000 r--p 000e5000 00:2d 12684312793                /----/x86_64-linux/bcftools/1.19/bin/bcftools
5632b0fe7000-5632b0fe9000 r--p 00122000 00:2d 12684312793                /----/x86_64-linux/bcftools/1.19/bin/bcftools
5632b0fe9000-5632b0fee000 rw-p 00124000 00:2d 12684312793                /----/x86_64-linux/bcftools/1.19/bin/bcftools
5632b10df000-5632b4c4d000 rw-p 00000000 00:00 0                          [heap]
7f7d40000000-7f7d40021000 rw-p 00000000 00:00 0 
7f7d40021000-7f7d44000000 ---p 00000000 00:00 0 
7f7d46dbf000-7f7d4f237000 rw-p 00000000 00:00 0 
7f7d4f237000-7f7d4f256000 r--p 00000000 00:2d 12675532299                /----/x86_64-linux/bcftools/1.19/lib/libgfortran.so.5
7f7d4f256000-7f7d4f3b1000 r-xp 0001f000 00:2d 12675532299                /----/x86_64-linux/bcftools/1.19/lib/libgfortran.so.5
7f7d4f3b1000-7f7d4f3df000 r--p 0017a000 00:2d 12675532299                /----/x86_64-linux/bcftools/1.19/lib/libgfortran.so.5
7f7d4f3df000-7f7d4f3e0000 r--p 001a8000 00:2d 12675532299                /----/x86_64-linux/bcftools/1.19/lib/libgfortran.so.5
7f7d4f3e0000-7f7d4f3e2000 rw-p 001a9000 00:2d 12675532299                /----/x86_64-linux/bcftools/1.19/lib/libgfortran.so.5

When I run it on a subset of ~35 samples it works without problems; the crash only appears when I increase the number of samples.

All the .vcf.gz files are very small (<3.5 MB), so my guess is that this is not a total memory usage problem. Do you have any advice on how to go about this?

Thank you so much!

pd3 commented 3 months ago

There was a similar error in https://github.com/samtools/bcftools/issues/1353, but that was with an older version of htslib, and that bug has been fixed.

Is there any chance you could narrow it down to a small reproducible test case? Unfortunately, the backtrace does not help much in debugging the problem.
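One way to narrow a crash like this down to a small test case is to drop input files one at a time and keep only those needed to make the merge fail. The sketch below is an illustration, not something posted in this thread: the function name `shrink_failing_set` and the list file are hypothetical, and it treats any non-zero exit status from `bcftools merge` as "the failure", so unrelated errors (a missing index, duplicate sample names) would be counted too. It relies only on the `-l` (file list) option of `bcftools merge`.

```shell
# Hypothetical helper: greedily shrink a list of VCFs that fails to merge.
# $1 is a text file with one indexed VCF path per line.
shrink_failing_set() (
    list="$1"
    work=$(mktemp -d) || exit 1
    cp "$list" "$work/current" || exit 1

    # Succeeds (status 0) when merging the files listed in $1 FAILS.
    merge_fails() {
        ! bcftools merge -l "$1" -Oz -o "$work/out.vcf.gz" </dev/null >/dev/null 2>&1
    }

    if ! merge_fails "$work/current"; then
        echo "the full list merges cleanly; nothing to reduce" >&2
        exit 1
    fi

    while IFS= read -r f; do
        # Candidate list = current list without this one file.
        grep -vxF -- "$f" "$work/current" > "$work/candidate"
        # A merge needs at least two inputs.
        [ "$(grep -c '' "$work/candidate")" -ge 2 ] || continue
        # If the merge still fails without the file, the file is not needed.
        if merge_fails "$work/candidate"; then
            cp "$work/candidate" "$work/current"
        fi
    done < "$list"

    # Print the reduced list of files that still triggers the failure.
    cat "$work/current"
    rm -rf "$work"
)
```

With the file names from this issue, it could be called as `ls *sorted_chrX.vcf.gz > all.txt; shrink_failing_set all.txt`. Because heap corruption does not always crash deterministically, the reduced list is worth re-running a few times before sharing it.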

MarcElosua commented 3 months ago

hi @pd3

Thank you so much for the pointer, it was super useful; I was actually having the same issue. I filtered out the non-PASS positions and bcftools merge worked smoothly!
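The exact filtering command was not posted, so the following is only a sketch of how such a workaround might look. It assumes `bcftools view -f PASS` is used to keep records whose FILTER column is PASS (records with an unset FILTER, `.`, are dropped as well), and the function name, the `pass_filtered` directory, and the output file name are all hypothetical.

```shell
# Hypothetical helper: keep only PASS records per sample, then merge.
# $1 is the chromosome label used in the file names, e.g. X.
filter_pass_and_merge() (
    chr="$1"
    outdir="pass_filtered"          # hypothetical scratch directory
    mkdir -p "$outdir" || exit 1

    set --                          # reuse positional parameters as the output list
    for f in *sorted_chr"${chr}".vcf.gz; do
        [ -e "$f" ] || continue     # the glob matched nothing
        out="$outdir/$(basename "$f" .vcf.gz).pass.vcf.gz"
        # -f PASS keeps only records whose FILTER is PASS.
        bcftools view -f PASS -Oz -o "$out" "$f" || exit 1
        # bcftools merge requires each input to be indexed.
        bcftools index -f "$out" || exit 1
        set -- "$@" "$out"
    done

    if [ "$#" -lt 2 ]; then
        echo "need at least two input files to merge" >&2
        exit 1
    fi

    bcftools merge "$@" -Oz -o "merged_chr${chr}.pass.vcf.gz"
)
```

Running `filter_pass_and_merge X` in the directory listed above would write the filtered copies to `pass_filtered/` and leave the original files untouched.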

pd3 commented 3 months ago

I am glad you found a way around it. I would still be keen on having a small test case and fixing the problem. The program should not be crashing like this.

MarcElosua commented 3 months ago

I'll try to debug it in more depth!