samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

[feature request] Option to use temporary files for `bcftools merge` #2066

Closed by hjeremyli 8 months ago

hjeremyli commented 8 months ago

When merging large numbers of VCFs, memory usage is often considerable: ad hoc testing indicates that merging ~60k variants across 1.5k samples uses >120 GB of memory. It would be useful to have an option like the `-T` option in `bcftools sort`, which allows using temporary files instead of performing all operations in RAM, to make this computation tenable on lower-memory machines.

pd3 commented 8 months ago

That's odd; the program should not require that much memory. For each file it keeps in memory only the header, the index, and a few lines with the same position. In a test that I ran just now, it merged 1,500 files using only 5 GB of memory.
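To illustrate why a streaming merge needs so little memory, here is a minimal Python sketch of a k-way merge over position-sorted inputs. It is not the BCFtools implementation: the function name `merge_by_position`, the `(pos, genotype)` record shape, and the single-chromosome simplification are all assumptions made for illustration. The point is only that the heap holds one pending record per input, so memory grows with the number of files and not with the number of variants.

```python
import heapq


def merge_by_position(streams):
    """Yield (pos, [(stream_index, record), ...]) in ascending position order.

    Hypothetical helper, not part of bcftools. Each stream must already be
    sorted by position, and each record is a tuple whose first element is
    the position. Only one pending record per stream sits on the heap.
    """
    iters = [iter(s) for s in streams]
    heap = []
    # Prime the heap with the first record of every non-empty stream.
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first[0], i, first))

    while heap:
        pos = heap[0][0]
        group = []
        # Collect every record that shares the current smallest position.
        while heap and heap[0][0] == pos:
            _, i, rec = heapq.heappop(heap)
            group.append((i, rec))
            # Refill from the stream that was just consumed.
            nxt = next(iters[i], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt[0], i, nxt))
        yield pos, group


if __name__ == "__main__":
    # Toy inputs standing in for two single-sample files (assumed data).
    sample_a = iter([(100, "0/1"), (200, "1/1")])
    sample_b = iter([(100, "0/0"), (300, "0/1")])
    for pos, group in merge_by_position([sample_a, sample_b]):
        print(pos, group)
```

Because the inputs are consumed lazily, they can be generators reading from disk; nothing beyond the current position group and one look-ahead record per input is retained.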

Possible things to do:

hjeremyli commented 8 months ago

@pd3 Thanks for the response. Updating to the version on master seems to have resolved this; it is now using a sane amount of memory (~3-5 GB) for the same task. I was previously using v1.16.