harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

java space/memory issue for db2vcf step #91

Closed: subirshakya closed this 3 months ago

subirshakya commented 1 year ago

The db2vcf step fails only for the last set of db intervals. Originally 6 intervals failed to finish; after filtering the genome for small scaffolds (<1000 bp), 1 interval still failed to finish. The error that genomicsdbimport generates is an out-of-space error. I used a tmp directory in the working directory, so it should not run out of space unless genomicsdbimport uses a different tmp directory (I think it can be set with the --tmpdir option). The log file of one of these is included below. There was also a notice of a fatal error from Java.

0130.txt

cademirch commented 1 year ago

Hi @subirshakya, sorry this got missed. How much space is available on the filesystem you're working on? And how big are the gvcfs?
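For anyone checking the same thing, here is a minimal sketch of how free space could be reported for the directories a job might write temporary files to. The helper names `free_space_gib` and `report` are hypothetical, and the directory list (`.` and `/tmp`) is an assumption; substitute whichever tmp directory the failing step actually uses.

```python
import shutil
from pathlib import Path


def free_space_gib(path):
    """Return free space in GiB on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def report(paths):
    """Map each existing directory to its free space in GiB (missing ones are skipped)."""
    return {
        str(p): round(free_space_gib(p), 1)
        for p in map(Path, paths)
        if p.exists()
    }


if __name__ == "__main__":
    # Assumed candidates: the working directory and the system-wide /tmp.
    for directory, gib in report([".", "/tmp"]).items():
        print(f"{directory}: {gib} GiB free")
```

If the working directory is large but `/tmp` is small, that would be consistent with an out-of-space error coming from a tool that writes its temporary files somewhere other than the working directory.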

subirshakya commented 1 year ago

Hi @cademirch, I was running this in a work directory that has essentially unlimited space (large enough that running out of space should not have been possible). The gvcfs were around 30-40 MB, I think. The intervals that fail have a large number of small scaffolds, which I think is the problem. I was able to avoid the problem for some of them by filtering out all the smaller scaffolds. For the intervals that still failed, I had to create a dummy blank vcf file to bypass them.
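A rough sketch of the two workarounds described above, in plain Python. This is not part of snpArcher: the function names `read_fasta`, `filter_fasta`, and `write_placeholder_vcf` are hypothetical, and the minimal VCF header is an assumption (downstream tools may also expect contig lines or a specific sample list).

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_fasta(src, dst, min_len=1000):
    """Write scaffolds of at least `min_len` bp to `dst`; return how many were kept."""
    kept = 0
    with open(dst, "w") as out:
        for name, seq in read_fasta(src):
            if len(seq) >= min_len:
                out.write(f">{name}\n{seq}\n")
                kept += 1
    return kept


def write_placeholder_vcf(path, samples=()):
    """Write a header-only VCF (no variant records) as a stand-in for a failed interval."""
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    if samples:
        columns += ["FORMAT", *samples]
    with open(path, "w") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("\t".join(columns) + "\n")
```

Note that a placeholder VCF silently drops any variants in the bypassed intervals, so it is a stopgap, not a fix for the underlying failure.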