moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

java.lang.OutOfMemoryError: Java heap space #33

Closed · RvV1979 closed this issue 1 year ago

RvV1979 commented 1 year ago

Hi Lucas,

I am running grenepipe v0.12.0 via Slurm and run into an error in rule mark_duplicates for some of my samples. Based on the Picard log file, it looks like the deduplication process finishes, but also throws an out-of-memory error. See the tail of the log file below:

INFO    2023-07-01 16:37:12     MarkDuplicates  Read   498,000,000 records.  Elapsed time: 01:04:09s.  Time for last 1,000,000:   62s.  Last read position: NC_044378.1:3,429,197
INFO    2023-07-01 16:37:12     MarkDuplicates  Tracking 10747258 as yet unmatched pairs. 5068892 records in RAM.
[Sat Jul 01 16:45:52 CEST 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 72.83 minutes.
Runtime.totalMemory()=2075918336
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:548)
        at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
        at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
        at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
        at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
        at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:151)
        at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
        at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
        at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:507)
        at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:518)
        at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:261)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:880)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:854)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:848)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:816)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:591)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:570)
        at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:536)
        at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:270)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I tried increasing the allocated memory via profile/slurm.yaml, but this did not help. In any case, the Slurm job itself did not report a memory error, and the log above shows Runtime.totalMemory()=2075918336, i.e. a heap of only about 2 GB, so I am guessing the issue is that the Java process does not have enough allocated memory. However, I could not find a way to increase the memory allocated to Java.
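For reference, this is roughly what I tried in profile/slurm.yaml; the rule name and resource keys below are just how I understood the profile layout, so they may not match the actual file exactly:

__default__:
  mem: 16G            # default Slurm memory allocation per job
mark_duplicates:
  mem: 64G            # more memory for the Slurm job running MarkDuplicates
  time: "24:00:00"    # longer wall time, since the rule runs for hours

This raised the Slurm allocation, but as far as I can tell it does not change the heap that the Java process inside the job is allowed to use.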

Do you have an idea what may be going wrong and how I might solve this issue?

Thanks

lczech commented 1 year ago

Hi @RvV1979,

Ah yes, this indeed seems to be a Java out-of-memory error, which won't show up in the Slurm logs. This has been addressed in grenepipe v0.12.1, where I added support for extra Java options, for exactly this type of error. The option is called MarkDuplicates-java-opts, see here. The documentation there also explains how to specify memory.
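For example, the relevant part of the config could then look roughly like this (a sketch; the exact nesting under params is best checked against the documentation linked above):

params:
  picard:
    # Extra Java options passed to the MarkDuplicates call.
    # -Xmx sets the maximum Java heap size, which is what ran out here.
    MarkDuplicates-java-opts: "-Xmx40g"

The -Xmx value needs to fit within the memory that Slurm allocates to the job, so set the Slurm allocation a bit above it to leave room for JVM overhead.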

So, simply get the new version of grenepipe, set the memory as needed, and then hopefully that works ;-)

Cheers and so long, Lucas

RvV1979 commented 1 year ago

I can confirm that the issue is resolved by using the latest version and specifying MarkDuplicates-java-opts: "-Xmx40g". Thanks for the prompt support!