samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Possible core dumps due to Intel's gkl #1420

Open nh13 opened 5 years ago

nh13 commented 5 years ago

I was wondering if other folks (@tfenne @lbergelson @yfarjoun) are seeing frequent core dumps using the latest Picard/fgbio, which rely on htsjdk. I have exactly five users over here, all building from source, hitting frequent (roughly 1 in 10 runs) core dumps.

```
Stack: [0x000070000da04000,0x000070000db04000],  sp=0x000070000db029f0,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libgkl_compression1961443782838211236.dylib+0x6ea7]  deflate_medium+0x867
C  [libgkl_compression1961443782838211236.dylib+0x508b]  deflate+0xf1b
C  [libgkl_compression1961443782838211236.dylib+0x1bac]  Java_com_intel_gkl_compression_IntelDeflater_deflateNative+0x1bc
j  com.intel.gkl.compression.IntelDeflater.deflateNative([BI)I+0
j  com.intel.gkl.compression.IntelDeflater.deflate([BII)I+3
j  htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock()I+55
j  htsjdk.samtools.util.BlockCompressedOutputStream.write([BII)V+113
j  htsjdk.samtools.util.BinaryCodec.writeBytes([BII)V+24
j  htsjdk.samtools.util.BinaryCodec.writeByteBuffer(I)V+35
J 3957 C1 htsjdk.samtools.BinaryTagCodec.writeArray(Ljava/lang/Object;Z)V (362 bytes) @ 0x0000000109ea2e04 [0x0000000109ea1a60+0x13a4]
J 3965 C1 htsjdk.samtools.BinaryTagCodec.writeTag(SLjava/lang/Object;Z)V (311 bytes) @ 0x0000000109eab9e4 [0x0000000109ea9880+0x2164]
```
fleharty commented 5 years ago

@nh13 I've heard rumors that there is a bug in the Intel deflater. From what I've seen, it only affects users on macOS Mojave (but not all of them).

nh13 commented 5 years ago

@fleharty we are all OSX users (10.14.xx)

lbergelson commented 5 years ago

@nh13 You're probably hitting the same issue as https://github.com/broadinstitute/picard/issues/1383. It's a real pain. I have been getting pretty consistent segfaults on macOS 10.14.xx when using the Intel deflater on some files. It seems to be specific to certain inputs, but I don't understand what the error condition is.

Try disabling the intel deflater. It's unfortunate since disabling it will slow everything down. Intel is aware of the issue but they don't currently have any engineers who are able to work on the problem. I've been told that they have 2 people who are getting up to speed but I don't know what timeline we're looking at. See https://github.com/Intel-HLS/GKL/issues/101.
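For anyone who wants to do that: in Picard the switch should be the `USE_JDK_DEFLATER=true` option, which makes htsjdk fall back to the pure-Java `java.util.zip.Deflater`. A minimal sketch of that fallback path (class and method names here are illustrative, not htsjdk API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of the path htsjdk takes when the native deflater is disabled:
// compress a buffer with the pure-JDK java.util.zip.Deflater in raw
// ("nowrap") mode, as used for BGZF blocks, then inflate it back and
// verify the round trip.
public class JdkDeflateFallback {

    static boolean roundTrips(byte[] input) throws DataFormatException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap = raw deflate
        deflater.setInput(input);
        deflater.finish();
        byte[] compressed = new byte[input.length + 64];
        int clen = deflater.deflate(compressed);
        deflater.end();

        Inflater inflater = new Inflater(true); // nowrap must match the deflater
        inflater.setInput(compressed, 0, clen);
        byte[] restored = new byte[input.length];
        int rlen = inflater.inflate(restored);
        inflater.end();

        return rlen == input.length && Arrays.equals(input, restored);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = "ACGTACGTACGTACGT".repeat(1024).getBytes(StandardCharsets.US_ASCII);
        System.out.println("JDK deflater round trip ok: " + roundTrips(data));
    }
}
```

It's slower than the native code, but it's the same deflate format on the wire, so output files stay readable by everything.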

I don't think I'm going to be able to fix the deflater bug myself without a significant time investment that I'm not currently able to make, and I don't think anyone else on our team is going to be able to do it anytime soon. If you or @tfenne are interested in taking a crack at it, I'm sure Intel would be willing to accept a PR.

nh13 commented 5 years ago

@lbergelson we may want to think about adding support for other deflaters, as some seem just as good: http://jkbonfield.github.io/www.htslib.org/benchmarks/zlib.html

lbergelson commented 5 years ago

Interesting.

@jkbonfield Is the intel optimized zlib that you tested the same as the one here https://github.com/Intel-HLS/GKL/tree/master/src/main/native/compression ?

jkbonfield commented 5 years ago

No, it was from another Intel developer: https://github.com/jtkukunas/zlib

I wasn't aware of this other one. I don't know how they differ.

lbergelson commented 5 years ago

The one I linked is the one that GATK and Picard use. I'm curious how it compares.

jkbonfield commented 5 years ago

I did some testing and it came out at 1m17s decode (1 thread) and 2m13s encode (elapsed time with 4 threads; 8m29s CPU). File size was 6,580,893,700 bytes.

This corresponds almost perfectly with the "Intel" line in my chart, which had 1m15s decode and 2m11s encode, with an identical file size. So my guess is that the GKL incorporates the same jtkukunas zlib code.

I would therefore recommend trying to integrate the libdeflate version instead and giving it a whirl to see how it performs inside Java. Note it doesn't have a zlib-compatible API and it doesn't have the same streaming nature, so it'll require quite a bit of interface wrapping. If you're doing that, it's also worth looking at slz (http://www.libslz.org/) for super-fast deflate at level 1. It's not that good on ratio, but it's designed to be as fast as possible at compression (much like igzip at level 1, I guess). I haven't benchmarked it myself on BAM though.
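There's no stdlib Java binding for libdeflate or slz, but the level-1 vs. higher-level tradeoff being discussed is easy to demonstrate with the JDK deflater alone. A rough sketch (the pseudo-random "DNA-like" input is illustrative only, not a BAM benchmark):

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Compress the same buffer at BEST_SPEED (level 1) and BEST_COMPRESSION
// (level 9) with java.util.zip.Deflater and compare output sizes, to
// illustrate the speed/ratio tradeoff. Not a libdeflate or slz benchmark.
public class DeflateLevels {

    static int compressedSize(byte[] input, int level) {
        Deflater d = new Deflater(level, true); // raw deflate, as for BGZF blocks
        d.setInput(input);
        d.finish();
        byte[] out = new byte[input.length + 1024];
        int n = d.deflate(out); // single call: buffer is large enough
        d.end();
        return n;
    }

    public static void main(String[] args) {
        // Random A/C/G/T bytes: compressible (~2 bits of entropy per byte)
        // but not trivially so, which makes the level difference visible.
        byte[] bases = "ACGT".getBytes(StandardCharsets.US_ASCII);
        Random rng = new Random(42);
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = bases[rng.nextInt(4)];
        }

        int fast = compressedSize(data, Deflater.BEST_SPEED);
        int best = compressedSize(data, Deflater.BEST_COMPRESSION);
        System.out.println("input=" + data.length + " level1=" + fast + " level9=" + best);
    }
}
```

Level 1 runs much faster but produces a larger output, which is the same shape of tradeoff slz and igzip-1 push to the extreme.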

jkbonfield commented 5 years ago

FYI a quick hacky test with the slz "enc" program, swallowing the same uncompressed BAM (around 16GB) at level 1, took 97s (NB: 1 thread) and compressed it to ~9GB. Not good compression, and my test used shed-loads of memory since it slurped the entire file into memory, but it gives an indication of level-1 compression performance. That's around twice as fast as libdeflate at level 1, albeit around 30% larger.

So maybe worth investigating for temporary files, but tbh we could also use e.g. zstd or lz4 for temporary files too as they're never going to be ingested by anything other than our own code.

lbergelson commented 2 years ago

I believe the core dumps are fixed in the 0.8.8 release of GKL.