lz4 / lz4-java

LZ4 compression for Java
Apache License 2.0

Avoid Exception-related performance issues in LZ4BlockInputStream when stopOnEmptyBlock == false #143

Closed JoshRosen closed 5 years ago

JoshRosen commented 5 years ago

This PR improves the performance of LZ4BlockInputStream when stopOnEmptyBlock = false, a mode used in Apache Spark (see #105 and #76 for background).

When stopOnEmptyBlock = false, LZ4BlockInputStream will attempt to call refill() after it's reached the end of a compressed block (because end-of-block no longer implies end-of-stream). If we reach the end of the stream then refill()'s attempt to read the next block's magic header will fail and throw an EOFException, which is then caught and ignored.

Throwing and catching this exception has a significant performance penalty, especially in applications with deep call stacks: a significant amount of time is spent collecting exception stack traces. We could try to solve this problem by throwing a static exception, but that harms users' ability to debug unexpected exceptions.
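The "static exception" alternative mentioned above usually means a pre-allocated (or stack-trace-free) exception. A minimal sketch of the idea, using the standard `fillInStackTrace()` override trick (the class name here is hypothetical, not from lz4-java), shows why it is cheap but hurts debuggability:

```java
import java.io.EOFException;

// Hypothetical "cheap" exception: skipping fillInStackTrace() avoids the
// expensive stack walk, but any unexpected throw of this type carries no
// trace, which is exactly the debugging downside noted above.
class StacklessEOFException extends EOFException {
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this; // do not capture a stack trace
    }
}
```

A thrown `StacklessEOFException` has an empty stack trace, so there is no per-throw stack-walk cost, but also no information about where it came from.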

This PR addresses this problem by adding a private tryReadFully() method, which is like readFully() except it signals success / failure by returning a boolean instead of throwing an exception. This allows us to skip the exception overhead in the case of "expected" exceptions, significantly improving performance.
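The approach can be sketched roughly as follows (the method name comes from the PR description; the actual body in lz4-java may differ):

```java
import java.io.IOException;
import java.io.InputStream;

final class ReadUtil {
    /**
     * Like readFully(), but signals a clean end-of-stream by returning
     * false instead of throwing EOFException, so the expected EOF on the
     * hot path costs nothing.
     */
    static boolean tryReadFully(InputStream in, byte[] buf) throws IOException {
        int read = 0;
        while (read < buf.length) {
            int r = in.read(buf, read, buf.length - read);
            if (r < 0) {
                return false; // hit EOF before filling the buffer
            }
            read += r;
        }
        return true;
    }
}
```

The caller (e.g. `refill()`) can then treat a `false` return as end-of-stream when `stopOnEmptyBlock == false`, while genuinely unexpected I/O errors still surface as exceptions.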

Benchmarks

Here's a quick-and-dirty microbenchmark that I ran on my Mac (written in Scala):

import java.io._
import net.jpountz.lz4._

// Construct a test input consisting of two concatenated LZ4 streams
val testBytes = "Testing!".getBytes("UTF-8")
val bytes = new ByteArrayOutputStream()
val out = new LZ4BlockOutputStream(bytes)
out.write(testBytes)
out.close()
val concatInput = (bytes.toByteArray() ++ bytes.toByteArray())

// Measure repeated decompression speed
val start = System.currentTimeMillis()
var i = 0
while (i < 1000 * 1000) {
    i += 1
    val in = new LZ4BlockInputStream(new ByteArrayInputStream(concatInput), false)
    while (in.read() != -1) {}
    in.close()
}
val end = System.currentTimeMillis()
println(end - start)

This took ~5 seconds before and ~600ms after. This isn't the most scientific benchmark (no JIT warmup), but I think it's a good illustration of the performance problems in the original code. Note that the performance issues are even more pronounced with deep stack traces (the stack is very shallow in this benchmark).

JoshRosen commented 5 years ago

/cc @maropu (who worked on the original implementation of this feature) and @kiszk (who recently upgraded lz4-java in Apache Spark)

kiszk commented 5 years ago

@JoshRosen Good catch, LGTM

@odaira WDYT? As we know, returning state to a caller by throwing an exception looks too expensive.

odaira commented 5 years ago

I haven't thoroughly checked the code, but your statement makes sense. I'll try the code myself.

maropu commented 5 years ago

I checked the code and it looks quite reasonable to me.

maropu commented 5 years ago

kindly ping

kiszk commented 5 years ago

@odaira gentle ping

odaira commented 5 years ago

Sorry, some urgent things came up, but I'll review this by the end of August.

odaira commented 5 years ago

Thanks for your contribution. I confirmed the microbenchmark was more than 2x faster with this change. Good optimization!

maropu commented 5 years ago

Thanks!

joshrosen-stripe commented 5 years ago

Hi @odaira,

If you have time, would it be possible to make a new release of lz4-java so Spark can pick up this performance optimization?

odaira commented 4 years ago

Yes, but I am working on a few more patches. The next release will be sometime next month.

maropu commented 4 years ago

Thanks for the v1.7 release! @odaira