Closed JoshRosen closed 5 years ago
/cc @maropu (who worked on the original implementation of this feature) and @kiszk (who recently upgraded lz4-java
in Apache Spark)
@JoshRosen Good catch, LGTM
@odaira WDYT? As we know, it looks too expensive to return the state to a caller by using an exception.
I haven't thoroughly checked the code, but your statement makes sense. I'll try the code myself.
I checked the code and it looks quite reasonable to me.
kindly ping
@odaira gentle ping
Sorry, I have some urgent things, but I'll review this by the end of August.
Thanks for your contribution. I confirmed the microbenchmark was more than 2x faster with this change. Good optimization!
Thanks!
Hi @odaira,
If you have time, would it be possible to make a new release of lz4-java
so Spark can pick up this performance optimization?
Yes, but I am working on a few more patches. The next release will be sometime in next month.
Thanks for the v1.7 release! @odaira
This PR improves the performance of
LZ4BlockInputStream
whenstopOnEmptyBlock = false
, a mode used in Apache Spark (see #105 and #76 for background).When
stopOnEmptyBlock = false
,LZ4BlockInputStream
will attempt to callrefill()
after it's reached the end of a compressed block (because end-of-block no longer implies end-of-stream). If we reach the end of the stream thenrefill()
's attempt to read the next block's magic header will fail and throw anEOFException
, which is then caught and ignored.Throwing and catching this exception has a significant performance penalty, especially in applications with deep call stacks: a significant amount of time is spent collecting exception stack traces. We could try to solve this problem by throwing a static exception, but that harms users' ability to debug unexpected exceptions.
This PR addresses this problem by adding a private
tryReadFully()
method, which is likereadFully()
except it signals success / failure by returning a boolean instead of throwing an exception. This allows us to skip the exception overhead in the case of "expected" exceptions, significantly improving performance.Benchmarks
Here's a quick-and-dirty microbenchmark that I ran on my Mac (written in Scala):
This took ~5 seconds before and ~600ms after. This isn't the most scientific benchmark (no warmup), but I think it's a good illustration of the performance problems in the original code. Note that the perf. issues are even more pronounced with deep stacktraces (the stack is very shallow in this benchmark).