mtopolnik opened this issue 9 years ago
Can I close this issue, as the discussion at lz4/lz4#126 ended two years ago?
The code I submitted here is thoroughly tested for both correctness and performance and it can be swapped into the project at will.
The changes I propose bring a performance improvement to the Java Unsafe implementation. This is independent of the work on the native LZ4 project, since the Unsafe implementation doesn't rely on it. We (Hazelcast) have no active interest in this at the moment since we decided not to use any compression (modern SSDs perform far too well for compression to provide any performance benefit).
So I'm fine with any decision you make.
I've studied the implementation of `wildIncrementalCopy` and found that it has a suboptimal approach to copying ranges which are close to each other (source and destination offsets differ by 32 bytes or less). This manifests as poor decompression performance when the input data contains a short byte sequence that repeats many times.

I have written an implementation of `wildIncrementalCopy` (specialized for little-endian architectures, but easily adaptable to big-endian) which takes a different approach to copying narrow ranges. I submit it below, should there be any interest in making use of it.

I have also slightly improved the usage of wild incremental copying so that the fallback to safe incremental copying happens only for the range where it's truly necessary:
With these changes, my JMH measurements of decompressing short-period repeating data show close to a 4x speedup over the current Unsafe implementation and a 2x speedup over the native implementation. Funnily enough, the Safe implementation fares very well on this test and is significantly faster than native, although not as fast as the one I'm providing.