tukaani-project / xz-java

XZ for Java
https://tukaani.org/xz/java.html
BSD Zero Clause License
23 stars 14 forks source link

delta coding performance #11

Closed bokken closed 6 months ago

bokken commented 7 months ago

Pull request checklist

Improves the performance of delta coding, as covered on the mailing list:

https://www.mail-archive.com/xz-devel@tukaani.org/msg00503.html

Unit tests are included in https://github.com/tukaani-project/xz-java/pull/10

Please check if your PR fulfills the following requirements:

Pull request type

Please check the type of change your PR introduces: - [ ] Bugfix - [ ] Feature - [ ] Code style update (formatting, renaming, typo fix) - [x] Refactoring (no functional changes, no api changes) - [ ] Build related changes - [ ] Documentation content changes - [ ] Other (please describe): ## What is the current behavior?

Related Issue URL:

What is the new behavior?

-

-

Does this introduce a breaking change?

Other information

Larhzu commented 6 months ago

I've been preparing a few unrelated changes and including this Delta filter improvement among those. I hope the changes get to master tomorrow. (This is a special case and not the normal way in the project. Sorry.)

I benchmarked the Delta filter changes. Encoding time was reduced by 60-70 % when distance=1 and over 50 % with long distances. That makes sense as with longer distances more bytes go via the history buffer.

Decoding time reduced up to 50 % with long distances but only 5 % with distance=1. It's still a clear improvement. I suppose short distance is slower because buf[off + i] += buf[off + i - distance]; has dependency on the previous iteration; it cannot parallelize like with longer distances.

I wonder if JVM is smart enough to vectorize the decoder loops when distance is large enough and appropriate multiple of a power of two. For example, distance=240 is a little faster than 248.

In any case, thanks for the improvement and sorry that it has taken three years to get it included.

bokken commented 6 months ago

I wonder if JVM is smart enough to vectorize the decoder loops when distance is large enough and appropriate multiple of a power of two. For example, distance=240 is a little faster than 248.

It might be. I also used the DeltaEncoder and DeltaDecoder as a means of playing with the Vector api functionality in https://openjdk.java.net/jeps/338, which still has not been fully released. I definitely recall on the encoder side seeing further improvements beyond what was in this PR. (It actually worked the opposite direction, I observed something like 50:1 improvement in performance, on hardware where 32:1 should have been theoretical max. This caused me to revisit the original implementation to see if it could be improved without use of vectors.) While I also worked on the decoder, I do not remember actual performance numbers.

In any case, thanks for the improvement and sorry that it has taken three years to get it included.

Just glad to see it making it :)