tukaani-project / xz-java

XZ for Java
https://tukaani.org/xz/java.html
BSD Zero Clause License

improve byte array mismatch calculation performance #12

Closed bokken closed 5 months ago

bokken commented 7 months ago

Much of the compression work consists of comparing ranges of bytes. These changes speed up that comparison on every platform on JDK 9+.

For x86, this is done with a byteArrayViewVarHandle that processes 8 bytes (64 bits) or 4 bytes at a time.

https://docs.oracle.com/javase/9/docs/api/java/lang/invoke/MethodHandles.html#byteArrayViewVarHandle-java.lang.Class-java.nio.ByteOrder-
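To illustrate the idea, here is a minimal sketch of an 8-bytes-at-a-time comparison through a byte array view VarHandle. It is not the code from this PR: the class name `VarHandleMismatch`, the method `mismatch`, and its parameter layout are hypothetical, and the caller is assumed to guarantee that both ranges lie inside their arrays.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class, not part of xz-java.
final class VarHandleMismatch {
    // Views a byte[] as little-endian longs. Plain get() is allowed at
    // unaligned byte indexes; only the atomic access modes need alignment.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /**
     * Returns how many leading bytes of a[aFrom..] and b[bFrom..] are equal,
     * examining at most length bytes.
     */
    static int mismatch(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        int i = 0;
        // Compare 8 bytes per iteration while a full long still fits.
        for (; i + Long.BYTES <= length; i += Long.BYTES) {
            long x = (long) LONG_LE.get(a, aFrom + i);
            long y = (long) LONG_LE.get(b, bFrom + i);
            long diff = x ^ y;
            if (diff != 0) {
                // Little-endian view: the lowest set bit belongs to the
                // first differing byte.
                return i + (Long.numberOfTrailingZeros(diff) >>> 3);
            }
        }
        // Compare the remaining tail one byte at a time.
        for (; i < length; i++) {
            if (a[aFrom + i] != b[bFrom + i]) {
                return i;
            }
        }
        return length;
    }
}
```

Reading the longs as little-endian regardless of the platform keeps the "trailing zeros divided by 8" step valid everywhere; a real implementation might instead use native order and pick leading or trailing zeros accordingly.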

For all other platforms, this uses the Arrays.mismatch method to do the comparison, which is implemented in the JDK itself with vector instructions.

https://docs.oracle.com/javase/9/docs/api/java/util/Arrays.html#mismatch-byte:A-int-int-byte:A-int-int-
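As a rough sketch of how the ranged Arrays.mismatch overload can be used for this kind of comparison: the wrapper below is hypothetical (the name `matchLength` and the assumption that both ranges live in the same buffer are for illustration only), and it converts the "no mismatch" result of -1 into the full compared length.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of xz-java.
final class ArraysMismatch {
    /**
     * Returns how many bytes starting at aFrom equal the bytes starting at
     * bFrom in the same buffer, examining at most limit bytes.
     */
    static int matchLength(byte[] buf, int aFrom, int bFrom, int limit) {
        // Arrays.mismatch returns the relative index of the first differing
        // byte, or -1 when the two equal-length ranges are identical.
        int m = Arrays.mismatch(buf, aFrom, aFrom + limit, buf, bFrom, bFrom + limit);
        return m < 0 ? limit : m;
    }
}
```

For example, with the buffer `abcabcabd`, comparing 6 bytes at offsets 0 and 3 yields 5, because `abcabc` and `abcabd` first differ at relative index 5.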

For older JDKs (7 and 8) on x86 or aarch64, sun.misc.Unsafe is used (if it can be found) to process 8 bytes (64 bits) or 4 bytes at a time.
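The Unsafe path could look roughly like the sketch below. This is an assumption-laden illustration rather than the PR's code: `UnsafeMismatch` and `mismatch` are hypothetical names, the `os.arch` values checked are a guess at how the architectures named above are reported, and the method performs no bounds checks, so the caller must guarantee both ranges are valid. If Unsafe cannot be obtained or used, the sketch falls back to a byte-by-byte loop.

```java
import java.lang.reflect.Field;
import java.nio.ByteOrder;

// Hypothetical illustration class, not part of xz-java.
final class UnsafeMismatch {
    private static final sun.misc.Unsafe UNSAFE;
    private static final long BASE;
    private static final boolean LITTLE =
            ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

    static {
        sun.misc.Unsafe u = null;
        long base = 0;
        try {
            // Assumed os.arch spellings for platforms with cheap unaligned reads.
            String arch = System.getProperty("os.arch", "");
            if (arch.matches("amd64|x86_64|x86|i386|aarch64")) {
                Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                u = (sun.misc.Unsafe) f.get(null);
                base = u.arrayBaseOffset(byte[].class);
                u.getLong(new byte[8], base); // probe that reads actually work
            }
        } catch (Throwable t) {
            u = null; // Unsafe not found or not usable: use the slow path
        }
        UNSAFE = u;
        BASE = base;
    }

    /** Returns the number of equal leading bytes, at most length. No bounds checks. */
    static int mismatch(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        int i = 0;
        if (UNSAFE != null) {
            for (; i + 8 <= length; i += 8) {
                long x = UNSAFE.getLong(a, BASE + aFrom + i);
                long y = UNSAFE.getLong(b, BASE + bFrom + i);
                long diff = x ^ y;
                if (diff != 0) {
                    // Native-order read: the first differing byte is at the
                    // low end on little-endian and the high end on big-endian.
                    int bits = LITTLE ? Long.numberOfTrailingZeros(diff)
                                      : Long.numberOfLeadingZeros(diff);
                    return i + (bits >>> 3);
                }
            }
        }
        // Tail bytes, or the whole range when Unsafe is unavailable.
        for (; i < length; i++) {
            if (a[aFrom + i] != b[bFrom + i]) {
                return i;
            }
        }
        return length;
    }
}
```

Probing once in the static initializer keeps the hot loop free of exception handling, and it means a JDK that restricts Unsafe memory access simply ends up on the byte-by-byte path.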

As previously discussed on the mailing list, this shows substantial improvements (~20%) in total compression time for a variety of real-world data sets on both arm and x86, with only minor regressions if forced (by system property) to use the legacy byte-by-byte comparison.

https://www.mail-archive.com/xz-devel@tukaani.org/msg00400.html

I have additional unit tests specific to the ArrayMismatch implementations, which can be added on top of https://github.com/tukaani-project/xz-java/pull/10

Pull request checklist

Please check if your PR fulfills the following requirements:

Pull request type

Please check the type of change your PR introduces:

- [ ] Bugfix
- [ ] Feature
- [ ] Code style update (formatting, renaming, typo fix)
- [ ] Refactoring (no functional changes, no api changes)
- [ ] Build related changes
- [ ] Documentation content changes
- [ ] Other (please describe):

What is the current behavior?

Related Issue URL:

What is the new behavior?

Does this introduce a breaking change?

Other information