tukaani-project / xz-java

XZ for Java
https://tukaani.org/xz/java.html
BSD Zero Clause License
23 stars 14 forks source link

XZ compression improvements by optimizing array pattern matching #13

Closed bokken closed 5 months ago

bokken commented 6 months ago

This is incremental changes to simplify code review process. This first set of changes introduces the array mismatch abstraction and initial changes using Unsafe to read unaligned longs for x86 and aarch64.

This is similar to https://github.com/tukaani-project/xz-java/pull/12, but presents a smaller set of changes to simplify review.

Pull request checklist

Please check if your PR fulfills the following requirements:

Pull request type

Please check the type of change your PR introduces: - [ ] Bugfix - [ ] Feature - [ ] Code style update (formatting, renaming, typo fix) - [ ] Refactoring (no functional changes, no api changes) - [ ] Build related changes - [ ] Documentation content changes - [ ] Other (please describe): ## What is the current behavior?

Related Issue URL:

What is the new behavior?

-

-

Does this introduce a breaking change?

Other information

bokken commented 6 months ago

These benchmark results are from windows 11 13th Gen Intel(R) Core(TM) i7-1370P . The 3 different files are: ihe_ovly_pr.dcm is a acr/nema dicom format presentation state - sized ~66KB. image1.dcm is a acr/nema dicom formated uncompressed image sized ~26MB. large.xml is an xml file sized ~52MB.

They show regression of up to 10% for the "legacy"/compatibility case. They show performance improvements ranging from ~0% -> ~30% depending on the content and preset level. In general, preset 6 saw larger improvements than preset 3.

jdk 8

Benchmark                                             (file)  (preset)  Mode  Cnt     Score      Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3     0.598 ±    0.065  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3     0.650 ±    0.195  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3     0.545 ±    0.097  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3     2.848 ±    0.401  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3     2.725 ±    0.242  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3     1.866 ±    0.451  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3  2036.300 ±   82.683  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3  2136.215 ± 1500.816  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3  1929.697 ±  678.446  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3  4124.933 ±  773.943  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3  4333.171 ±  576.123  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3  3692.338 ±  162.947  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3   760.670 ±  147.354  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3   787.818 ±  127.012  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3   694.098 ±   63.862  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3  6601.508 ± 1201.524  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3  7352.783 ± 1461.493  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3  5279.845 ±  834.122  ms/op

jdk 11

Benchmark                                             (file)  (preset)  Mode  Cnt      Score           Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3       0.590 ±        0.136  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3       0.587 ±        0.106  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3       0.542 ±        0.076  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3       2.914 ±        1.259  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3       2.760 ±        0.254  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3       1.933 ±        0.157  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3    2113.911 ±      815.591  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3    2197.521 ±     1398.848  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3    2084.317 ±      904.885  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3    4088.532 ±      847.937  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3    4137.546 ±       88.137  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3    3991.293 ±     1587.101  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3     699.659 ±      270.399  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3     707.156 ±      165.419  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3     665.311 ±       36.397  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3    6901.331 ±     9032.103  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3    7338.866 ±     4002.768  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3    5932.740 ±      383.178  ms/op

jdk 17

Benchmark                                             (file)  (preset)  Mode  Cnt     Score      Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3     0.598 ±    0.029  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3     0.606 ±    0.054  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3     0.553 ±    0.247  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3     2.836 ±    0.701  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3     2.820 ±    0.179  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3     1.959 ±    0.431  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3  2116.691 ±  743.495  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3  2268.474 ±  667.310  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3  2150.507 ± 1549.747  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3  4144.431 ± 1380.345  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3  4481.542 ± 1059.997  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3  4044.453 ±  461.817  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3   739.495 ±  166.726  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3   734.365 ±  137.722  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3   684.531 ±   89.739  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3  6847.095 ±  904.741  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3  7686.943 ± 2843.518  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3  5957.280 ±  338.260  ms/op

jdk 21

Benchmark                                             (file)  (preset)  Mode  Cnt     Score      Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3     0.589 ┬▒    0.064  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3     0.578 ┬▒    0.264  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3     0.543 ┬▒    0.068  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3     2.842 ┬▒    2.348  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3     2.637 ┬▒    0.621  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3     1.860 ┬▒    0.137  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3  2013.813 ┬▒ 2111.947  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3  2022.160 ┬▒  244.838  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3  2079.494 ┬▒  559.869  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3  4200.710 ┬▒  559.429  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3  4318.551 ┬▒  224.317  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3  4000.231 ┬▒  122.133  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3   741.106 ┬▒  304.118  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3   762.184 ┬▒  193.266  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3   707.872 ┬▒  217.096  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3  6644.706 ┬▒   71.733  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3  7490.776 ┬▒ 1352.132  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3  5125.599 ┬▒  723.249  ms/op
bokken commented 6 months ago

Having updated the code to both appropriately handle the niceLenLimit in BT4 and adding the bounds check, here are updated benchmarks. This is still Windows 11, but this time an older i7-9850H. I will continue to work on getting results from the 13th Gen Intel(R) Core(TM) i7-1370P listed above.

The benefits still remain, though perhaps a bit more subdued.

JDK 8

Benchmark                                             (file)  (preset)  Mode  Cnt      Score      Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3      1.050 ±    1.455  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3      0.990 ±    0.282  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3      0.909 ±    0.033  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3      5.030 ±    0.966  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3      4.943 ±    0.998  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3      3.565 ±    0.323  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3   2488.136 ±  362.787  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3   2737.924 ± 3097.448  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3   2519.283 ±  269.135  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3   5622.677 ±  611.597  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3   5843.140 ±  712.834  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3   5167.348 ±  159.264  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3   1015.869 ±   31.331  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3   1043.607 ±   64.194  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3    985.266 ±  135.448  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3   8714.679 ±  375.658  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3  10698.865 ±  661.517  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3   8009.948 ±  766.887  ms/op

JDK 11

Benchmark                                             (file)  (preset)  Mode  Cnt      Score      Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3      0.958 ±    0.135  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3      0.995 ±    0.413  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3      0.872 ±    0.287  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3      4.946 ±    0.695  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3      4.737 ±    0.719  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3      3.078 ±    0.142  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3   2573.713 ±  884.641  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3   2601.671 ±   80.804  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3   2538.038 ±   91.440  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3   5735.404 ±  224.716  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3   5943.590 ± 1239.205  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3   5039.090 ±  514.206  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3   1007.470 ±  101.786  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3   1025.208 ±   53.206  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3    948.410 ±   32.021  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3   8843.118 ± 1184.551  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3  10581.770 ±   57.032  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3   8495.813 ± 1967.609  ms/op

JDK 21

Benchmark                                             (file)  (preset)  Mode  Cnt     Score       Error  Units
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         3  avgt    3     1.089 ┬▒    0.521  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         3  avgt    3     0.861 ┬▒    0.304  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         3  avgt    3     0.804 ┬▒    0.055  ms/op
XZCompressionBenchmark.baseline              ihe_ovly_pr.dcm         6  avgt    3     4.712 ┬▒    0.703  ms/op
XZCompressionBenchmark.compress_legacy       ihe_ovly_pr.dcm         6  avgt    3     4.286 ┬▒    0.793  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm         6  avgt    3     3.270 ┬▒    6.553  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         3  avgt    3  3969.325 ┬▒  812.895  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         3  avgt    3  2543.389 ┬▒  938.026  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         3  avgt    3  2539.038 ┬▒  703.522  ms/op
XZCompressionBenchmark.baseline                   image1.dcm         6  avgt    3  6383.656 ┬▒  141.257  ms/op
XZCompressionBenchmark.compress_legacy            image1.dcm         6  avgt    3  5604.145 ┬▒  428.556  ms/op
XZCompressionBenchmark.compress_unsafe_long       image1.dcm         6  avgt    3  5108.071 ┬▒  357.189  ms/op
XZCompressionBenchmark.baseline                    large.xml         3  avgt    3  1211.232 ┬▒  425.227  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         3  avgt    3  1016.582 ┬▒   61.477  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         3  avgt    3   967.770 ┬▒  267.865  ms/op
XZCompressionBenchmark.baseline                    large.xml         6  avgt    3  9171.859 ┬▒ 1681.746  ms/op
XZCompressionBenchmark.compress_legacy             large.xml         6  avgt    3  9864.887 ┬▒  970.370  ms/op
XZCompressionBenchmark.compress_unsafe_long        large.xml         6  avgt    3  7410.449 ┬▒  775.903  ms/op
ecki commented 5 months ago

Are you also planning a VarHandles version which doesn’t need the Unsafe method - I think it should apply for this case.