Open plokhotnyuk opened 5 years ago
Yes, this should certainly be automatic. I assume you mean "full unroll" here, where the loop body is unrolled completely because the iteration bound is a constant, as opposed to "partial unroll", where several consecutive iterations of a loop are merged into one but the total iteration bound of the loop is unknown?
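To make the "full unroll" case concrete, here is a minimal Java sketch. The class and method names are made up for illustration and are not taken from jsoniter-scala or from the Graal compiler; the code assumes the four input bytes are valid hex digits. The loop bound is the constant 4, so the whole loop can be replaced by straight-line code.

```java
final class FullUnrollSketch {
    // Rolled form: the iteration count is the compile-time constant 4.
    static int hex4Rolled(byte[] buf, int pos) {
        int x = 0;
        for (int i = 0; i < 4; i++) {
            x = (x << 4) | Character.digit(buf[pos + i], 16);
        }
        return x;
    }

    // Fully unrolled form: no loop counter and no back-edge remain.
    // Assumes buf[pos..pos+3] are valid hex digits (illustration only).
    static int hex4Unrolled(byte[] buf, int pos) {
        return (Character.digit(buf[pos], 16) << 12)
             | (Character.digit(buf[pos + 1], 16) << 8)
             | (Character.digit(buf[pos + 2], 16) << 4)
             | Character.digit(buf[pos + 3], 16);
    }

    public static void main(String[] args) {
        byte[] in = "0041".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(hex4Rolled(in, 0) == hex4Unrolled(in, 0));
    }
}
```

Both methods compute the same value for valid input; the question for the compiler is only whether it produces the second shape from the first on its own.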
How much of a performance difference are you seeing for unrolled vs non-unrolled? Those policies are sometimes sensitive, as too much unrolling can negatively impact instruction cache performance.
I mean "partial unroll", which is usually done by duplicating the loop body 4x, with an additional copy for a mini-loop that handles the remaining iterations.
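A minimal Java sketch of that shape, for readers who have not seen it written out. The names are hypothetical and the loop body (summing bytes) is a stand-in, not the actual parsing code from jsoniter-scala: the main loop handles four elements per iteration, and a mini-loop picks up the 0 to 3 elements that are left.

```java
final class PartialUnrollSketch {
    // Rolled form: one element per iteration.
    static int sumRolled(byte[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Partially unrolled by hand: 4 copies of the body per iteration,
    // followed by a mini-loop for the remaining 0..3 elements.
    static int sumUnrolled4(byte[] a) {
        int s = 0;
        int i = 0;
        int limit = a.length - 3; // last index at which a full group of 4 still fits
        for (; i < limit; i += 4) {
            s += a[i];
            s += a[i + 1];
            s += a[i + 2];
            s += a[i + 3];
        }
        for (; i < a.length; i++) { // remainder
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] data = "partial unrolling".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(sumRolled(data) == sumUnrolled4(data));
    }
}
```

The unknown trip count (`a.length`) is what makes this a partial rather than a full unroll: the compiler cannot remove the loop, it can only reduce the number of back-edges and bound checks per element.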
Below are the results for GraalVM CE/EE 19.2.0 for the first case (ASCII string parsing), with and without manual unrolling, which is switched on/off by the isGraalVM flag.
sbt -java-home /usr/lib/jvm/graalvm-ce-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  52607141.605 ±   86287.948  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  31081447.078 ±  481894.323  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   8335755.034 ±   33083.995  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    934160.199 ±    3848.913  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5    100635.218 ±     422.381  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      9824.494 ±      39.988  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       886.779 ±       5.214  ops/s
sbt -java-home /usr/lib/jvm/graalvm-ee-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  64737968.694 ± 1052304.912  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  38560441.600 ± 1932236.984  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5  10736905.622 ±  541268.399  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5   1262840.467 ±    2130.942  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5    118401.576 ±    4178.023  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5     11760.347 ±     524.339  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       979.913 ±      43.457  ops/s
sbt -java-home /usr/lib/jvm/graalvm-ce-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  52810124.159 ± 2332877.475  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  28600277.399 ± 1483439.788  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   6812044.543 ±  374698.138  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    930820.669 ±   37439.626  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5     96474.200 ±    4195.969  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      6520.250 ±     279.794  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       652.525 ±       5.579  ops/s
sbt -java-home /usr/lib/jvm/graalvm-ee-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  65911350.665 ± 2582704.919  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  38507732.430 ± 2084657.605  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   6859580.394 ±  352288.886  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    768506.384 ±    1803.553  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5     75424.605 ±     339.976  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      6637.629 ±     726.107  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       603.647 ±      84.124  ops/s
OK, thank you so much for these interesting measurements. We will investigate asap. @tkrodriguez @gilles-duboscq
I have a couple of cases where the HotSpot compiler on JDK 8 unrolls loops better than GraalVM 19.1.1 does.
Currently, I'm trying to mitigate this by manually unrolling some of the hottest cases: 1) https://github.com/plokhotnyuk/jsoniter-scala/blob/db689583dc35cd3dd457e9edb0b7fbcc10a5d508/jsoniter-scala-core/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/JsonReader.scala#L2365-L2504 2) https://github.com/plokhotnyuk/jsoniter-scala/pull/364/files
It would be better to have parity with HotSpot in automatic loop unrolling, to avoid cluttering the code and adding maintenance burden.