oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

Parity with the HotSpot compiler in loop unrolling #1647

Open plokhotnyuk opened 5 years ago

plokhotnyuk commented 5 years ago

I have a couple of cases where the HotSpot compiler on JDK 8 unrolls loops better than GraalVM 19.1.1 does.

Currently, I'm trying to mitigate this by manually unrolling some of the hottest cases:
1) https://github.com/plokhotnyuk/jsoniter-scala/blob/db689583dc35cd3dd457e9edb0b7fbcc10a5d508/jsoniter-scala-core/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/JsonReader.scala#L2365-L2504
2) https://github.com/plokhotnyuk/jsoniter-scala/pull/364/files

It would be better to have parity with HotSpot in automatic loop unrolling, to avoid cluttering the code and the maintenance burden that comes with it.

thomaswue commented 5 years ago

Yes, this should certainly be automatic. I assume you mean "full unroll" here, where a loop body is unrolled completely because of a constant iteration bound, as opposed to "partial unroll", where several consecutive iterations of a loop are merged into one but the total iteration bound of the loop is unknown?

How much of a performance difference are you seeing between unrolled and non-unrolled code? These policies are sometimes sensitive, as too much unrolling can hurt instruction cache performance.
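
For readers unfamiliar with the term, here is a minimal sketch of what "full unroll" means for a loop with a constant iteration bound. The class and method names are hypothetical and chosen only for illustration; this is not code from Graal or HotSpot, and a real compiler performs this transformation on its intermediate representation rather than on source code.

```java
final class FullUnrollSketch {
    // Loop with a constant trip count of 4, known at compile time.
    static int sumRolled(int[] a) {
        int sum = 0;
        for (int i = 0; i < 4; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Conceptual result of fully unrolling the loop above:
    // no loop counter, no bound check on i, no back-edge.
    static int sumFullyUnrolled(int[] a) {
        return a[0] + a[1] + a[2] + a[3];
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumRolled(a) + " " + sumFullyUnrolled(a)); // prints "10 10"
    }
}
```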

plokhotnyuk commented 5 years ago

I mean "partial unroll", which is usually done by duplicating the loop body 4x, with an additional copy in a mini-loop that handles the remaining iterations.
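
The shape described above can be sketched in Java roughly as follows. This is a hand-written illustration under assumptions: the names `scanRolled`, `scanUnrolled4`, and `isPlain` are hypothetical, the byte classification is a simplified stand-in for ASCII string parsing, and the code is neither the jsoniter-scala implementation linked above nor the machine code that HotSpot actually emits.

```java
final class PartialUnrollSketch {
    // Hypothetical predicate: a printable ASCII byte that is neither a quote
    // nor a backslash. Java bytes are signed, so non-ASCII bytes are negative
    // and fail the b >= 0x20 check.
    static boolean isPlain(byte b) {
        return b >= 0x20 && b != '"' && b != '\\';
    }

    // Plain loop: index of the first non-plain byte in [from, to), or `to`.
    static int scanRolled(byte[] buf, int from, int to) {
        int i = from;
        while (i < to && isPlain(buf[i])) i++;
        return i;
    }

    // Same result, partially unrolled by a factor of 4.
    static int scanUnrolled4(byte[] buf, int from, int to) {
        int i = from;
        // Main loop: four copies of the body per iteration,
        // with a single bound check covering all four accesses.
        while (i + 3 < to) {
            if (!isPlain(buf[i])) return i;
            if (!isPlain(buf[i + 1])) return i + 1;
            if (!isPlain(buf[i + 2])) return i + 2;
            if (!isPlain(buf[i + 3])) return i + 3;
            i += 4;
        }
        // Remainder mini-loop: handles the last 0 to 3 bytes.
        while (i < to && isPlain(buf[i])) i++;
        return i;
    }

    public static void main(String[] args) {
        byte[] buf = "hello world\"rest".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(scanRolled(buf, 0, buf.length));    // prints 11
        System.out.println(scanUnrolled4(buf, 0, buf.length)); // prints 11
    }
}
```

The unrolled version trades code size for fewer loop-bound checks and back-edges per byte, which is the trade-off the instruction cache concern above refers to.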

Below are results for GraalVM CE/EE 19.2.0 for the first case, ASCII string parsing, with and without manual unrolling, which is switched on/off by the isGraalVM flag.

GraalVM CE 19.2.0 with manual unrolling (isGraalVM = true)

sbt -java-home /usr/lib/jvm/graalvm-ce-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                                 (size)   Mode  Cnt         Score        Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  52607141.605 ±  86287.948  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  31081447.078 ± 481894.323  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   8335755.034 ±  33083.995  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    934160.199 ±   3848.913  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5    100635.218 ±    422.381  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      9824.494 ±     39.988  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       886.779 ±      5.214  ops/s

GraalVM EE 19.2.0 with manual unrolling (isGraalVM = true)

sbt -java-home /usr/lib/jvm/graalvm-ee-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  64737968.694 ± 1052304.912  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  38560441.600 ± 1932236.984  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5  10736905.622 ±  541268.399  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5   1262840.467 ±    2130.942  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5    118401.576 ±    4178.023  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5     11760.347 ±     524.339  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       979.913 ±      43.457  ops/s

GraalVM CE 19.2.0 without manual unrolling (isGraalVM = false)

sbt -java-home /usr/lib/jvm/graalvm-ce-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  52810124.159 ± 2332877.475  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  28600277.399 ± 1483439.788  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   6812044.543 ±  374698.138  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    930820.669 ±   37439.626  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5     96474.200 ±    4195.969  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      6520.250 ±     279.794  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       652.525 ±       5.579  ops/s

GraalVM EE 19.2.0 without manual unrolling (isGraalVM = false)

sbt -java-home /usr/lib/jvm/graalvm-ee-19 -no-colors 'jsoniter-scala-benchmark/jmh:run StringOfAsciiCharsReading.jsoniterScala'
...
[info] Benchmark                                 (size)   Mode  Cnt         Score         Error  Units
[info] StringOfAsciiCharsReading.jsoniterScala        1  thrpt    5  65911350.665 ± 2582704.919  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala       10  thrpt    5  38507732.430 ± 2084657.605  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala      100  thrpt    5   6859580.394 ±  352288.886  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala     1000  thrpt    5    768506.384 ±    1803.553  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala    10000  thrpt    5     75424.605 ±     339.976  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala   100000  thrpt    5      6637.629 ±     726.107  ops/s
[info] StringOfAsciiCharsReading.jsoniterScala  1000000  thrpt    5       603.647 ±      84.124  ops/s
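
To put a number on the difference, the small sketch below computes the throughput ratio between the runs with and without manual unrolling for the largest input (size = 1000000), using the scores from the tables above. The class and method names are hypothetical, and the calculation ignores the reported error margins, so treat the ratios as rough indications only.

```java
import java.util.Locale;

final class UnrollSpeedup {
    // Ratio of throughput with manual unrolling to throughput without it.
    static double speedup(double withUnrolling, double withoutUnrolling) {
        return withUnrolling / withoutUnrolling;
    }

    public static void main(String[] args) {
        // Scores in ops/s for size = 1000000, copied from the JMH output above.
        double ce = speedup(886.779, 652.525);
        double ee = speedup(979.913, 603.647);
        System.out.println(String.format(Locale.ROOT, "CE: %.2fx", ce)); // prints "CE: 1.36x"
        System.out.println(String.format(Locale.ROOT, "EE: %.2fx", ee)); // prints "EE: 1.62x"
    }
}
```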

thomaswue commented 5 years ago

OK, thank you so much for these interesting measurements. We will investigate asap. @tkrodriguez @gilles-duboscq