lrytz closed this issue 5 years ago.
Here's the "Richards" benchmark written using JMH: https://github.com/lrytz/benchmarks/blob/master/src/main/scala/misc/Richards.scala
On my machine:
2.11.7
[info] Richards.run avgt 10 0.113 ± 0.001 ms/op
2.11.7, -optimise
[info] Richards.run avgt 10 0.113 ± 0.002 ms/op
2.11.7, -Ybackend:GenBCode
[info] Richards.run avgt 10 0.126 ± 0.003 ms/op
2.12.0-newopt
[info] Richards.run avgt 10 0.126 ± 0.004 ms/op
2.12.0-newopt, -Yopt:l:classpath
[info] Richards.run avgt 10 0.113 ± 0.003 ms/op
On my slow linux box (Celeron N3050):
2.11.7
[info] Richards.run avgt 10 0.437 ± 0.003 ms/op
2.11.7, -optimise
[info] Richards.run avgt 10 0.439 ± 0.008 ms/op
2.11.7, -Ybackend:GenBCode
[info] Richards.run avgt 10 0.427 ± 0.003 ms/op
2.12.0-newopt
[info] Richards.run avgt 10 0.431 ± 0.006 ms/op
2.12.0-newopt, -Yopt:l:classpath
[info] Richards.run avgt 10 0.436 ± 0.005 ms/op
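For reference, here is a minimal sketch of how the compiler configurations listed above could be selected in an sbt build. It assumes the sbt-jmh plugin (which provides `JmhPlugin`) is already added in `project/plugins.sbt`. The linked benchmarks repo may be set up differently. The flags are the ones named in this issue; the exact version string of the 2.12.0-newopt build is not given here, so it is left out.

```scala
// build.sbt (sketch): enable at most one of the scalacOptions lines per run.
// Assumes sbt-jmh is declared in project/plugins.sbt.
enablePlugins(JmhPlugin)

scalaVersion := "2.11.7"

// 2.11.7 with the default GenASM backend: no extra options.
// scalacOptions += "-optimise"            // "2.11.7, -optimise"
// scalacOptions += "-Ybackend:GenBCode"   // "2.11.7, -Ybackend:GenBCode"

// For "2.12.0-newopt", point scalaVersion at that build (version string not
// given in this issue) and, for the optimized run, add:
// scalacOptions += "-Yopt:l:classpath"
```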
Observations
I looked at the bytecode produced by 2.11.7 with GenASM and GenBCode. It's in this repo: https://github.com/lrytz/richardsBenchBytecode/commits/master.
The differences are:
- different null-check jumps (IFNONNULL vs IFNULL), leading to a few more jump instructions in GenBCode
- ACONST_NULL; POP sequences in GenBCode
Enabling the new optimizer cleans up the jumps and removes the additional ones. This seems to bring the performance back to GenASM level on my machine: 2.12.0-newopt with -Yopt:l:classpath has the same speed as 2.11.7 with GenASM. Again, on the linux box, we don't see any of that.
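One quick way to check claims like "the optimizer brings performance back on my machine, but not on the linux box" is to compare the JMH error intervals. By default, JMH prints the error as the half-width of a 99.9% confidence interval. The sketch below uses a rough heuristic, not a proper statistical test: non-overlapping intervals suggest a real difference, while overlapping intervals prove nothing either way. The `Score` and `CompareScores` names are hypothetical. The numbers are the Richards results for 2.12.0-newopt with and without -Yopt:l:classpath from both machines.

```scala
// One row of JMH "avgt" output: score ± error, in ms/op.
final case class Score(label: String, score: Double, error: Double) {
  def low: Double  = score - error
  def high: Double = score + error
}

object CompareScores {
  // Rough heuristic: if the two error intervals do not overlap, the
  // difference is unlikely to be run-to-run noise. Overlapping intervals
  // do not prove that the two builds are equally fast.
  def clearlyDifferent(a: Score, b: Score): Boolean =
    a.high < b.low || b.high < a.low

  def main(args: Array[String]): Unit = {
    // Richards on the first machine.
    val mine    = Score("2.12.0-newopt", 0.126, 0.004)
    val mineOpt = Score("2.12.0-newopt, -Yopt:l:classpath", 0.113, 0.003)
    // Richards on the Celeron N3050 linux box.
    val linux    = Score("2.12.0-newopt", 0.431, 0.006)
    val linuxOpt = Score("2.12.0-newopt, -Yopt:l:classpath", 0.436, 0.005)

    println(s"my machine: ${clearlyDifferent(mine, mineOpt)}")   // true
    println(s"linux box:  ${clearlyDifferent(linux, linuxOpt)}") // false
  }
}
```

On the first machine the intervals [0.110, 0.116] and [0.122, 0.130] are disjoint. On the linux box, [0.425, 0.437] and [0.431, 0.441] overlap. This is consistent with the observation that the optimizer's effect is only visible on one machine.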
Sudoku
My machine:
2.11.7
[info] Sudoku.run avgt 10 1.103 ± 0.014 ms/op
2.11.7, -Ybackend:GenBCode
[info] Sudoku.run avgt 10 1.113 ± 0.014 ms/op
2.11.7, -optimise
[info] Sudoku.run avgt 10 1.101 ± 0.019 ms/op
2.12.0-newopt
[info] Sudoku.run avgt 10 1.120 ± 0.010 ms/op
2.12.0-newopt, -Yopt:l:classpath
[info] Sudoku.run avgt 10 1.120 ± 0.013 ms/op
Linux box:
2.11.7
[info] Sudoku.run avgt 10 3.808 ± 0.295 ms/op
2.11.7, -Ybackend:GenBCode
[info] Sudoku.run avgt 10 3.999 ± 0.298 ms/op
2.11.7, -optimise
[info] Sudoku.run avgt 10 3.902 ± 0.211 ms/op
2.12.0-newopt
[info] Sudoku.run avgt 10 3.919 ± 0.207 ms/op
2.12.0-newopt, -Yopt:l:classpath
[info] Sudoku.run avgt 10 3.929 ± 0.266 ms/op
The situation looks very similar to "Richards".
Some more findings
Sudoku uses a for comprehension with an if guard, resulting in a WithFilter.map call, which is a bit slow. See here: https://github.com/lrytz/benchmarks/blob/4d43af04c060285e2e717a7b00262ac4d921a3f5/src/main/scala/misc/Sudoku.scala#L125.

@lrytz this seems stale, should it stay open?
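To illustrate the WithFilter.map finding for Sudoku: a for comprehension with an `if` guard is rewritten by the compiler into `withFilter` followed by `map`. The sketch below uses made-up data and a hypothetical `WithFilterDesugaring` object. It does not reproduce the actual line in Sudoku.scala. The `collect` variant is just one possible hand-written alternative that avoids the WithFilter wrapper. Whether it would be faster would need to be measured with the same JMH setup.

```scala
object WithFilterDesugaring {
  // A for comprehension with an `if` guard ...
  def withGuard(xs: List[Int]): List[Int] =
    for (x <- xs if x % 2 == 0) yield x * 10

  // ... is desugared by the compiler into withFilter followed by map,
  // so the guard goes through a WithFilter wrapper.
  def desugared(xs: List[Int]): List[Int] =
    xs.withFilter(x => x % 2 == 0).map(x => x * 10)

  // Hypothetical alternative without WithFilter (performance not measured).
  def withCollect(xs: List[Int]): List[Int] =
    xs.collect { case x if x % 2 == 0 => x * 10 }

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4, 5, 6)
    println(withGuard(xs))   // List(20, 40, 60)
    println(desugared(xs))   // List(20, 40, 60)
    println(withCollect(xs)) // List(20, 40, 60)
  }
}
```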
While looking at some benchmarks, I found that Sudoku (taken from https://github.com/jonas/scala-js-benchmarks) runs slower on 2.12 than 2.11 (with no optimizers enabled): 1500 vs 1300.
TODO
- Also look at other benchmarks. For example, try to find out why the scala-js optimizer makes "Richards" 3x faster (https://youtu.be/IvB1APFZK5Q?t=4m2s) while the 2.11 and 2.12 optimizers don't change anything. Is the scala-js optimizer doing things that help only on the JS VMs, or would they also improve performance on the JVM?