Open limansky opened 4 months ago
Simple programs like that are not remotely good enough to reliably test performance on the JVM. Please use JMH when attempting to compare run-time performance on the JVM.
actually the answer is very clear: Scala 2.13 specialised foreach to foreach$mVc$sp, but Scala 3 does not
looking at the difference in cfr-decompiler output (- is Scala 2.13, + is Scala 3):
long s2 = System.nanoTime();
- RichInt$.MODULE$.until$extension(Predef$.MODULE$.intWrapper(0), M).foreach$mVc$sp((Function1)(JFunction1.mcVI.sp & Serializable)x$1 -> {
+ RichInt$.MODULE$.until$extension(Predef$.MODULE$.intWrapper(0), M).foreach((Function1)(JFunction1.mcVI.sp & Serializable)_$1 -> {
for (int i = 0; i < N; ++i) {
sum$1.elem += arr[i];
}
Range.foreach is specialized for Int => Unit, but it seems Scala 3 doesn't invoke foreach$mVc$sp, so every int is boxed when calling the argument function.
(jinx..)
Thanks, @bishabosha
Replacing foreach with while makes performance the same. This is kind of counter-intuitive to me, because foreach is used for the outer loop, which has a small number of iterations. Could you please explain why boxing of the outer loop variable has such a significant performance impact in this case?
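For readers following along, here is a minimal sketch of the two loop shapes being compared. It is not the reporter's code: the names N, M, arr and sum are borrowed from the decompiler output above, while the object name, the method names and the (much smaller) sizes are assumptions made for illustration.

```scala
object LoopShapes {
  // Hypothetical sizes, far smaller than anything used in the report.
  val N = 1000 // inner loop: elements summed per pass
  val M = 10   // outer loop: number of passes

  // Outer loop goes through Range.foreach with an Int => Unit closure.
  // This is the call that Scala 2.13 routes to foreach$mVc$sp.
  def sumWithForeach(arr: Array[Long]): Long = {
    var sum = 0L
    (0 until M).foreach { _ =>
      var i = 0
      while (i < N) { sum += arr(i); i += 1 }
    }
    sum
  }

  // Same computation with the outer loop written as a plain while,
  // so no closure and no Range.foreach call is involved.
  def sumWithWhile(arr: Array[Long]): Long = {
    var sum = 0L
    var j = 0
    while (j < M) {
      var i = 0
      while (i < N) { sum += arr(i); i += 1 }
      j += 1
    }
    sum
  }
}
```

Both methods compute the same value; any timing difference between them should be measured with JMH rather than with System.nanoTime, as suggested earlier in the thread.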
actually the clearer answer is that Scala 3 doesn't do DelayedInit anymore, so the example is compiled to an ordinary module which is not optimized, i.e. jitted, on the JVM because it is static initialization. (This is the old REPL bugaboo.)
Showing original in-module (the b version) vs factored out to instance method:
➜ snips scala i19759.scala
time 4148 ms
10000000000
➜ snips ~/projects/dotty/bin/scala i19759.scala
time 4118 ms
10000000000
➜ snips scala i19759b.scala
time 4184 ms
10000000000
➜ snips ~/projects/dotty/bin/scala i19759b.scala
time 70905 ms
10000000000
The original is the b version because when pasting I automatically moved it from App to regular class without thinking. That is what coding trauma does to the brain.
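A rough sketch of the distinction drawn in this comment, under stated assumptions: it does not reproduce the i19759 files, and InModuleBody, Summer and Main are hypothetical names. The first shape does its work in the object body, which in Scala 3 runs as part of initializing the module; the second moves the same work into an instance method.

```scala
// Work done directly in the object body. Without DelayedInit this executes
// while the module is being initialized.
object InModuleBody {
  val total: Long = {
    var sum = 0L
    var i = 0
    while (i < 1000) { sum += i; i += 1 }
    sum
  }
}

// The same work factored out into an instance method of a regular class.
final class Summer {
  def total(): Long = {
    var sum = 0L
    var i = 0
    while (i < 1000) { sum += i; i += 1 }
    sum
  }
}

object Main {
  def main(args: Array[String]): Unit = {
    println(InModuleBody.total)   // computed during module initialization
    println(new Summer().total()) // computed by an ordinary method call
  }
}
```

The results are identical; only where the loop runs differs, which is the factor the timings above point at.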
I must have mixed things up yesterday, I thought I could reproduce it with a main method. But either way, foreach performs the same as foreach$mVc$sp; that's thanks to the JVM inlining and eliminating the boxing. The int argument is unused here, so it's conceivable that the optimization might not always succeed in more complicated cases. So it still seems worth fixing.
I removed the regression label because DelayedInit is explicitly a removed feature, and renamed the issue to reflect that we should focus on fixing the specialisation.
I have asked a question in https://contributors.scala-lang.org/t/status-of-specialization-in-scala-3/4628 and my impression from the response was there is no support at all for specialization in Scala 3.
I know Scala 3 certainly cannot produce specialized code. Is it able to consume specialized functions created in Scala 2, or how can this issue be fixed?
@OndrejSpanel Yes, it should be able to consume specialized functions created in Scala 2. So it seems there's an issue with that specialization that needs fixing.
Compiler version
I've compared the performance of a simple piece of code compiled with Scala 2 and Scala 3 and ran into a performance regression. In this case the program generated by Scala 2 is more than 7 times faster on my computer. The code just calculates the sum of a big Array[Long] several times.
Minimized code
The whole project code is available here: https://github.com/limansky/perf-issue
Output
Expectation
I suppose Scala 3 should be at least as fast as Scala 2.