Range.foreach is not specialised

limansky commented 4 months ago

Compiler version

I've compared a simple peace of code performance running with Scala 2 and Scala 3 and faced with performance regression. In this case Scala 2 generated program is more than 7 times faster on my computer. The code is just calculate sum of big Array[Long] several times.

Minimized code

    (0 until M).foreach { _ =>
      var i = 0
      while (i < N) {
        sum += arr(i)
        i += 1
      }
    }

The whole project code is available here: https://github.com/limansky/perf-issue

Output

sbt:perf> +run
[info] Setting Scala version to 2.13.12 on 1 projects.
[info] Reapplying settings...
[info] set current project to perf (in build file:/home/limansky/projects/langperf/scala/)
[info] running Main 
time 4268 ms
10000000000
[success] Total time: 5 s, completed Feb 22, 2024, 3:10:32 PM
[info] Setting Scala version to 3.3.1 on 1 projects.
[info] Reapplying settings...
[info] set current project to perf (in build file:/home/limansky/projects/langperf/scala/)
[info] running Main 
time 31699 ms
10000000000

Expectation

I suppose Scala 3 should be at least as fast as Scala 2.

sjrd commented 4 months ago

Simple programs like that are not remotely good enough to reliably test performance on the JVM. Please use JMH when attempting to compare run-time performance on the JVM.

bishabosha commented 4 months ago

actually the answer is very clear, Scala 2.13 specialised foreach to foreach$mVc$sp, but Scala 3 does not

lrytz commented 4 months ago

looking at cfr-decompiler output difference

         long s2 = System.nanoTime();
-        RichInt$.MODULE$.until$extension(Predef$.MODULE$.intWrapper(0), M).foreach$mVc$sp((Function1)(JFunction1.mcVI.sp & Serializable)x$1 -> {
+        RichInt$.MODULE$.until$extension(Predef$.MODULE$.intWrapper(0), M).foreach((Function1)(JFunction1.mcVI.sp & Serializable)_$1 -> {
             for (int i = 0; i < N; ++i) {
                 sum$1.elem += arr[i];
             }

Range.foreach is specialized for Int => Unit, but it seems Scala 3 doesn't invoke foreach$mVc$sp, so every int is boxed when calling the argument function.

lrytz commented 4 months ago

(jinx..)

limansky commented 4 months ago

Thanks, @bishabosha Replacing foreach with while makes performance the same. This is kind of counter-intuitive for me, because foreach is used for the outer loop, which has small number of iterations. Could you please explain why boxing of the outer loop variable has such significant performance impact in this case?

som-snytt commented 4 months ago

actually the clearer answer is that Scala 3 doesn't do DelayedInit anymore, so the example is compiled to an ordinary module which is not optimized i.e. jitted on the JVM because it is static initialization. (This is the old REPL bugaboo.)

Showing original in-module (the b version) vs factored out to instance method:

➜  snips scala i19759.scala
time 4148 ms
10000000000
➜  snips ~/projects/dotty/bin/scala i19759.scala
time 4118 ms
10000000000
➜  snips scala i19759b.scala
time 4184 ms
10000000000
➜  snips ~/projects/dotty/bin/scala i19759b.scala
time 70905 ms
10000000000

The original is the b version because when pasting I automatically moved it from App to regular class without thinking. That is what coding trauma does to the brain.

lrytz commented 4 months ago

I must have mixed things up yesterday, i thought I could reproduce it with a main method. But either way

@sjrd is right that this is not a reliable benchmark
if the performance of using foreach is the same as with foreach$mVc$sp, that's thanks to the JVM inlining and eliminating the boxing. The int argument is unused here, so it's conceivable that the optimization might not always succeed in more complicated cases.

So it still seems worth fixing.

bishabosha commented 4 months ago

I removed the regression label because DelayedInit is explicitly a removed feature, and renamed to reflect that we should focus on fixing the specialisation

OndrejSpanel commented 4 months ago

I have asked a question in https://contributors.scala-lang.org/t/status-of-specialization-in-scala-3/4628 and my impression from the response was there is no support at all for specialization in Scala 3.

I know Scala 3 certainly cannot produce specialized code. Is it able to consume specialized functions created in Scala 2, or how can this issue be fixed?

odersky commented 4 months ago

@OndrejSpanel Yes, it should be able to consume specialized functions created in Scala 2. So it seems there's an issue with that specialization that needs fixing.

scala / scala3