scala / scala3

The Scala 3 compiler, also known as Dotty.
https://dotty.epfl.ch
Apache License 2.0
5.83k stars 1.05k forks source link

Investigate changes to the trait encoding that could improve (cold) performance #5928

Open smarter opened 5 years ago

smarter commented 5 years ago

There's a lot of moving parts involved here so let me try to recap, note that this is all based on my own understanding which might be incomplete, feedback welcome (/cc @retronym @lrytz @adriaanm). I'm thinking that maybe we should turn this into a talk at the JVM Language Summit, maybe the JDK folks will take pity on us and fix some of this stuff.

1. Dotty implementation of super[MyTrait].foo for the JVM

A trait on the JVM is compiled down to an interface with default methods, in Scala we can write super[A].foo to call the default method foo implemented in A even if it's been overridden, in Dotty this compiles down to the following bytecode:

invokespecial A.foo

but this will only have the correct semantics if A is explicitly listed in the bytecode of the current class as a a super-interface, otherwise the wrong method might be called, here's an example:

trait A { def foo = "A" }
trait B extends A { override def foo = "B" }
class C extends A with B {
  def bar = super[A].foo
}

if we run javap on C.class, we'll see:

public class C implements A, B {

Because B implements A, the A part may seem redundant but in fact if we omit it then invokespecial A.foo will call the foo override in B !

2. Cold performance issue when doing default method resolution and listing all transitive parent interfaces

It turns out that listing all interfaces all can affect performance of default method resolution as investigated by @retronym in https://github.com/scala/scala-dev/issues/98#issuecomment-194602923, therefore it pays to avoid listing interfaces which aren't necessary to get the correct semantics in the generated bytecode, and in fact Dotty already does better than what @retronym reported in his commen, I have no clue what the code involved is and when it changed but now if I compile:

trait A
trait B extends A
class C extends B

I get:

public class C implements B {

A does get listed in C if we add a super[A].something call in it, as in the example in section 1.

So far so good right ? Well, things are about to get even more complicated.

3. Cold performance issue when doing default method resolution (even when parent interfaces are pruned)

As painstakingly investigated by the Scala 2 team, too much default method resolution leads to bad cold performance. This was worked around by emitting mixin forwarders unconditionally.

4. Mixin forwarders

Given:

trait A {
  def foo = 1
}
trait B { self: A =>
  override def foo = 2
}
class C extends A with B
object Test {
  (new C).foo
}

The semantics of Scala requires that the call to foo on C ends up calling foo in B, but since B and A are unrelated from the point of view of the JVM and both have a foo, you'll get an IncompatibleClassChangeError unless an explicit forwarder is added to C:

class C extends A with B {
  override def foo = super[B].foo
}

Both Scala 2 and Dotty add these forwarders automatically for correctness. But Scala 2 adds them even when they're not strictly needed to avoid doing default method resolution when possible for (cold, maybe hot too ?) performance, eventually Dotty should do the same: https://github.com/lampepfl/dotty/issues/2563.

5. Scala 2 implementation of super[MyTrait].foo for the JVM

But wait, we need mixin forwarders to improve cold performance, but mixin forwarders are implemented using super-calls, and we just saw that super-calls require listing more interfaces, and listing more interfaces worsens cold performance 🤯. Can we do anything about this ? Scala 2 did by going back to the drawing board and changing the scheme for doing super-calls, let's look at the example from section 1 again:

trait A { def foo = "A" }
trait B extends A { override def foo = "B" }
class C extends A with B {
  def bar = super[A].foo
}

In Scala 2 this becomes when decompiled with cfr and slightly simplified:

public interface A {
  public static /* synthetic */ String foo$(A $this) {
    return $this.foo();
  }
  default public String foo() {
    return "A";
  }
}
public interface B extends A {
   public static /* synthetic */ String foo$(B $this) {
    return $this.foo();
  }
  @Override default public String foo() {
    return "B";
  }
}
public class C implements B {
  @Override public String foo() {
    return B.foo$(this);
  }
   public String bar() {
    return A.foo$(this)
  }
}

There's a few important things to notice here:

So Scala 2 trades off listing some interface parents against adding static methods everywhere, see https://github.com/scala/scala/pull/5251#issuecomment-463900334 for more details.

6. OK but what do we do now ?

The good news is that as far as I can tell, the differences between Scala 2 and Dotty in trait encoding don't affect semantics, this is purely about performance so no urgent action is needed.

6.1. Benchmarking

The first thing we should do to investigate this is setup some benchmarking infrastructure to reason about hot and cold performance of the code generated by the compiler. In Scala 2 the benchmarking involved in making the decisions that lead to their particular encoding was (I think, correct me if I'm wrong) mostly based on https://github.com/scala/compiler-benchmark which benchmarks the performance of the compiler, this doesn't really work for Dotty currently because we don't compile scala-library and instead use the own compiled by Scala 2, and most of the impact of default method resolution on performance comes from the deep collection hierarchy, so we need to:

  1. Be able to compile and run scala-library reliably (compiling it is part of our community build but we need to check that it actually works)
  2. Benchmark the compiler with a Dotty-compiled scala-library
  3. Benchmark cold and hot performance of the compiler (https://github.com/scala/compiler-benchmark has infrastructure for this)
  4. Benchmark everything in this quadrant (A is what Dotty does currently EDIT: Dotty has now switched to B, D is what Scala 2 does):

    emit mixin forwarders when needed always emit mixin forwarders
    super-call to trait with invokespecial A B
    super-call to trait with invokestatic C D
  5. Benchmark with a recent JDK, it looks like things might be improving finally

6.2. Possible implementation improvements

I think the most promising corner of the quadrant is B: we need logic to generate mixin forwarders anyway for correctness, so generating them all the time is not a big deal, by contrast adding a bunch o f static methods is extra complexity I'd rather avoid.

6.2.1. Avoid listing unnecessary extra interfaces

Mixin forwarders require extra super-calls on traits, but as @lrytz observed in https://github.com/scala/scala-dev/issues/228, many super-calls don't actually require listing extra interfaces to work correctly and that reasoning seems to apply to mixin forwarders, from the issue:

trait T { def f = 1 }
trait U extends T
class C extends U { def t = super.f }

Here, emitting invokespecial U.f is correct, we don't need to add T as a direct parent of C.

And yet, if we look at the code generated by Dotty, it's not smart enough to avoid listing T:

public class C implements T, U {
   public int t() {
    return T.super.f();
  }
}

6.2.2. Emit trait constructors statically

In both Scala 2 and Dotty, the constructor of a trait is emitted as a $init$ method, but in Scala 2 it's a static method. Aligning ourselves with Scala 2 here would be an easy win because all transitive children of a trait with a constructor are required to call it, so in Dotty we end up with mandatory super-calls that force us to list transitive interfaces just to call $init$. Since $init$ is special anyway, making it static doesn't seem like a big deal.

7. See also

Here are various issues and PRs more or less related, this might not be helpful but at least they'll all have a backlink to this issue now:

lrytz commented 5 years ago

Your understanding of the situation seems right to me, as far as I remember :-) I think we kept the static methods in interfaces because there's still a risk for performance degradation due to adding redundant direct interfaces in complicated hierarchies like collections. I think there was also an argument about binary compatibility for keeping the static accessors, but I don't remember what situation was being considered.