scala / scala-dev

Scala 2 team issues. Not for user-facing bugs or directly actionable user-facing improvements. For build/test/infra and for longer-term planning and idea tracking. Our bug tracker is at https://github.com/scala/bug/issues
Apache License 2.0
130 stars 15 forks source link

Startup regression due to HotSpot JVM's population of vtables with default methods #243

Open retronym opened 7 years ago

retronym commented 7 years ago

Running scalac -version in 2.12.0-SNAPSHOT takes 0.9s, a significant regression from 2.11.8, which took 0.2s.

% time scalac -debug -version

/Library/Java/JavaVirtualMachines/jbsdk8u112b403_osx_x64/Contents/Home/bin/java
-Xmx256M
-Xms32M
-Xbootclasspath/a:/Users/jason/scala/2.11.8/lib/akka-actor_2.11-2.3.10.jar:/Users/jason/scala/2.11.8/lib/config-1.2.1.jar:/Users/jason/scala/2.11.8/lib/jline-2.12.1.jar:/Users/jason/scala/2.11.8/lib/scala-actors-2.11.0.jar:/Users/jason/scala/2.11.8/lib/scala-actors-migration_2.11-1.1.0.jar:/Users/jason/scala/2.11.8/lib/scala-compiler.jar:/Users/jason/scala/2.11.8/lib/scala-continuations-library_2.11-1.0.2.jar:/Users/jason/scala/2.11.8/lib/scala-continuations-plugin_2.11.8-1.0.2.jar:/Users/jason/scala/2.11.8/lib/scala-library.jar:/Users/jason/scala/2.11.8/lib/scala-parser-combinators_2.11-1.0.4.jar:/Users/jason/scala/2.11.8/lib/scala-reflect.jar:/Users/jason/scala/2.11.8/lib/scala-swing_2.11-1.0.2.jar:/Users/jason/scala/2.11.8/lib/scala-xml_2.11-1.0.4.jar:/Users/jason/scala/2.11.8/lib/scalap-2.11.8.jar
-classpath
""
-Dscala.home=/Users/jason/scala/2.11.8
-Dscala.usejavacp=true
-Denv.emacs=
scala.tools.nsc.Main
-version

Scala compiler version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

real    0m0.219s
user    0m0.213s
sys 0m0.047s

% time qscalac -debug -version

/Library/Java/JavaVirtualMachines/jbsdk8u112b403_osx_x64/Contents/Home/bin/java
-Xmx256M
-Xms32M
-Xbootclasspath/a:/Users/jason/code/scala/build/quick/classes/compiler:/Users/jason/code/scala/build/quick/classes/library:/Users/jason/code/scala/build/quick/classes/reflect:/Users/jason/.ivy2/cache/org.apache.ant/ant/jars/ant-1.9.4.jar:/Users/jason/.ivy2/cache/org.apache.ant/ant-launcher/jars/ant-launcher-1.9.4.jar:/Users/jason/.ivy2/cache/org.scala-lang.modules/scala-asm/bundles/scala-asm-5.1.0-scala-1.jar:/Users/jason/.ivy2/cache/org.scala-lang.modules/scala-xml_2.12.0-RC1/bundles/scala-xml_2.12.0-RC1-1.0.5.jar:/Users/jason/.ivy2/cache/jline/jline/jars/jline-2.14.1.jar
-classpath
""
-Dscala.home=/Users/jason/code/scala/build/quick
-Dscala.usejavacp=true
-Denv.emacs=
scala.tools.nsc.Main
-version

Scala compiler version 2.12.0-20161011-181858-1e81a09 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

real    0m0.897s
user    0m0.923s
sys 0m0.088s

Profiling this suggests that the dominant factor is resolution of default methods, which HotSpot JVM performs eagerly during classloading. I believe that collections present the biggest challenge, which suggests that this regression will be felt by all Scala programs, not just the compiler.

This might be inherent complexity required to implement method resolution, in particular:

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html#jvms-5.4.3.3

A maximally-specific superinterface method of a class or interface C for a particular method name and descriptor is any method for which all of the following are true:

The method is declared in a superinterface (direct or indirect) of C.

The method is declared with the specified name and descriptor.

The method has neither its ACC_PRIVATE flag nor its ACC_STATIC flag set.

Where the method is declared in interface I, there exists no other maximally-specific superinterface method of C with the specified name and descriptor that is declared in a subinterface of I.

For completeness, here's the corresponding spec in the Java language: https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.12.2.5

The implementation in OpenJDK is in defaultmethods.cpp.

I have ported this to Scala to try to understand the algorithm and potential optimizations (either in our code generation or in OpenJDK itself).

I have also synthesized a pure Java test case that demonstrates exponential performance, both in javac and java.

retronym commented 7 years ago

/cc @DarkDimius @smarter

retronym commented 7 years ago

A workaround is to add scala-library.jar to the Class Data Sharing dump.

This is pretty laborious.

There are JVM command line options that could be used to tweak the process:

adriaanm commented 7 years ago

I have ported this to Scala to try to understand the algorithm and potential optimizations (either in our code generation or in OpenJDK itself).

❤️

lrytz commented 7 years ago

I reproduced this with your jmh compiler benchmark (with source=@/.../version containing -version). The mixin forwarders (added in https://github.com/scala/scala/pull/5429) make a difference:

I also spent some time yesterday reading through the JVM's classfile parser, but i'm not through yet. So the following might be incorrect.

IUC, a "miranda method" is a method that a class inherits from an interface without having an override in the class or any of its superclasses. This includes non-overridden default methods. Miranda methods get a slot in the class's vtable. generate_default_methods will add a method stub to every miranda slot, the stub is called "overpass method". The overpass method

IUC, according to your observations, the performance bottleneck is finding the maximally specific method. If Scalac emits all mixin forwarders this does not need to be done: there are no miranda methods added for default methods, because there's always a concrete overriding method in the class (the forwarder).

Let me know what I got wrong :)

retronym commented 7 years ago

Interesting. I didn't see a difference with/without forwarders, but maybe I was measuring the wrong revisions. I was only using time ... from the command line to measure, which wasn't as rigorous as using JMH.

As I see it, the presence of our mixin forwarders (or generally, a matching concrete method in the superclass chain) won't prevent the traversal of all paths through the supertype lattice, but it will prevent a later iteration through all the candidate methods.

I can't see any reason not to short circuit the hierarchy walk when a matching class method. I'm trying this out in OpenJDK.

In summary, I think our decision to use forwarders will not only have a benefit for classfie parsing with current JDKs, but might get closer to zero-cost if we can patch the JDK.

retronym commented 7 years ago

UPDATE: OpenJDK accepted this bug as JDK-8167995 and fixed it for JDK 9


I'm reporting another bug with the diagnostic code in OpenJDK. Noting it here for posterity:

Enabling diagnostic logging of default method resolution in the triggers a VM crash during a buffer resize. This resize is triggered by using a method with a large descriptor.

A code comment in the implementation warns:

  // This resource mark is the bound for all memory allocation that takes
  // place during default method processing.  After this goes out of scope,
  // all (Resource) objects' memory will be reclaimed.  Be careful if adding an
  // embedded resource mark under here as that memory can't be used outside
  // whatever scope it's in.
  ResourceMark rm(THREAD);

But down below, print_slot and print_method define a nested ResourceMark.

package test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test;

public class Test {
  interface I {
    default void f(Test a, Test b, Test c, Test e, Test f, Test g, Test h, Test i) {}
  }
  static class C implements I {}
  public static void main(String[] args) {
    new C();
  }
}
% export JAVA_HOME=/code/openjdk9dev/build/macosx-x86_64-normal-server-fastdebug/jdk
% $JAVA_HOME/bin/javac -d /tmp /tmp/Test.java
% $JAVA_HOME/bin/java -cp /tmp -Xlog:defaultmethods=debug test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.Test
...
(/Users/jason/code/openjdk9dev/hotspot/src/share/vm/utilities/ostream.cpp:340), pid=54634, tid=7427
#  assert(rm == __null || Thread::current()->current_resource_mark() == rm) failed: StringStream is re-allocated with a different ResourceMark. Current: 0x00007000053a76a0 original: 0x00007000053a78e8
diff -r fec31089c2ef src/share/vm/classfile/defaultMethods.cpp
--- a/src/share/vm/classfile/defaultMethods.cpp Thu Oct 06 18:05:53 2016 -0700
+++ b/src/share/vm/classfile/defaultMethods.cpp Fri Oct 14 10:09:56 2016 +1100
@@ -77,12 +77,10 @@
 };

 static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
-  ResourceMark rm;
   str->print("%s%s", name->as_C_string(), signature->as_C_string());
 }

 static void print_method(outputStream* str, Method* mo, bool with_class=true) {
-  ResourceMark rm;
   if (with_class) {
     str->print("%s.", mo->klass_name()->as_C_string());
   }
retronym commented 7 years ago

I've reproduced your results showing that performance for 2.11.8 > 2.12 with forwarders > 2.12 without forwarders. Previously, I'd failed to bootstrap to see the effect addition of forwarders.

Adding forwarders reduces the number of "Looking for default methods for" log entries from 27581 to 15687 running scalac -version -J-Xlog:defaultmethods=debug.

Here are the remaining entries: https://gist.github.com/retronym/d223044d719a13350da7715eb5b5a14b

Looking at these in context, I see that interfaces are also subject to default method processing. For instance, here's the log for GenIterableLike

retronym commented 7 years ago

Looking at XCode profiler again, this time using the proposed 2.12.0-RC2 ("with forwarders"), scalac -version spends 32% of the time in DefaultMethods::generate_default_methods and its callees.

I'm a newbie with the tool, and couldn't find a direct way to measure this, other than by "pruning" that method from the trace and seeing that the reported sample time went from 630ms down to 446ms.

I don't see any way forward to eat into that number (other than the extreme measure of Class Data Sharing outlined above). Scala collections are probably a worst case scenario (lots of methods and lots of hierarchy).

lrytz commented 7 years ago

Very interesting observation that defaults are also resolved for interfaces, and that it's still such a significant chunk of startup time.

retronym commented 7 years ago

There has been some progress on this in OpenJDK under https://bugs.openjdk.java.net/browse/JDK-8167431, although it appears that so far the only fix is to the performance of javac, not to default method resolution in the JVM itself.

smarter commented 7 years ago

it appears that so far the only fix is to the performance of javac, not to default method resolution in the JVM itself.

And the bug was closed, so the original issue you reported may have been forgotten.

retronym commented 7 years ago

I've emailed contact from Oracle to clarify.

SethTisue commented 3 years ago

fwiw, for time scalac -version I'm seeing 2.11 0.3s, 2.12 1.0s, 2.13 0.7s

SethTisue commented 3 years ago

and time dotc -version 1.1s