mkeskells opened this issue 6 years ago
JITWatch tells me that it's not inlined at the C1 level. Modifying it as follows makes it inlined, and appears to let the JVM assume that it's true in a normal compiler run:
diff --git a/src/reflect/scala/reflect/internal/SymbolTable.scala b/src/reflect/scala/reflect/internal/SymbolTable.scala
index 01df81a594..7dcba214a8 100644
--- a/src/reflect/scala/reflect/internal/SymbolTable.scala
+++ b/src/reflect/scala/reflect/internal/SymbolTable.scala
@@ -442,9 +442,12 @@ abstract class SymbolTable extends macros.Universe
   /** The phase which has given index as identifier. */
   val phaseWithId: Array[Phase]
 
+  protected[this] def isCompilerUniverse0: Boolean
+
   /** Is this symbol table a part of a compiler universe?
-   */
-  def isCompilerUniverse = false
+   */
+  // OPT avoid virtual call here
+  @inline final val isCompilerUniverse = isCompilerUniverse0
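Under that diff, each concrete universe would have to supply the constant exactly once. A hypothetical sketch of the two overrides (class placement assumed, not taken from the thread):

// Compiler side (sketch of where scala.tools.nsc.Global would answer):
abstract class Global extends SymbolTable {
  protected[this] def isCompilerUniverse0: Boolean = true
}

// Runtime-reflection side (sketch of scala.reflect.runtime.JavaUniverse):
class JavaUniverse extends SymbolTable {
  protected[this] def isCompilerUniverse0: Boolean = false
}

One wrinkle: the final val would be initialized during SymbolTable's constructor via a virtual call, which is only safe here because the overrides return compile-time constants and touch no subclass state.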
I doubt it can be made static since the same process can need both a reflective universe and a compiler global.
If it makes a very large difference we could have a static almost-final value choosing between "current JVM only has JavaUniverse", "current JVM only has nsc.Global", and "current JVM has both, so make the virtual call". I don't recall it being that hot, though.
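The three-way static could look something like this (hypothetical sketch, all names invented): record once, per JVM, which kinds of universe have been instantiated, so the common single-universe cases can treat isCompilerUniverse as a constant and only the mixed case pays for the virtual call.

import java.util.concurrent.atomic.AtomicInteger

object UniverseMode {
  final val Unset    = 0
  final val Compiler = 1 // only nsc.Global so far
  final val Runtime  = 2 // only the reflective JavaUniverse so far
  final val Mixed    = 3 // both in this JVM: fall back to the virtual call

  private[this] val mode = new AtomicInteger(Unset)

  /** Called from each universe's constructor. */
  def register(isCompiler: Boolean): Int = {
    val m = if (isCompiler) Compiler else Runtime
    mode.updateAndGet(old => if (old == Unset || old == m) m else Mixed)
  }

  def current: Int = mode.get
}

For the JIT to actually fold the single-universe cases to a constant, the mode would still need to be exposed through something like the JSR 292 trick discussed below rather than a plain volatile read.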
@hrhino Thanks, that's really interesting. @mkeskells you mentioned something else as well, about a JSR? I also can't remember when/where this was originally flagged.
I thought that we may be able to use the JSR 292 trick with SwitchPoint, like Statistics uses.
I would suggest that we benchmark a compile with
final val isCompilerUniverse = true
(plus hacks around the rest of the code to make it compile)
to see how much time it is possible to save; that can be used to judge what is a good or bad fix.
That makes sense to me, I think. The JSR292 trick was what I referenced with "almost final value" above, btw.
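As a hedged sketch (hypothetical class, not from the thread), the JSR 292 "almost final value" trick puts the boolean behind a MutableCallSite: the JIT constant-folds the call site's current target and deoptimizes only on the rare setValue. This assumes a Scala version that supports signature-polymorphic MethodHandle calls via type ascription.

import java.lang.invoke.{MethodHandle, MethodHandles, MutableCallSite}

final class AlmostFinalBoolean(initial: Boolean) {
  private[this] val callSite =
    new MutableCallSite(MethodHandles.constant(classOf[Boolean], initial))
  private[this] val invoker: MethodHandle = callSite.dynamicInvoker()

  // Reads through the call site; the JIT treats the target as a constant.
  def value: Boolean = (invoker.invokeExact(): Boolean)

  // Rare mutation: installs a new constant target and forces other threads
  // to observe the change (triggering deoptimization of compiled callers).
  def setValue(b: Boolean): Unit = {
    callSite.setTarget(MethodHandles.constant(classOf[Boolean], b))
    MutableCallSite.syncAll(Array(callSite))
  }
}

This is the same shape as the mechanism Statistics uses; the cost is the extra JSR 292 plumbing and a deoptimization storm whenever the value flips.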
--harrison
So I think we can do this:
1. benchmark baseline
2. benchmark with the final value hack
3. scope for improvement = (2) - (1)
4. if the scope (3) is big enough, measure how much gain we get from the val-based solution; if that isn't close enough to (2), then try the "almost final" solution

That said, I can't take this one on now, so it's a "royal we".
@hrhino do you want to own and progress this issue? (@rorygraves will have to add you)
@rorygraves or I can help you with setting up the perf_tester if needed.
I would expect it to be hot: every symbol flag or annotation access touches the method, but this item was from code inspection, not profiling.
That makes sense. Profile-driven optimization is the best optimization, obviously. I'll fiddle with benchmarks and post the results, and then we (royally) can decide if it's a big enough gain for the complexity that is JSR292.
@rorygraves you can assign this to me (please) or I'll just trust people to read the thread before picking it up.
Thanks, --harrison
Hmm:
# with all isCompilerUniverse code removed
[info] Benchmark (corpusVersion) (extraArgs) (resident) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest false scalap sample 151 2195.649 ± 66.365 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest false scalap sample 1793.065 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest false scalap sample 2151.678 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest false scalap sample 2561.881 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest false scalap sample 2641.573 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest false scalap sample 2882.158 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest false scalap sample 3028.287 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest false scalap sample 3028.287 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest false scalap sample 3028.287 ms/op
# with final val isCompilerUniverse
[info] Benchmark (corpusVersion) (extraArgs) (resident) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest false scalap sample 143 2326.900 ± 76.704 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest false scalap sample 1885.340 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest false scalap sample 2264.924 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest false scalap sample 2683.516 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest false scalap sample 2886.520 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest false scalap sample 3390.172 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest false scalap sample 3414.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest false scalap sample 3414.163 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest false scalap sample 3414.163 ms/op
# Baseline: 9b89fe9
[info] Benchmark (corpusVersion) (extraArgs) (resident) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest false scalap sample 145 2295.904 ± 71.148 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest false scalap sample 1874.854 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest false scalap sample 2218.787 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest false scalap sample 2635.701 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest false scalap sample 2824.444 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest false scalap sample 3005.387 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest false scalap sample 3007.316 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest false scalap sample 3007.316 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest false scalap sample 3007.316 ms/op
So either I'm benchmarking poorly or it's not actually an optimization. I'll poke around a bit more with JITWatch, but I don't yet understand why.
@hrhino did you get anywhere with this digging?
@mkeskells to an extent, which is that I think it should be faster, but I need a shinier benchmark box. Either way, possibly less than I had hoped: the JIT doesn't seem to be super eager to use the final val-ity.
@hrhino can you share your branch or show a WIP PR?
@hrhino can you repurpose a flag for that meaning? Something like:
final def getFlag(mask: Long): Long = {
  var f = flags
  if ((f & IS_COMPILER_UNIVERSE) == 0) {
    // reflective universe: may need to initialize before the flags are valid
    if (!isThreadsafe(purpose = FlagOps(mask))) initialize
    f = flags & ~IS_COMPILER_UNIVERSE
  }
  f & mask
}

/** Does symbol have ANY flag in `mask` set? */
final def hasFlag(mask: Long): Boolean = getFlag(mask) != 0
def hasFlag(mask: Int): Boolean = hasFlag(mask.toLong)

/** Does symbol have ALL the flags in `mask` set? */
final def hasAllFlags(mask: Long): Boolean = getFlag(mask) == mask

def setFlag(mask: Long): this.type = { _rawflags |= mask ; this }
def resetFlag(mask: Long): this.type = { _rawflags &= ~mask ; this }
def resetFlags() { rawflags = if (isCompilerUniverse) IS_COMPILER_UNIVERSE else 0 }

def initFlags(mask: Long): this.type = {
  assert((rawflags & ~IS_COMPILER_UNIVERSE) == 0L, symbolCreationString)
  _rawflags = mask | (if (isCompilerUniverse) IS_COMPILER_UNIVERSE else 0)
  this
}
or some such
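One detail the sketch glosses over: the repurposed bit must not collide with any real symbol flag. A minimal guard (hypothetical bit choice and mask name, not from the thread):

// Hypothetical: claim a bit outside the range used by real symbol flags.
// Bit 63 is for illustration only; the actual layout in
// scala.reflect.internal.Flags would need checking first.
final val IS_COMPILER_UNIVERSE: Long = 1L << 63

// Sanity check, assuming some AllSymbolFlags mask covering every real flag:
// assert((AllSymbolFlags & IS_COMPILER_UNIVERSE) == 0L)

The attraction of this variant is that the universe check rides along on a flags read the hot path already performs, instead of a separate virtual call.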
This looks to be a hot but simple method.
Investigate whether there is much improvement in CPU usage from making this a constant; making it a static constant would show the effect, although converting that to work in all cases would be harder.
Investigate first, and if it is worthwhile, then optimise.