mkeskells opened this issue 6 years ago
JITWatch tells me that it's not inlined at the C1 level. Modifying it as follows makes it inlined, and appears to let the JVM assume that it's true in a normal compiler run:
diff --git a/src/reflect/scala/reflect/internal/SymbolTable.scala b/src/reflect/scala/reflect/internal/SymbolTable.scala
index 01df81a594..7dcba214a8 100644
--- a/src/reflect/scala/reflect/internal/SymbolTable.scala
+++ b/src/reflect/scala/reflect/internal/SymbolTable.scala
@@ -442,9 +442,12 @@ abstract class SymbolTable extends macros.Universe
   /** The phase which has given index as identifier. */
   val phaseWithId: Array[Phase]
 
+  protected[this] def isCompilerUniverse0: Boolean
+
   /** Is this symbol table a part of a compiler universe?
-   */
-  def isCompilerUniverse = false
+   */
+  // OPT avoid virtual call here
+  @inline final val isCompilerUniverse = isCompilerUniverse0
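Under that diff, each concrete universe would have to supply the constant exactly once. A hypothetical sketch of the two overrides (class placement assumed, not taken from the thread):

// Compiler side (sketch of where scala.tools.nsc.Global would answer):
abstract class Global extends SymbolTable {
  protected[this] def isCompilerUniverse0: Boolean = true
}

// Runtime-reflection side (sketch of scala.reflect.runtime.JavaUniverse):
class JavaUniverse extends SymbolTable {
  protected[this] def isCompilerUniverse0: Boolean = false
}

One wrinkle: the final val would be initialized during SymbolTable's constructor via a virtual call, which is only safe here because the overrides return compile-time constants and touch no subclass state.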
I doubt it can be made static since the same process can need both a reflective universe and a compiler global.
If it makes a very large difference we could have a static almost-final value choosing between "current JVM only has JavaUniverse", "current JVM only has nsc.Global", and "current JVM has both, so make the virtual call". I don't recall it being that hot, though.
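The three-way static could look something like this (hypothetical sketch, all names invented): record once, per JVM, which kinds of universe have been instantiated, so the common single-universe cases can treat isCompilerUniverse as a constant and only the mixed case pays for the virtual call.

import java.util.concurrent.atomic.AtomicInteger

object UniverseMode {
  final val Unset    = 0
  final val Compiler = 1 // only nsc.Global so far
  final val Runtime  = 2 // only the reflective JavaUniverse so far
  final val Mixed    = 3 // both in this JVM: fall back to the virtual call

  private[this] val mode = new AtomicInteger(Unset)

  /** Called from each universe's constructor. */
  def register(isCompiler: Boolean): Int = {
    val m = if (isCompiler) Compiler else Runtime
    mode.updateAndGet(old => if (old == Unset || old == m) m else Mixed)
  }

  def current: Int = mode.get
}

For the JIT to actually fold the single-universe cases to a constant, the mode would still need to be exposed through something like the JSR 292 trick discussed below rather than a plain volatile read.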
@hrhino Thanks, that's really interesting. @mkeskells you mentioned something else as well, about a JSR? I also can't remember when/where this was originally flagged.
I thought that we may be able to use the JSR 292 trick with SwitchPoint, like Statistics uses.
I would suggest that we benchmark a compile with
final val isCompilerUniverse = true
(plus hacks around the rest of the code to make it compile)
to see how much time it is possible to save; that can be used to judge what is a good or bad fix.
That makes sense to me, I think. The JSR292 trick was what I referenced with "almost final value" above, btw.
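As a hedged sketch (hypothetical class, not from the thread), the JSR 292 "almost final value" trick puts the boolean behind a MutableCallSite: the JIT constant-folds the call site's current target and deoptimizes only on the rare setValue. This assumes a Scala version that supports signature-polymorphic MethodHandle calls via type ascription.

import java.lang.invoke.{MethodHandle, MethodHandles, MutableCallSite}

final class AlmostFinalBoolean(initial: Boolean) {
  private[this] val callSite =
    new MutableCallSite(MethodHandles.constant(classOf[Boolean], initial))
  private[this] val invoker: MethodHandle = callSite.dynamicInvoker()

  // Reads through the call site; the JIT treats the target as a constant.
  def value: Boolean = (invoker.invokeExact(): Boolean)

  // Rare mutation: installs a new constant target and forces other threads
  // to observe the change (triggering deoptimization of compiled callers).
  def setValue(b: Boolean): Unit = {
    callSite.setTarget(MethodHandles.constant(classOf[Boolean], b))
    MutableCallSite.syncAll(Array(callSite))
  }
}

This is the same shape as the mechanism Statistics uses; the cost is the extra JSR 292 plumbing and a deoptimization storm whenever the value flips.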
--harrison
So I think we can do this:
1. benchmark baseline
2. benchmark with the final value hack
3. scope for improvement = (2) - (1)
4. if the scope (3) is big enough, measure how much gain we get from the val-based solution; if that isn't close enough to (2), then try the "almost final" solution

That said, I can't take this one on now, so it's a "royal we".
@hrhino do you want to own and progress this issue? (@rorygraves will have to add you)
@rorygraves or I can help you with setting up the perf_tester if needed.
I would expect it to be hot: every symbol flag or annotation access touches the method, but this item was from code inspection, not profiling.
That makes sense. Profile-driven optimization is the best optimization, obviously. I'll fiddle with benchmarks and post the results, and then we (royally) can decide if it's a big enough gain for the complexity that is JSR292.
@rorygraves you can assign this to me (please) or I'll just trust people to read the thread before picking it up.
Thanks, --harrison
Hmm:
# with all isCompilerUniverse code removed
[info] Benchmark (corpusVersion) (extraArgs) (resident) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest false scalap sample 151 2195.649 ± 66.365 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest false scalap sample 1793.065 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest false scalap sample 2151.678 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest false scalap sample 2561.881 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest false scalap sample 2641.573 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest false scalap sample 2882.158 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest false scalap sample 3028.287 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest false scalap sample 3028.287 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest false scalap sample 3028.287 ms/op
# with final val isCompilerUniverse
[info] Benchmark (corpusVersion) (extraArgs) (resident) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest false scalap sample 143 2326.900 ± 76.704 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest false scalap sample 1885.340 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest false scalap sample 2264.924 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest false scalap sample 2683.516 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest false scalap sample 2886.520 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest false scalap sample 3390.172 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest false scalap sample 3414.163 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest false scalap sample 3414.163 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest false scalap sample 3414.163 ms/op
# Baseline: 9b89fe9
[info] Benchmark (corpusVersion) (extraArgs) (resident) (source) Mode Cnt Score Error Units
[info] HotScalacBenchmark.compile latest false scalap sample 145 2295.904 ± 71.148 ms/op
[info] HotScalacBenchmark.compile:compile·p0.00 latest false scalap sample 1874.854 ms/op
[info] HotScalacBenchmark.compile:compile·p0.50 latest false scalap sample 2218.787 ms/op
[info] HotScalacBenchmark.compile:compile·p0.90 latest false scalap sample 2635.701 ms/op
[info] HotScalacBenchmark.compile:compile·p0.95 latest false scalap sample 2824.444 ms/op
[info] HotScalacBenchmark.compile:compile·p0.99 latest false scalap sample 3005.387 ms/op
[info] HotScalacBenchmark.compile:compile·p0.999 latest false scalap sample 3007.316 ms/op
[info] HotScalacBenchmark.compile:compile·p0.9999 latest false scalap sample 3007.316 ms/op
[info] HotScalacBenchmark.compile:compile·p1.00 latest false scalap sample 3007.316 ms/op
So either I'm benchmarking poorly or it's not actually an optimization. I'll poke around a bit more with JITWatch, but I don't yet understand why.
@hrhino did you get anywhere with this digging?
@mkeskells to an extent, which is that I think it should be faster, but I need a shinier benchmark box. Either way, possibly less than I had hoped: the JIT doesn't seem to be super eager to use the final val-ity.
@hrhino can you share your branch or show a WIP PR?
@hrhino can you repurpose a flag for that meaning? Something like:
final def getFlag(mask: Long): Long = {
  var f = flags
  if ((f & IS_COMPILER_UNIVERSE) == 0) {
    // reflective universe: may need to initialize before the flags are valid
    if (!isThreadsafe(purpose = FlagOps(mask))) initialize
    f = flags & ~IS_COMPILER_UNIVERSE
  }
  f & mask
}

/** Does symbol have ANY flag in `mask` set? */
final def hasFlag(mask: Long): Boolean = getFlag(mask) != 0
def hasFlag(mask: Int): Boolean = hasFlag(mask.toLong)

/** Does symbol have ALL the flags in `mask` set? */
final def hasAllFlags(mask: Long): Boolean = getFlag(mask) == mask

def setFlag(mask: Long): this.type = { _rawflags |= mask ; this }
def resetFlag(mask: Long): this.type = { _rawflags &= ~mask ; this }
def resetFlags() { rawflags = if (isCompilerUniverse) IS_COMPILER_UNIVERSE else 0 }

def initFlags(mask: Long): this.type = {
  assert((rawflags & ~IS_COMPILER_UNIVERSE) == 0L, symbolCreationString)
  _rawflags = mask | (if (isCompilerUniverse) IS_COMPILER_UNIVERSE else 0)
  this
}
or some such
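One detail the sketch glosses over: the repurposed bit must not collide with any real symbol flag. A minimal guard (hypothetical bit choice and mask name, not from the thread):

// Hypothetical: claim a bit outside the range used by real symbol flags.
// Bit 63 is for illustration only; the actual layout in
// scala.reflect.internal.Flags would need checking first.
final val IS_COMPILER_UNIVERSE: Long = 1L << 63

// Sanity check, assuming some AllSymbolFlags mask covering every real flag:
// assert((AllSymbolFlags & IS_COMPILER_UNIVERSE) == 0L)

The attraction of this variant is that the universe check rides along on a flags read the hot path already performs, instead of a separate virtual call.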
This looks to be a hot but simple method.
Investigate whether there is much improvement in CPU usage from making this a constant; making it a static constant would show the effect, although converting that to work in all cases would be harder.
Investigate first, and if it is worthwhile, then optimise.