Closed mdedetrich closed 6 years ago
So I tried adding the "-opt:l:method" flag, and now I get this, which is even more strange:
[info] FirstUnspecialized.basicConstructorLinkedMap thrpt 300 1767509.609 ± 40027.887 ops/s
[info] FirstUnspecialized.basicConstructorMap thrpt 300 2394396.649 ± 36000.287 ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap thrpt 300 61018661.386 ± 1775024.316 ops/s
[info] FirstUnspecialized.basicLookupKeyMap thrpt 300 161630053.575 ± 299878.430 ops/s
[info] Unary.basicConstructorLinkedMap thrpt 300 14763856.411 ± 104662.254 ops/s
[info] Unary.basicConstructorMap thrpt 300 16488769.580 ± 58500.808 ops/s
[info] Unary.basicLookupKeyLinkedMap thrpt 300 260654161.170 ± 1331239.417 ops/s
[info] Unary.basicLookupKeyMap thrpt 300 260534008.732 ± 1048973.514 ops/s
It's probably the SAM encoding, which generates deeper chains of forwarders that sometimes thwart useful inlining. You can investigate e.g. the effect of -XX:MaxInlineLevel=15 (default is 9).
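To make "deeper chains of forwarders" concrete, here is a toy sketch. The layer methods are hand-written stand-ins for compiler-generated forwarders (the names are hypothetical, and this is not what scalac actually emits). It only counts the extra stack frames the chain adds; each of those is one more nested call the JIT has to inline before it reaches the real work, and -XX:MaxInlineLevel caps that nesting.

```scala
object ForwarderDepth {
  // Innermost call: reports how many frames are on the stack right now.
  def stackDepth(): Int = new Throwable().getStackTrace.length

  // Hand-written stand-ins for forwarders (hypothetical names).
  // Each one does nothing except delegate to the next layer.
  def layer1(): Int = layer2()
  def layer2(): Int = layer3()
  def layer3(): Int = stackDepth()

  // The same work reached through a single call.
  def direct(): Int = stackDepth()

  // Number of extra frames the forwarder chain adds over the direct call.
  def extraFrames(): Int = layer1() - direct()
}
```

Calling ForwarderDepth.extraFrames() compares the two paths from the same call site, so the difference is just the number of additional forwarding layers.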
The new optimizer is...well...new, but it can do inlining which can relieve the pressure on the JIT to do it (among other things). It doesn't always work; we had to do some manual inlining to get several things up to speed again when 2.12 came out.
@lrytz, @SethTisue said that you might be helpful here?
Okay, so I made some JFR recordings of the benchmarks for both Scala 2.12 and 2.11 (jfr files attached) and I noticed some interesting things, primarily that in the Scala 2.12 version a huge amount of time is being spent in scala.Option (in the 2.11 version, scala.Option isn't even listed).
Here is a screenshot of Scala 2.12
And here is one of Scala 2.11
I think some weird inlining behavior is being exhibited?
Possibly, but when profiling returns results like that all I feel comfortable concluding is that there is no actionable information.
@Ichoran Do you have any recommendations as to how I should proceed from here, or is it too much effort for the gain?
Did you try benchmarks with -J-XX:MaxInlineLevel=15 or somesuch? If you don't do any better with that, I'd give up for now and just wait for the JVM to improve.
@Ichoran Bingo, ran with benchmarkJVM/jmh:run --jvmArgs "-XX:MaxInlineLevel=15" -i 20 -wi 20 -f15 -t1
[info] FirstUnspecialized.basicConstructorLinkedMap thrpt 300 2222651.735 ± 13268.196 ops/s
[info] FirstUnspecialized.basicConstructorMap thrpt 300 4048622.933 ± 18736.669 ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap thrpt 300 125941975.374 ± 2134411.297 ops/s
[info] FirstUnspecialized.basicLookupKeyMap thrpt 300 169414304.943 ± 747093.203 ops/s
[info] Unary.basicConstructorLinkedMap thrpt 300 17999195.633 ± 109980.368 ops/s
[info] Unary.basicConstructorMap thrpt 300 18190127.391 ± 73954.009 ops/s
[info] Unary.basicLookupKeyLinkedMap thrpt 300 276513521.945 ± 1391917.392 ops/s
[info] Unary.basicLookupKeyMap thrpt 300 298514743.875 ± 1391750.947 ops/s
So it does seem that HotSpot is missing inlining opportunities because the default inlining level is too low.
How would you proceed from here?
Override the methods that should be but aren't inlining other methods. It can be hard to find the key spots, and it's a lot of code duplication, typically.
Also, the immutable collections don't give you much opportunity to override things, so you may have to copy their implementations.
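As a rough illustration of that advice, here is a minimal sketch. KeyLookup and ArrayLookup are hypothetical names, not taken from linked-map or the standard library. The trait's default apply and contains go through get and an Option; the overrides duplicate the lookup logic by hand so the hot path is shallower and allocates nothing. Whether this actually helps in a given case still has to be measured with JMH.

```scala
object ManualInlining {
  // Hypothetical lookup interface with convenient but indirect defaults.
  trait KeyLookup[V] {
    def get(key: String): Option[V]

    // Default path: get -> Option -> getOrElse (with a by-name thunk).
    def apply(key: String): V =
      get(key).getOrElse(throw new NoSuchElementException(key))

    def contains(key: String): Boolean = get(key).isDefined
  }

  // Hypothetical implementation backed by two parallel arrays.
  final class ArrayLookup[V](keys: Array[String], values: Array[V])
      extends KeyLookup[V] {

    // Linear scan; returns -1 when the key is absent.
    private def indexOf(key: String): Int = {
      var i = 0
      while (i < keys.length && keys(i) != key) i += 1
      if (i < keys.length) i else -1
    }

    def get(key: String): Option[V] = {
      val i = indexOf(key)
      if (i >= 0) Some(values(i)) else None
    }

    // Hand-inlined overrides: same results as the defaults,
    // but no Option allocation and fewer nested calls.
    override def apply(key: String): V = {
      val i = indexOf(key)
      if (i >= 0) values(i) else throw new NoSuchElementException(key)
    }

    override def contains(key: String): Boolean = indexOf(key) >= 0
  }
}
```

The cost is exactly the duplication mentioned above: the lookup logic now lives in three places and has to be kept in sync by hand.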
@Ichoran @smarter Thanks a lot for the help, ended up solving the issue. Now I am getting this on Scala 2.12.4
[info] FirstUnspecialized.basicConstructorLinkedMap thrpt 25 2201490.907 ± 57276.433 ops/s
[info] FirstUnspecialized.basicConstructorMap thrpt 25 2710996.320 ± 40121.945 ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap thrpt 25 120325439.312 ± 2314846.724 ops/s
[info] FirstUnspecialized.basicLookupKeyMap thrpt 25 162325858.836 ± 2819704.896 ops/s
[info] Unary.basicConstructorLinkedMap thrpt 25 18260845.680 ± 303096.664 ops/s
[info] Unary.basicConstructorMap thrpt 25 18463008.858 ± 377776.211 ops/s
[info] Unary.basicLookupKeyLinkedMap thrpt 25 259688108.885 ± 2289273.528 ops/s
[info] Unary.basicLookupKeyMap thrpt 25 285729912.762 ± 5781662.586 ops/s
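For reference, the LinkedMap-vs-Map lookup gap in the three result sets quoted in this thread can be worked out as below. This assumes "penalty" means the fraction of Map's throughput that LinkedMap loses (1 - linked / map), which is my reading and is not spelled out anywhere in the thread. The scores are the FirstUnspecialized.basicLookupKey* lines copied from the results above.

```scala
object LookupPenalty {
  // Fraction of Map's throughput lost by LinkedMap (assumed definition).
  def penalty(linkedOps: Double, mapOps: Double): Double =
    1.0 - linkedOps / mapOps

  // Run with -opt:l:method and the default inline level (ops/s).
  val withOptFlag: Double = penalty(61018661.386, 161630053.575)

  // Run with -XX:MaxInlineLevel=15 (ops/s).
  val withInlineLevel15: Double = penalty(125941975.374, 169414304.943)

  // Final run on Scala 2.12.4 after the fix (ops/s).
  val afterFix: Double = penalty(120325439.312, 162325858.836)
}
```

Under that definition the gap is roughly 62% with only the optimizer flag, and roughly 26% both with MaxInlineLevel=15 and after the fix.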
For some reason, Scala 2.12.x has a regression for the unspecialized basic lookup key. Results using benchmarkJVM/jmh:run -i 20 -wi 20 -f15 -t1:
Scala 2.12.4
Scala 2.11.12
As you can see with FirstUnspecialized.basicLookupKeyLinkedMap vs FirstUnspecialized.basicLookupKeyMap, on Scala 2.12.4 the performance penalty is ~58%, whereas for Scala 2.11.12 it's ~37%. Also of note is how the throughput numbers in general vary for Scala 2.12.4 in the other cases (although the ratio of performance relative to Map stays the same).
@Ichoran do you have any ideas? Is there something undocumented in Scala 2.12.x that I am missing? You can see the scalac compiler flags used for the build here: https://github.com/mdedetrich/linked-map/blob/master/build.sbt#L20-L41