mdedetrich / linked-map

Provides an implementation for an immutable map that maintains key ordering
BSD 3-Clause "New" or "Revised" License
3 stars 0 forks source link

Investigate performance regression for Scala 2.12 #4

Closed mdedetrich closed 6 years ago

mdedetrich commented 6 years ago

For some reason, Scala 2.12.x has a regression for the unspecialized basic lookup key, results using benchmarkJVM/jmh:run -i 20 -wi 20 -f15 -t1

Scala 2.12.4

[info] FirstUnspecialized.basicConstructorLinkedMap  thrpt  300    1847741.548 ±   22738.195  ops/s
[info] FirstUnspecialized.basicConstructorMap        thrpt  300    2746783.070 ±   10222.044  ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap    thrpt  300   72782143.922 ±  252646.801  ops/s
[info] FirstUnspecialized.basicLookupKeyMap          thrpt  300  170383766.817 ±  645683.093  ops/s
[info] Unary.basicConstructorLinkedMap               thrpt  300   16157079.296 ±   58145.640  ops/s
[info] Unary.basicConstructorMap                     thrpt  300   19333048.223 ±  102938.729  ops/s
[info] Unary.basicLookupKeyLinkedMap                 thrpt  300  291555363.306 ± 6239963.565  ops/s
[info] Unary.basicLookupKeyMap                       thrpt  300  300547516.075 ± 1251440.680  ops/s

Scala 2.11.12

[info] FirstUnspecialized.basicConstructorLinkedMap  thrpt  300    1497719.403 ±    5870.631  ops/s
[info] FirstUnspecialized.basicConstructorMap        thrpt  300    2106213.060 ±    9734.459  ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap    thrpt  300  111312560.941 ±  409822.807  ops/s
[info] FirstUnspecialized.basicLookupKeyMap          thrpt  300  174248150.664 ± 1465742.214  ops/s
[info] Unary.basicConstructorLinkedMap               thrpt  300   19264503.071 ±   63510.488  ops/s
[info] Unary.basicConstructorMap                     thrpt  300   18437190.148 ±  114629.537  ops/s
[info] Unary.basicLookupKeyLinkedMap                 thrpt  300  282776216.380 ± 2080159.502  ops/s
[info] Unary.basicLookupKeyMap                       thrpt  300  283141122.744 ± 1567286.206  ops/s

As you an see with FirstUnspecialized.basicLookupKeyLinkedMap vs FirstUnspecialized.basicLookupKeyMap, on Scala 2.12.4 the performance penalty is ~58% where as for Scala 2.11.12 its ~37%.

Also of note is how the throughput numbers in general vary for Scala 2.12.4 for other cases (although the ratio of performance relative to Map stays the same).

@Ichoran do you have any ideas, is there something undocumented with Scala 2.12.x that I am missing? You can see the scalac compiler flags used for the build here https://github.com/mdedetrich/linked-map/blob/master/build.sbt#L20-L41

mdedetrich commented 6 years ago

So I tried adding the "-opt:l:method" flag, and now I get this, which is even more strange

[info] FirstUnspecialized.basicConstructorLinkedMap  thrpt  300    1767509.609 ±   40027.887  ops/s
[info] FirstUnspecialized.basicConstructorMap        thrpt  300    2394396.649 ±   36000.287  ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap    thrpt  300   61018661.386 ± 1775024.316  ops/s
[info] FirstUnspecialized.basicLookupKeyMap          thrpt  300  161630053.575 ±  299878.430  ops/s
[info] Unary.basicConstructorLinkedMap               thrpt  300   14763856.411 ±  104662.254  ops/s
[info] Unary.basicConstructorMap                     thrpt  300   16488769.580 ±   58500.808  ops/s
[info] Unary.basicLookupKeyLinkedMap                 thrpt  300  260654161.170 ± 1331239.417  ops/s
[info] Unary.basicLookupKeyMap                       thrpt  300  260534008.732 ± 1048973.514  ops/s
Ichoran commented 6 years ago

it's probably the SAM encoding which generates deeper chains of forwarders that sometimes thwarts useful inlining. You can investigate e.g. the effect of -XX:MaxInlineLevel=15 (default is 9).

The new optimizer is...well...new, but it can do inlining which can relieve the pressure on the JIT to do it (among other things). It doesn't always work; we had to do some manual inlining to get several things up to speed again when 2.12 came out.

mdedetrich commented 6 years ago

@lrytz, @SethTisue said that you might be helpful here?

mdedetrich commented 6 years ago

Okay, so I made some jfr benchmarks for both Scala 2.12 and 2.11 (jfr files attached) and I noticed some interesting things, primarily that in the Scala 2.12 version, a huge amount of time is being spent in Scala.Option (in the 2.11 version, Scala.Option isnt even listed)

Here is a screenshot of Scala 2.12

screen shot 2017-11-16 at 12 13 19 am

And here is one of Scala 2.11

screen shot 2017-11-16 at 12 13 21 am

I think some weird inlining behavior is being exhibited?

Scala212.zip Scala211.zip

Ichoran commented 6 years ago

Possibly, but when profiling returns results like that all I feel comfortable concluding is that there is no actionable information.

mdedetrich commented 6 years ago

@Ichoran Do you have any recommendations as to how I should proceed from here, or is it too much effort for the gain?

Ichoran commented 6 years ago

Did you try benchmarks with -J-XX:MaxInlineDepth=15 or somesuch? If you don't do any better with that, I'd give up for now and just wait for the JVM to improve.

mdedetrich commented 6 years ago

@Ichoran Bingo, ran with benchmarkJVM/jmh:run --jvmArgs "-XX:MaxInlineLevel=15" -i 20 -wi 20 -f15 -t1

[info] FirstUnspecialized.basicConstructorLinkedMap  thrpt  300    2222651.735 ±   13268.196  ops/s
[info] FirstUnspecialized.basicConstructorMap        thrpt  300    4048622.933 ±   18736.669  ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap    thrpt  300  125941975.374 ± 2134411.297  ops/s
[info] FirstUnspecialized.basicLookupKeyMap          thrpt  300  169414304.943 ±  747093.203  ops/s
[info] Unary.basicConstructorLinkedMap               thrpt  300   17999195.633 ±  109980.368  ops/s
[info] Unary.basicConstructorMap                     thrpt  300   18190127.391 ±   73954.009  ops/s
[info] Unary.basicLookupKeyLinkedMap                 thrpt  300  276513521.945 ± 1391917.392  ops/s
[info] Unary.basicLookupKeyMap                       thrpt  300  298514743.875 ± 1391750.947  ops/s

So it does seem the hotspot is missing stuff because the default inlining level is too low

How would you proceed from here?

Ichoran commented 6 years ago

Override the methods that should be but aren't inlining other methods. It can be hard to find the key spots, and it's a lot of code duplication, typically.

Ichoran commented 6 years ago

Also, the immutable collections don't give you much opportunity to override things, so you may have to copy their implementations.

mdedetrich commented 6 years ago

@Ichoran @smarter Thanks a lot for the help, ended up solving the issue. Now I am getting this on Scala 2.12.4

[info] FirstUnspecialized.basicConstructorLinkedMap  thrpt   25    2201490.907 ±   57276.433  ops/s
[info] FirstUnspecialized.basicConstructorMap        thrpt   25    2710996.320 ±   40121.945  ops/s
[info] FirstUnspecialized.basicLookupKeyLinkedMap    thrpt   25  120325439.312 ± 2314846.724  ops/s
[info] FirstUnspecialized.basicLookupKeyMap          thrpt   25  162325858.836 ± 2819704.896  ops/s
[info] Unary.basicConstructorLinkedMap               thrpt   25   18260845.680 ±  303096.664  ops/s
[info] Unary.basicConstructorMap                     thrpt   25   18463008.858 ±  377776.211  ops/s
[info] Unary.basicLookupKeyLinkedMap                 thrpt   25  259688108.885 ± 2289273.528  ops/s
[info] Unary.basicLookupKeyMap                       thrpt   25  285729912.762 ± 5781662.586  ops/s