twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0
8.8k stars 1.46k forks source link

TLS performance degradation when using Java 11 #856

Open andrievsky opened 4 years ago

andrievsky commented 4 years ago

Significant TLS performance degradation after switching to AdoptOpenJDK 11 or above. Checked on Debian and Mac. Reproducible both locally and using k8s cluster with official AdoptOpenJDK images.

Platform: Debian, Mac

Sbt: 1.3.12

Scala: 2.13.2, 2.12.8, 2.11.12

Finagle: 20.5.0

Expected behavior

Similar CPU utilization in Java 8 and Java 11

Actual behavior

CPU utilization was about 3 times higher which also increase request processing time. A profiler shows the issue directly related to TLS handling.

Steps to reproduce the behavior

A service code processing TLS conection

import java.io.File

import com.twitter.finagle.ssl.KeyCredentials
import com.twitter.finagle.ssl.server.SslServerConfiguration
import com.twitter.finagle.{Http, Service, http}
import com.twitter.util.{Await, Future}

object Main extends App {
  val keyCredentials =
    KeyCredentials.CertKeyAndChain(
      new File("src/main/resources/localhost.crt"),
      new File("src/main/resources/localhost.key"),
      new File("src/main/resources/RootCA.pem"))

  val sslConfiguration = SslServerConfiguration(keyCredentials)

  val service = new Service[http.Request, http.Response] {
    def apply(req: http.Request): Future[http.Response] = {
      Future.value(http.Response(req.version, http.Status.Ok))
    }
  }

  val server = Http.server.withTransport.tls(sslConfiguration).serve(":8443", service)
  Await.ready(server)
}

Please checkout complete project from https://github.com/andrievsky/finagle-tls

Java 8

sdk use java 8.0.222.hs-adpt

sbt run

ab -n 10000 -c 10 https://localhost:8443/

Percentage of the requests served within a certain time (ms) 50% 43 66% 44 75% 45 80% 46 90% 49 95% 52 98% 56 99% 60 100% 81 (longest request)

Screenshot 2020-06-09 at 17 14 07

Java 11

sdk use java 11.0.7.hs-adpt

sbt run

ab -n 10000 -c 10 https://localhost:8443/

Percentage of the requests served within a certain time (ms) 50% 194 66% 200 75% 203 80% 205 90% 211 95% 216 98% 222 99% 227 100% 539 (longest request)

Screenshot 2020-06-09 at 17 19 30
hamdiallam commented 4 years ago

Can you paste outputs from something like async profiler? (https://github.com/jvm-profiling-tools/async-profiler) This seems to be more of a problem with AdoptOpenJDK perhaps? Have you tried to see if this regression is present using OpenJDK 11?

andrievsky commented 4 years ago

Yes, basically the results are the same. This regression is present in OpenJDK 11.

I tested both Oracle and Adopt images using Docker on Mac. I also got similar results on native Linux and Mac Java runtime earlier.

Native OpenSSL has been used in all cases (using static lib io.netty.tcnative.boringssl).

Async-profiler data collected from $ ./profiler.sh -d 30 {PID} under the same conditions ab -n 10000 -c 10 https://localhost:{PORT}/

image: amd64/openjdk:8-jre

java version: openjdk version "1.8.0_252" OpenJDK Runtime Environment (build 1.8.0_252-b09) OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       30   76  23.7     72     392
Processing:     1   11   9.7      9     257
Waiting:        1    9   7.5      7     257
Total:         34   87  24.0     81     547

Percentage of the requests served within a certain time (ms)
  50%     81
  66%     91
  75%    104
  80%    108
  90%    115
  95%    122
  98%    146
  99%    167
 100%    547 (longest request)

Profiler details

         ns  percent  samples  top
  ----------  -------  -------  ---
 29012228755   34.60%     2851  /tmp/libnetty_tcnative_linux_x86_641905628462921293948.so (deleted)
 28151494910   33.57%     2751  /lib/x86_64-linux-gnu/libc-2.28.so
  2420916027    2.89%      234  arch_local_irq_enable_[k]
  2154427974    2.57%      210  arch_local_irq_enable_[k]
  1793350048    2.14%      176  malloc
  1373280065    1.64%      136  PhaseIdealLoop::get_late_ctrl(Node*, Node*)
  1369294817    1.63%      133  arch_local_irq_restore_[k]
  1102223491    1.31%      107  io.netty.buffer.UnpooledUnsafeDirectByteBuf._setInt
   908324161    1.08%       88  io.netty.handler.codec.base64.Base64.encode3to4

image: amd64/openjdk:11-jre

java version: openjdk version "11.0.7" 2020-04-14 OpenJDK Runtime Environment 18.9 (build 11.0.7+10) OpenJDK 64-Bit Server VM 18.9 (build 11.0.7+10, mixed mode)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      138  214  31.6    209     494
Processing:     1    3   2.8      2     121
Waiting:        1    2   2.6      2     116
Total:        140  217  31.9    212     591

Percentage of the requests served within a certain time (ms)
  50%    212
  66%    226
  75%    235
  80%    241
  90%    258
  95%    274
  98%    293
  99%    309
 100%    591 (longest request)

Profiler details

          ns  percent  samples  top
  ----------  -------  -------  ---
115232696742   53.33%    11375  sun.security.provider.SHA.implCompress0
 11677518573    5.40%     1154  /tmp/libnetty_tcnative_linux_x86_6417515162460062164916.so (deleted)
 10996312923    5.09%     1086  /lib/x86_64-linux-gnu/libc-2.28.so
  9326720586    4.32%      920  sun.security.provider.DigestBase.implCompressMultiBlockCheck
  6591359180    3.05%      645  arch_local_irq_enable_[k]
  6136777653    2.84%      606  sun.security.provider.ByteArrayAccess.b2iBig64
  5609239885    2.60%      544  arch_local_irq_enable_[k]
  4993558866    2.31%      493  java.util.Arrays.fill
  4820643720    2.23%      476  sun.security.provider.SHA.implCompressCheck
  4556372581    2.11%      450  java.util.Arrays.fill
  4552859539    2.11%      450  sun.security.provider.DigestBase.engineDigest
  4366307872    2.02%      431  jbyte_disjoint_arraycopy
  3323287166    1.54%      328  sun.security.provider.ByteArrayAccess.i2bBig
  2928047179    1.36%      289  sun.security.provider.DigestBase.engineReset
  2188849033    1.01%      216  sun.security.provider.DigestBase.engineUpdate

image: adoptopenjdk/openjdk8:debianslim-jre

java version: openjdk version "1.8.0_252" OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09) OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.252-b09, mixed mode)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       28   56  18.0     56     467
Processing:     1    9   8.8      6     419
Waiting:        1    8   7.6      5     419
Total:         31   66  17.7     62     510

Percentage of the requests served within a certain time (ms)
  50%     62
  66%     65
  75%     70
  80%     73
  90%     79
  95%     85
  98%     98
  99%    117
 100%    510 (longest request)

Profiler details

       ns  percent  samples  top
  ----------  -------  -------  ---
 24314627309   36.64%     2372  /tmp/libnetty_tcnative_linux_x86_648811857213780766878.so (deleted)
 16045331862   24.18%     1564  /lib/x86_64-linux-gnu/libc-2.28.so
  2009473716    3.03%      195  arch_local_irq_enable_[k]
  1631000669    2.46%      159  arch_local_irq_enable_[k]
  1518880708    2.29%      148  malloc
  1324645013    2.00%      132  PhaseIdealLoop::is_dominator(Node*, Node*)
  1197069288    1.80%      117  arch_local_irq_restore_[k]
   849604973    1.28%       81  arch_local_irq_restore_[k]
   799251756    1.20%       78  io.netty.buffer.UnpooledUnsafeDirectByteBuf._setInt

image: adoptopenjdk/openjdk11:debianslim-jre

java version: openjdk version "11.0.7" 2020-04-14 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10) OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.7+10, mixed mode)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      127  202  30.1    198     527
Processing:     1    3   3.7      2     170
Waiting:        1    2   3.5      1     170
Total:        129  205  30.6    201     645

Percentage of the requests served within a certain time (ms)
  50%    201
  66%    213
  75%    223
  80%    228
  90%    245
  95%    258
  98%    277
  99%    293
 100%    645 (longest request)

Profiler details

         ns  percent  samples  top
  ----------  -------  -------  ---
119381838545   55.59%    11781  sun.security.provider.SHA.implCompress0
 10012942258    4.66%      990  /tmp/libnetty_tcnative_linux_x86_649309432395612600563.so (deleted)
  9644289556    4.49%      951  sun.security.provider.DigestBase.implCompressMultiBlockCheck
  7166098104    3.34%      701  arch_local_irq_enable_[k]
  6967147720    3.24%      687  sun.security.provider.ByteArrayAccess.b2iBig64
  5554444080    2.59%      548  sun.security.provider.SHA.implCompressCheck
  5502540386    2.56%      544  /lib/x86_64-linux-gnu/libc-2.28.so
  5185543938    2.41%      504  arch_local_irq_enable_[k]
  4983562442    2.32%      492  java.util.Arrays.fill
  4857273595    2.26%      479  java.util.Arrays.fill
  4649360905    2.17%      459  sun.security.provider.DigestBase.engineDigest
  4407231792    2.05%      435  jbyte_disjoint_arraycopy
  3434012474    1.60%      339  sun.security.provider.ByteArrayAccess.i2bBig
  3241524580    1.51%      320  sun.security.provider.DigestBase.engineReset
andrievsky commented 4 years ago

@hamdiallam Hello, would you like to be so kind to assist with the issue?

hamdiallam commented 4 years ago

Hey! Sorry for not getting around to this.

Just synced with some more folks internally. Our current builds and runtime environments are targeted towards JDK 8. JDK 11 is currently experimental and we are working towards moving towards JDK11.

I'll spend some time to see if I can reproduce a regression in our integration environment using a JDK 11 build. If so, this is something that will have to be addressed as we move over.

hamdiallam commented 4 years ago

Also thanks for the time it took to produce this data. It's super helpful

spockz commented 4 years ago

@andrievsky what happens if you use the SslContextServerEngineFactory instead? I noticed that for clients using the client variant improves the performance significantly, see #874.

rpless commented 3 years ago

Reading through #874, the suggestion seems to be that caching the SslContext might mitigate this issue. A similiar effect can be achieved with the ReloadingNetty4ServerEngineFactory which constructs the SslContext on an interval. Unlike the SslContextServerEngineFactory the Reloading engine factory can deal with most forms of KeyCredentials, so you don't have to go through the pain of manually creating an SslContext. If you expose the ReloadingNetty4ServerEngineFactory (note its private to finagle as far as I can tell), and use it in the benchmark project it acts as a workaround for the issue presented with the JdkServerEngineFactory.

When I run it locally I see the following results: https://gist.github.com/rpless/df3245f9b778335fc66ae2a5ea99c280. The ReloadingNetty4ServerEngineFactory on Java 11 yields similar results to the default JdkServerEngineFactory on Java 8. I think there is still an issue with the default SslServerEngineFactory on Java 11, however using SslContextServerEngineFactory or ReloadingNetty4ServerEngineFactory might be a good workaround.

Assuming your SslContext does not change throughout the lifecycle of the process, there are some tradeoffs. The SslContextServerEngineFactory is probably better performance-wise because you do not have to reconstruct the context on an interval but requires you to correctly construct an SslContext. The ReloadingNetty4ServerEngineFactory probably wastes some effort because it reconstructs it on an interval, but handles creation for all the version of KeyCredentials.