twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0

JVM: SIGSEGV crash after upgrading Finagle from 19.9.0 to 20.4.0 #875

Closed by ayushworks 3 years ago

ayushworks commented 4 years ago

After upgrading finagle-core from 19.9.0 to 20.4.0, we are seeing the JVM crash with segmentation faults.

Expected behavior

The JVM should not crash with a segmentation fault.

Actual behavior

Our application is quite I/O intensive. We receive a lot of HTTP calls (20 TPS) and also make a lot of outbound HTTP calls (30 TPS on a busy day). The application is a Scala/Java Jersey WAR deployed on a Tomcat server.

We use Finagle to make many of these HTTP calls, both secured and unsecured (HTTPS and HTTP). This worked fine before, but since we moved to finagle-core 20.4.0 the JVM has been crashing without any log output, and an hs_err_pid.log file is generated:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb2f204b222, pid=5258, tid=0x00007fb156125700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_241-b26) (build 1.8.0_241-b26)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.241-b26 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x6d2222]  jni_CallIntMethod+0xf2
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

The stack trace points to the Netty native TLS library (libnetty_tcnative) used by Finagle's Netty transport:

Stack: [0x00007fb155f25000,0x00007fb156126000],  sp=0x00007fb156123980,  free space=2042k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x6d2222]  jni_CallIntMethod+0xf2
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x24def]
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x3d4fd]
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x53cd9]
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x55b18]
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x54396]
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x30839]
C  [libnetty_tcnative_linux_x86_642480814795102320417.so+0x3172c]
J 26700  io.netty.internal.tcnative.SSL.readFromSSL(JJI)I (0 bytes) @ 0x00007fb2df1c3405 [0x00007fb2df1c3340+0xc5]
J 36642 C2 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (1344 bytes) @ 0x00007fb2e2af0a70 [0x00007fb2e2af0420+0x650]
J 35944 C2 io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(Lio/netty/handler/ssl/SslHandler;Lio/netty/buffer/ByteBuf;IILio/netty/buffer/ByteBuf;)Ljavax/net/ssl/SSLEngineResult; (134 bytes) @ 0x00007fb2e2917f68 [0x00007fb2e2917580+0x9e8]
J 36644 C2 io.netty.handler.ssl.SslHandler.unwrap(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;II)I (609 bytes) @ 0x00007fb2e2afddd0 [0x00007fb2e2afdbe0+0x1f0]
J 37067 C2 io.netty.handler.ssl.SslHandler.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (23 bytes) @ 0x00007fb2e16bdfdc [0x00007fb2e16bdda0+0x23c]
J 28401 C2 io.netty.handler.codec.ByteToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (317 bytes) @ 0x00007fb2e1872d14 [0x00007fb2e1872ae0+0x234]
J 28919 C2 io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (9 bytes) @ 0x00007fb2e19863e0 [0x00007fb2e1986280+0x160]
J 18275 C2 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V (53 bytes) @ 0x00007fb2df726ec0 [0x00007fb2df726dc0+0x100]
J 9784 C1 io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady()V (310 bytes) @ 0x00007fb2de973614 [0x00007fb2de971a80+0x1b94]
J 33265 C1 io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run()V (19 bytes) @ 0x00007fb2dfea898c [0x00007fb2dfea8880+0x10c]
J 32302 C2 io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (96 bytes) @ 0x00007fb2e220a28c [0x00007fb2e220a160+0x12c]
J 33553% C2 io.netty.channel.epoll.EpollEventLoop.run()V (278 bytes) @ 0x00007fb2e24189a8 [0x00007fb2e24185c0+0x3e8]
j  io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V+44
J 12512 C1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007fb2dee8aa14 [0x00007fb2dee89a00+0x1014]
J 12511 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 0x00007fb2dee867c4 [0x00007fb2dee866c0+0x104]
j  com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run()V+10
j  io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
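
For anyone triaging a similar hs_err file, the native frames above can be summarized mechanically. The following is a minimal sketch, not part of Finagle or Netty: the class name `HsErrFrames` and its method are hypothetical, and it assumes frames follow the `C  [library+0xoffset]` / `V  [library+0xoffset]` layout shown in this trace. It lists the distinct libraries that appear in native and VM frames.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the libraries named in C and V frames of an hs_err log. */
public class HsErrFrames {
    // Matches lines such as:
    //   C  [libfoo.so+0x24def]
    //   V  [libjvm.so+0x6d2222]  jni_CallIntMethod+0xf2
    // Group 2 captures the library name (everything before "+0x...").
    private static final Pattern FRAME =
        Pattern.compile("^([CV])\\s+\\[([^\\]+]+)\\+0x[0-9a-fA-F]+\\]");

    /** Returns distinct library names from C and V frames, in first-seen order. */
    public static List<String> nativeLibraries(List<String> lines) {
        Set<String> libs = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = FRAME.matcher(line);
            if (m.find()) {
                libs.add(m.group(2));
            }
        }
        return new ArrayList<>(libs);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java HsErrFrames <path-to-hs_err-file>");
            return;
        }
        // ISO-8859-1 decodes any byte sequence, so a stray non-UTF-8 byte cannot abort the read.
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String lib : nativeLibraries(lines)) {
            System.out.println(lib);
        }
    }
}
```

Run against this trace, the only libraries it would report are libjvm.so and the extracted libnetty_tcnative library, which is consistent with the crash happening inside the native TLS read path rather than in application code.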

yufangong commented 4 years ago

Hi @ayushworks, the 20.4.0 release had a JDK-related error, which we patched in 20.4.1: https://finagle.github.io/blog/2020/04/26/release-notes/. Can you try upgrading to 20.4.1 or to the latest release, 20.10.0, to see whether that mitigates the issue? We will start thinking about how to mark the problematic releases. Thanks!
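
As a rough illustration of the suggested upgrade, a Maven dependency on the patched release might look like the fragment below. This is a sketch under assumptions: the thread does not say which build tool or Scala binary version the application uses, so the `_2.12` suffix on the artifact ID is a guess and should be adjusted to match the project.

```xml
<!-- Assumption: Maven build, Scala 2.12 binary version. Adjust the suffix to your Scala version. -->
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>finagle-core_2.12</artifactId>
  <version>20.4.1</version>
</dependency>
```

Any other Finagle modules on the classpath would normally be moved to the same version at the same time.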

ayushworks commented 3 years ago

Hey @yufangong, has release 20.4.0 been marked as problematic yet?

yufangong commented 3 years ago

Hi @ayushworks, unfortunately content published to Maven Central cannot be modified. However, we have marked the problematic releases on our side in the CHANGELOG and release notes. Thank you!