grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

JVM crashes with grpc-netty-shaded 1.67.1 and alpine docker image #11660

Open jehervy opened 1 day ago

jehervy commented 1 day ago

Updating a com.google.cloud dependency (which depends on grpc-netty-shaded:1.67.1) in an image built from eclipse-temurin:21-jre-alpine led to a segmentation fault:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000000000204b6, pid=1, tid=17
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.5+11 (21.0.5+11) (build 21.0.5+11-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libio_grpc_netty_shaded_netty_tcnative_linux_x86_6417406470977119301460.so+0x2a154]  netty_internal_tcnative_SSLContext_JNI_OnLoad+0x9c4
#
# Core dump will be written. Default location: /core.%e.1.%t
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log

As a workaround, we had to downgrade to grpc-netty-shaded:1.65.1.

Full error log: hs_err_pid1.log

reinaldomjr commented 10 hours ago

This is also happening on eclipse-temurin:17:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00000000000204b6, pid=1, tid=7
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.13+11 (17.0.13+11) (build 17.0.13+11)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.13+11 (17.0.13+11, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# C [libio_grpc_netty_shaded_netty_tcnative_linux_x86_643039151283623279840.so+0x2a154] netty_internal_tcnative_SSLContext_JNI_OnLoad+0x9c4
#
# Core dump will be written. Default location: /core.%e.1.%t
#
# An error report file with more information is saved as:
# /app/hs_err_pid1.log

ejona86 commented 6 hours ago

eclipse-temurin:17?? That isn't Alpine. Alpine issues are common, but Ubuntu-based Temurin is a very normal setup. I do see that the crash address is the same, so the two are likely related.
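The "same address" observation can be checked mechanically by normalizing the Problematic frame lines from the two reports (both also show the same pc=0x00000000000204b6). A minimal sketch, assuming the run of digits between x86_64 and .so in the library name is a random suffix added when the native library is extracted to a temporary file, since that is the only part that differs between the two reports; FrameCompare and normalize are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: normalizes an hs_err "Problematic frame" line. */
final class FrameCompare {
  // Library name up to "x86_64", an optional run of digits (the assumed
  // temp-file suffix), then ".so+<offset>]" followed by symbol+offset.
  private static final Pattern FRAME = Pattern.compile(
      "C\\s+\\[(lib\\S*?x86_64)\\d*\\.so\\+(0x[0-9a-f]+)\\]\\s+(\\S+)");

  /** Returns "lib...x86_64.so+offset symbol+offset", or null if no match. */
  static String normalize(String frameLine) {
    Matcher m = FRAME.matcher(frameLine);
    if (!m.find()) {
      return null;
    }
    return m.group(1) + ".so+" + m.group(2) + " " + m.group(3);
  }

  public static void main(String[] args) {
    // Problematic frame lines copied from the two crash reports above.
    String temurin21Alpine = "# C  [libio_grpc_netty_shaded_netty_tcnative_linux_x86_6417406470977119301460.so+0x2a154]  netty_internal_tcnative_SSLContext_JNI_OnLoad+0x9c4";
    String temurin17 = "# C [libio_grpc_netty_shaded_netty_tcnative_linux_x86_643039151283623279840.so+0x2a154] netty_internal_tcnative_SSLContext_JNI_OnLoad+0x9c4";
    String a = normalize(temurin21Alpine);
    String b = normalize(temurin17);
    System.out.println(a);
    System.out.println("same frame: " + a.equals(b)); // prints "same frame: true"
  }
}
```

Both lines reduce to the same library offset (+0x2a154) inside netty_internal_tcnative_SSLContext_JNI_OnLoad+0x9c4, which is consistent with the two crashes sharing a cause.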

We either need to reproduce it and try various versions of netty-tcnative to find the change, or we pull out objdump.
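The first option amounts to a bisection over netty-tcnative releases. A minimal sketch, assuming the candidate versions are sorted oldest to newest, the first entry is known not to crash, the last is known to crash, and the crash persists in every later version once introduced. The version labels and the crashes predicate are hypothetical stand-ins for actually running the application against each build.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: finds the first crashing version by bisection. */
final class VersionBisect {
  static <T> T firstBad(List<T> versions, Predicate<T> crashes) {
    if (versions.size() < 2) {
      throw new IllegalArgumentException("need a known-good and a known-bad version");
    }
    int good = 0;                  // index assumed not to crash
    int bad = versions.size() - 1; // index assumed to crash
    // Invariant: versions[good] is fine, versions[bad] crashes.
    while (bad - good > 1) {
      int mid = (good + bad) >>> 1;
      if (crashes.test(versions.get(mid))) {
        bad = mid;
      } else {
        good = mid;
      }
    }
    return versions.get(bad);
  }

  public static void main(String[] args) {
    // Hypothetical labels and outcome; replace with real test runs.
    List<String> versions = List.of("v1", "v2", "v3", "v4", "v5", "v6");
    String result = firstBad(versions, v -> v.compareTo("v4") >= 0);
    System.out.println("first crashing version: " + result); // v4
  }
}
```

Each predicate call is one manual reproduction, so bisection needs only about log2(n) runs instead of n.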


Investigation assuming the two errors are unrelated... (Spoiler: doesn't seem Alpine-specific.)

Following the typical Alpine flow, it indeed doesn't seem related to glibc compatibility.

$ podman run --rm -it docker.io/eclipse-temurin:21-jre-alpine /bin/sh
/ # wget https://search.maven.org/remotecontent?filepath=io/netty/netty-tcnative-boringssl-static/2.0.65.Final/netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar
Connecting to search.maven.org (34.196.217.203:443)
Connecting to repo1.maven.org (199.232.192.209:443)
saving to 'netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar'
netty-tcnative-borin 100% |******************************************************************************************************************| 1187k  0:00:00 ETA
'netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar' saved
/ # 
/ # unzip netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar 
Archive:  netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar
   creating: META-INF/
  inflating: META-INF/MANIFEST.MF
   creating: META-INF/license/
   creating: META-INF/maven/
   creating: META-INF/maven/io.netty/
   creating: META-INF/maven/io.netty/netty-tcnative-boringssl-static/
   creating: META-INF/native/
  inflating: META-INF/LICENSE.txt
  inflating: META-INF/NOTICE.txt
  inflating: META-INF/license/LICENSE.aix-netbsd.txt
  inflating: META-INF/license/LICENSE.boringssl.txt
  inflating: META-INF/license/LICENSE.mvn-wrapper.txt
  inflating: META-INF/license/LICENSE.tomcat-native.txt
  inflating: META-INF/maven/io.netty/netty-tcnative-boringssl-static/pom.properties
  inflating: META-INF/maven/io.netty/netty-tcnative-boringssl-static/pom.xml
  inflating: META-INF/native/libnetty_tcnative_linux_x86_64.so
  inflating: META-INF/INDEX.LIST
/ # apk add gcompat
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
(1/3) Installing musl-obstack (1.2.3-r2)
(2/3) Installing libucontext (1.2-r3)
(3/3) Installing gcompat (1.1.0-r4)
OK: 42 MiB in 73 packages
/ # LD_PRELOAD=/lib/libgcompat.so.0 ldd META-INF/native/libnetty_tcnative_linux_x86_64.so 
    /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)
    /lib/libgcompat.so.0 => /lib/libgcompat.so.0 (0x7fb971f0b000)
    librt.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)
    libpthread.so.0 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)
    libdl.so.2 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)
    libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)
    libucontext.so.1 => /lib/libucontext.so.1 (0x7fb971f06000)
    libobstack.so.1 => /usr/lib/libobstack.so.1 (0x7fb971f01000)
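When repeating this check across several netty-tcnative builds, it can help to summarize the ldd output instead of reading it by eye. A minimal sketch, assuming musl's ldd output format as it appears in this thread ("soname => path (address)" lines and "Error relocating ...: symbol not found" lines); LddReport is a hypothetical name. Applied to the gcompat run above, every dependency resolves and no symbols are reported missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: summarizes musl ldd output for a native library. */
final class LddReport {
  // e.g. "    libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)"
  private static final Pattern MAPPING =
      Pattern.compile("^\\s*(\\S+) => (\\S+) \\(0x[0-9a-f]+\\)\\s*$");
  // e.g. "Error relocating <lib>: __isnan: symbol not found"
  private static final Pattern MISSING =
      Pattern.compile("^Error relocating \\S+: (\\S+): symbol not found$");

  final Map<String, String> resolved = new LinkedHashMap<>(); // soname -> file
  final List<String> missingSymbols = new ArrayList<>();

  static LddReport parse(String lddOutput) {
    LddReport report = new LddReport();
    for (String line : lddOutput.split("\n")) {
      Matcher mapping = MAPPING.matcher(line);
      Matcher missing = MISSING.matcher(line.trim());
      if (mapping.matches()) {
        report.resolved.put(mapping.group(1), mapping.group(2));
      } else if (missing.matches()) {
        report.missingSymbols.add(missing.group(1));
      }
    }
    return report;
  }

  public static void main(String[] args) {
    // Copy of the LD_PRELOAD=/lib/libgcompat.so.0 ldd run above.
    String withGcompat = String.join("\n",
        "    /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)",
        "    /lib/libgcompat.so.0 => /lib/libgcompat.so.0 (0x7fb971f0b000)",
        "    librt.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)",
        "    libpthread.so.0 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)",
        "    libdl.so.2 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)",
        "    libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7fb971f23000)",
        "    libucontext.so.1 => /lib/libucontext.so.1 (0x7fb971f06000)",
        "    libobstack.so.1 => /usr/lib/libobstack.so.1 (0x7fb971f01000)");
    LddReport report = parse(withGcompat);
    System.out.println("resolved: " + report.resolved);
    System.out.println("missing symbols: " + report.missingSymbols);
  }
}
```

The output also makes visible that on musl the glibc sonames librt, libpthread, libdl and libc are all served by the single /lib/ld-musl-x86_64.so.1.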

Looking to see if the glibc symbols changed:

/ # ldd META-INF/native/libnetty_tcnative_linux_x86_64.so 
    /lib/ld-musl-x86_64.so.1 (0x7f50ccdf0000)
    librt.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f50ccdf0000)
    libpthread.so.0 => /lib/ld-musl-x86_64.so.1 (0x7f50ccdf0000)
    libdl.so.2 => /lib/ld-musl-x86_64.so.1 (0x7f50ccdf0000)
    libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7f50ccdf0000)
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: __isnan: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: __isinf: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: __strdup: symbol not found

__isnan and __isinf are new, but libunwind is no longer needed. For comparison, 2.0.61.Final:

/ # wget https://search.maven.org/remotecontent?filepath=io/netty/netty-tcnative-boringssl-static/2.0.61.Final/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar
Connecting to search.maven.org (3.93.166.87:443)
Connecting to repo1.maven.org (199.232.196.209:443)
saving to 'netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar'
netty-tcnative-borin 100% |******************************************************************************************************************| 1218k  0:00:00 ETA
'netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar' saved
/ # unzip netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar 
Archive:  netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar
   creating: META-INF/
  inflating: META-INF/MANIFEST.MF
   creating: META-INF/license/
   creating: META-INF/maven/
   creating: META-INF/maven/io.netty/
   creating: META-INF/maven/io.netty/netty-tcnative-boringssl-static/
   creating: META-INF/native/
  inflating: META-INF/LICENSE.txt
  inflating: META-INF/NOTICE.txt
  inflating: META-INF/license/LICENSE.aix-netbsd.txt
  inflating: META-INF/license/LICENSE.boringssl.txt
  inflating: META-INF/license/LICENSE.mvn-wrapper.txt
  inflating: META-INF/license/LICENSE.tomcat-native.txt
  inflating: META-INF/maven/io.netty/netty-tcnative-boringssl-static/pom.properties
  inflating: META-INF/maven/io.netty/netty-tcnative-boringssl-static/pom.xml
  inflating: META-INF/native/libnetty_tcnative_linux_x86_64.so
  inflating: META-INF/INDEX.LIST
/ # ldd META-INF/native/libnetty_tcnative_linux_x86_64.so 
    /lib/ld-musl-x86_64.so.1 (0x7ff60d448000)
    libm.so.6 => /lib/ld-musl-x86_64.so.1 (0x7ff60d448000)
    libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7ff60d448000)
    ld-linux-x86-64.so.2 => /lib/ld-linux-x86-64.so.2 (0x7ff60d43b000)
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_GetRegionStart: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_RaiseException: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_SetIP: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_GetLanguageSpecificData: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_GetTextRelBase: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_Resume_or_Rethrow: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: __strdup: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_GetIPInfo: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_Resume: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_SetGR: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_DeleteException: symbol not found
Error relocating META-INF/native/libnetty_tcnative_linux_x86_64.so: _Unwind_GetDataRelBase: symbol not found
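The comparison between the two ldd runs is a set difference over the reported "symbol not found" names. A minimal sketch that hardcodes the symbol names copied from the two transcripts above (no other data); MissingSymbolDiff and minus are hypothetical names. The _Unwind_* functions are the stack-unwinding interface normally supplied by libgcc_s or libunwind, which is why their absence in 2.0.65.Final reads as no longer needing libunwind.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: diffs musl ldd "symbol not found" sets. */
final class MissingSymbolDiff {
  // From the ldd run against netty-tcnative-boringssl-static 2.0.61.Final.
  static final Set<String> V2_0_61 = Set.of(
      "_Unwind_GetRegionStart", "_Unwind_RaiseException", "_Unwind_SetIP",
      "_Unwind_GetLanguageSpecificData", "_Unwind_GetTextRelBase",
      "_Unwind_Resume_or_Rethrow", "__strdup", "_Unwind_GetIPInfo",
      "_Unwind_Resume", "_Unwind_SetGR", "_Unwind_DeleteException",
      "_Unwind_GetDataRelBase");
  // From the ldd run against netty-tcnative-boringssl-static 2.0.65.Final.
  static final Set<String> V2_0_65 = Set.of("__isnan", "__isinf", "__strdup");

  /** Elements of a that are not in b, sorted for stable output. */
  static Set<String> minus(Set<String> a, Set<String> b) {
    Set<String> out = new TreeSet<>(a);
    out.removeAll(b);
    return out;
  }

  public static void main(String[] args) {
    // prints "new in 2.0.65.Final: [__isinf, __isnan]"
    System.out.println("new in 2.0.65.Final: " + minus(V2_0_65, V2_0_61));
    System.out.println("gone since 2.0.61.Final: " + minus(V2_0_61, V2_0_65));
    Set<String> common = new TreeSet<>(V2_0_61);
    common.retainAll(V2_0_65);
    // prints "missing in both: [__strdup]"
    System.out.println("missing in both: " + common);
  }
}
```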