microsoft / openjdk

Microsoft Build of OpenJDK
https://www.microsoft.com/openjdk
MIT License

Unexpected jvm crashes on 17.0.8 #577

Closed Riddle4045 closed 1 year ago

Riddle4045 commented 1 year ago

Describe the bug

We are running into frequent JVM crashes while running msopenjdk 17.0.8. We don't observe these crashes when we downgrade to 17.0.7.

Desktop (please complete the following information):

OS & version [e.g. Windows 10, 1822]:

NAME="Common Base Linux Mariner"
VERSION="2.0.20231004"
ID=mariner
VERSION_ID="2.0"
PRETTY_NAME="CBL-Mariner/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://aka.ms/cbl-mariner"
BUG_REPORT_URL="https://aka.ms/cbl-mariner"
SUPPORT_URL="https://aka.ms/cbl-mariner"

MSFT Build of OpenJDK Version:

openjdk 17.0.8.1 2023-08-24 LTS
OpenJDK Runtime Environment Microsoft-8297089 (build 17.0.8.1+1-LTS)
OpenJDK 64-Bit Server VM Microsoft-8297089 (build 17.0.8.1+1-LTS, mixed mode)

To Reproduce: We are running Mariner on Kubernetes pods. We encounter these crashes when calling into our service running on these pods.

Screenshots, Logs etc: The beginning of the HotSpot error log is pasted below; I can share the entire log file if needed. k8s restarts the failing container, and core dumps are not written by the time the container restarts. It will take some work to get the core dumps, but it's possible if required.

no [ /etc/trino/data ]$ cat hs_err_pid9.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007562c10eb5bf, pid=9, tid=160
#
# JRE version: OpenJDK Runtime Environment Microsoft-8297089 (17.0.8.1+1) (build 17.0.8.1+1-LTS)
# Java VM: OpenJDK 64-Bit Server VM Microsoft-8297089 (17.0.8.1+1-LTS, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 22841 c2 com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject(Lcom/microsoft/sqlserver/jdbc/JDBCType;Lcom/microsoft/sqlserver/jdbc/SSType;IJI)Ljava/lang/Object; (678 bytes) @ 0x00007562c10eb5bf [0x00007562c10eb1a0+0x000000000000041f]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /etc/trino/data/core.9)
#
# If you would like to submit a bug report, please visit:
#   https://github.com/microsoft/openjdk/issues
#

---------------  S U M M A R Y ------------

Command Line: -agentpath:/usr/lib/trino/bin/libjvmkill.so -XX:G1HeapRegionSize=32M -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -XX:ReservedCodeCacheSize=256M -XX:PerMethodRecompilationCutoff=10000 -XX:PerBytecodeRecompilationCutoff=10000 -Djdk.attach.allowAttachSelf=true -Djdk.nio.maxCachedBufferSize=2000000 -XX:+UnlockDiagnosticVMOptions -XX:+UseAESCTRIntrinsics -XX:-G1UsePreventiveGC --add-opens=java.management/sun.management=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED -javaagent:/usr/lib/trino/lib/jmx_prometheus_javaagent-0.16.1.jar=9999:/etc/trino/jmx.config.yaml -Xmx19758m -Xms12349m -Dnode.id=trino-worker-7c7cc99788-fb7gg -Dlog.path=var/log/server.log -Dnode.data-dir=/etc/trino/data/ -Dnode.environment=cicd -Dplugin.dir=/usr/lib/trino/plugin -Dlog.levels-file=/etc/trino/..2023_10_25_21_22_26.3776852031/log.properties -Dconfig=/etc/trino/..2023_10_25_21_22_26.3776852031/config.properties io.trino.server.TrinoServer

Host: AMD EPYC 7763 64-Core Processor, 8 cores, 24G, CBL-Mariner 2.0.20231004
Time: Wed Oct 25 21:26:31 2023 UTC elapsed time: 240.069807 seconds (0d 0h 4m 0s)

---------------  T H R E A D  ---------------

Current thread (0x00007562c9b1b920):  JavaThread "20231025_212628_00020_2sxtd.11.0.0-0-59" [_thread_in_Java, id=160, stack(0x00007562544ee000,0x00007562545ee000)]

Stack: [0x00007562544ee000,0x00007562545ee000],  sp=0x00007562545eb300,  free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 22841 c2 com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject(Lcom/microsoft/sqlserver/jdbc/JDBCType;Lcom/microsoft/sqlserver/jdbc/SSType;IJI)Ljava/lang/Object; (678 bytes) @ 0x00007562c10eb5bf [0x00007562c10eb1a0+0x000000000000041f]
J 22840 c2 com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject(Lcom/microsoft/sqlserver/jdbc/JDBCType;Lcom/microsoft/sqlserver/jdbc/SSType;Ljava/util/Calendar;IJI)Ljava/lang/Object; (1216 bytes) @ 0x00007562c10e7dc0 [0x00007562c10e7da0+0x0000000000000020]
J 20159 c2 com.microsoft.sqlserver.jdbc.ServerDTVImpl.getValue(Lcom/microsoft/sqlserver/jdbc/DTV;Lcom/microsoft/sqlserver/jdbc/JDBCType;ILcom/microsoft/sqlserver/jdbc/InputStreamGetterArgs;Ljava/util/Calendar;Lcom/microsoft/sqlserver/jdbc/TypeInfo;Lcom/microsoft/sqlserver/jdbc/CryptoMetadata;Lcom/microsoft/sqlserver/jdbc/TDSReader;Lcom/microsoft/sqlserver/jdbc/SQLServerStatement;)Ljava/lang/Object; (1190 bytes) @ 0x00007562c0e24d30 [0x00007562c0e24360+0x00000000000009d0]
J 19662 c2 com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(ILcom/microsoft/sqlserver/jdbc/JDBCType;)Ljava/lang/Object; (9 bytes) @ 0x00007562c0db59c4 [0x00007562c0db5540+0x0000000000000484]
J 22863 c2 io.trino.plugin.jdbc.ReadFunction.isNull(Ljava/sql/ResultSet;I)Z (15 bytes) @ 0x00007562c10cbec4 [0x00007562c10cbc80+0x0000000000000244]
J 19675 c2 io.trino.plugin.jdbc.JdbcRecordCursor.isNull(I)Z (75 bytes) @ 0x00007562c0db9dd4 [0x00007562c0db9d40+0x0000000000000094]
J 22857 c1 io.trino.$gen.CursorProcessor_20231025_212631_205.filter(Lio/trino/spi/connector/ConnectorSession;Lio/trino/spi/connector/RecordCursor;)Z (79 bytes) @ 0x00007562ba468134 [0x00007562ba468020+0x0000000000000114]
j  io.trino.$gen.CursorProcessor_20231025_212631_205.process(Lio/trino/spi/connector/ConnectorSession;Lio/trino/operator/DriverYieldSignal;Lio/trino/spi/connector/RecordCursor;Lio/trino/spi/PageBuilder;)Lio/trino/operator/project/CursorProcessorOutput;+57
j  io.trino.operator.ScanFilterAndProjectOperator$RecordCursorToPages.process()Lio/trino/operator/WorkProcessor$ProcessState;+30
J 22501 c2 io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process()Z (76 bytes) @ 0x00007562c106943c [0x00007562c10693c0+0x000000000000007c]
J 19113 c1 io.trino.operator.WorkProcessorUtils.getNextState(Lio/trino/operator/WorkProcessor;)Lio/trino/operator/WorkProcessor$ProcessState; (55 bytes) @ 0x00007562b9e57dbc [0x00007562b9e57d40+0x000000000000007c]
j  io.trino.operator.WorkProcessorUtils$YieldingProcess.process()Lio/trino/operator/WorkProcessor$ProcessState;+37
J 22501 c2 io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process()Z (76 bytes) @ 0x00007562c106943c [0x00007562c10693c0+0x000000000000007c]
J 19113 c1 io.trino.operator.WorkProcessorUtils.getNextState(Lio/trino/operator/WorkProcessor;)Lio/trino/operator/WorkProcessor$ProcessState; (55 bytes) @ 0x00007562b9e57dbc [0x00007562b9e57d40+0x000000000000007c]
J 20051 c1 io.trino.operator.WorkProcessorUtils$BlockingProcess.process()Lio/trino/operator/WorkProcessor$ProcessState; (75 bytes) @ 0x00007562ba03c9dc [0x00007562ba03c920+0x00000000000000bc]
J 22501 c2 io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process()Z (76 bytes) @ 0x00007562c106943c [0x00007562c10693c0+0x000000000000007c]
J 19111 c1 io.trino.operator.WorkProcessorUtils.lambda$flatten$6(Lio/trino/operator/WorkProcessor;)Lio/trino/operator/WorkProcessor$TransformationState; (64 bytes) @ 0x00007562b9e56944 [0x00007562b9e568c0+0x0000000000000084]
J 19110 c1 io.trino.operator.WorkProcessorUtils$$Lambda$2990+0x00000008010dbdc0.process(Ljava/lang/Object;)Lio/trino/operator/WorkProcessor$TransformationState; (8 bytes) @ 0x00007562b9e52b64 [0x00007562b9e52ac0+0x00000000000000a4]
J 22159 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x00007562c0ff0554 [0x00007562c0ff0360+0x00000000000001f4]
J 22159 c2 io.trino.operator.WorkProcessorUtils$3.process()Lio/trino/operator/WorkProcessor$ProcessState; (226 bytes) @ 0x00007562c0ff042c [0x00007562c0ff0360+0x00000000000000cc]
J 22501 c2 io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process()Z (76 bytes) @ 0x00007562c106943c [0x00007562c10693c0+0x000000000000007c]
J 19113 c1 io.trino.operator.WorkProcessorUtils.getNextState(Lio/trino/operator/WorkProcessor;)Lio/trino/operator/WorkProcessor$ProcessState; (55 bytes) @ 0x00007562b9e57dbc [0x00007562b9e57d40+0x000000000000007c]
j  io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(Lio/trino/operator/WorkProcessor;Ljava/util/function/Consumer;)Lio/trino/operator/WorkProcessor$ProcessState;+1
j  io.trino.operator.WorkProcessorUtils$$Lambda$3586+0x00000008011f8000.process()Lio/trino/operator/WorkProcessor$ProcessState;+8
J 22501 c2 io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process()Z (76 bytes) @ 0x00007562c106943c [0x00007562c10693c0+0x000000000000007c]
J 19113 c1 io.trino.operator.WorkProcessorUtils.getNextState(Lio/trino/operator/WorkProcessor;)Lio/trino/operator/WorkProcessor$ProcessState; (55 bytes) @ 0x00007562b9e57dbc [0x00007562b9e57d40+0x000000000000007c]
j  io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(Ljava/util/function/BooleanSupplier;Lio/trino/operator/WorkProcessor;)Lio/trino/operator/WorkProcessor$ProcessState;+14
j  io.trino.operator.WorkProcessorUtils$$Lambda$3588+0x00000008011f8450.process()Lio/trino/operator/WorkProcessor$ProcessState;+8
J 22501 c2 io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process()Z (76 bytes) @ 0x00007562c106943c [0x00007562c10693c0+0x000000000000007c]
j  io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput()Lio/trino/spi/Page;+4
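
When several pods crash, it can help to pull the signal and the problematic frame out of each `hs_err_pid*.log` and compare them. The following is a minimal sketch, assuming the header layout shown in the log above; the class name `HsErrHeader`, its methods, and the regular expressions are hypothetical and only cover a JIT-compiled (`J ... c1|c2`) problematic frame.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HsErrHeader {
    /** Compile id, compiler tier (c1 or c2) and method name of the crashing frame. */
    public record ProblematicFrame(String compileId, String tier, String method) {}

    // Matches e.g. "#  SIGSEGV (0xb) at pc=0x..., pid=9, tid=160"
    private static final Pattern SIGNAL =
            Pattern.compile("^#\\s+(SIG[A-Z]+)\\s+\\(0x[0-9a-f]+\\)");

    // Matches e.g. "# J 22841 c2 com.example.Foo.bar(I)V (678 bytes) @ 0x..."
    private static final Pattern COMPILED_FRAME =
            Pattern.compile("^#\\s+J\\s+(\\d+)\\s+(c1|c2)\\s+([^\\s(]+)\\(");

    /** Returns the first signal name found in the header, if any. */
    public static Optional<String> signal(List<String> lines) {
        for (String line : lines) {
            Matcher m = SIGNAL.matcher(line);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    /** Parses the line that follows "# Problematic frame:" when it is a compiled Java frame. */
    public static Optional<ProblematicFrame> problematicFrame(List<String> lines) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (lines.get(i).startsWith("# Problematic frame:")) {
                Matcher m = COMPILED_FRAME.matcher(lines.get(i + 1));
                return m.find()
                        ? Optional.of(new ProblematicFrame(m.group(1), m.group(2), m.group(3)))
                        : Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HsErrHeader.java hs_err_pid9.log
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("signal: " + signal(lines).orElse("unknown"));
        System.out.println("frame:  " + problematicFrame(lines)
                .map(ProblematicFrame::toString).orElse("not a compiled Java frame"));
    }
}
```

For the log above, this would report a `SIGSEGV` in the C2-compiled `com.microsoft.sqlserver.jdbc.DDC.convertTemporalToObject`, which makes it easy to see whether other crashing pods fail in the same compiled method or somewhere else.
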
karianna commented 1 year ago

@Riddle4045 We've just released 17.0.9 - are you able to give that a go and report back? Also which version of the SQL driver are you using?

brunoborges commented 1 year ago

Can you share a reproducible test?

Riddle4045 commented 1 year ago

@karianna Sorry for the delay in response, I was travelling - I will try 17.0.9 today and get back to you. I was on the latest driver version, 12.4, and also tried 11.2. This was not the only call flow that crashed the k8s containers; we also have a few customers reporting crashes in a separate call flow not involving the SQL Server driver.

Riddle4045 commented 1 year ago

@brunoborges We deploy a query engine (Trino) on K8s and query data from SQL Server & ADLS Gen2 via the query engine. Unfortunately, the deployment is sophisticated and I will not be able to share a simple repro here.

Riddle4045 commented 1 year ago

@karianna I was not able to repro the crashes on 17.0.9. I have a consistent repro only on 17.0.8.

karianna commented 1 year ago

Thanks for checking! We do only officially support the very latest point release, so I'll close this for now but if it re-appears in 17.0.9 let us know!
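
Since the crashes reproduced on 17.0.8 but not on 17.0.9, a service could log or assert at startup which point release it is actually running on. The following is a minimal sketch using the standard `Runtime.Version` API; the class name `JdkPointReleaseCheck`, the helper `isAtLeast`, and the choice of `17.0.9` as the minimum are assumptions for illustration.

```java
public class JdkPointReleaseCheck {
    /**
     * True when current is at or above minimum. Optional build information
     * such as the "-LTS" suffix is ignored in the comparison.
     */
    public static boolean isAtLeast(Runtime.Version current, Runtime.Version minimum) {
        return current.compareToIgnoreOptional(minimum) >= 0;
    }

    public static void main(String[] args) {
        // Assumed minimum: the release on which the crash no longer reproduced.
        Runtime.Version minimum = Runtime.Version.parse("17.0.9");
        Runtime.Version current = Runtime.version();

        if (isAtLeast(current, minimum)) {
            System.out.println("Running on " + current + " (>= " + minimum + ")");
        } else {
            System.err.println("Running on " + current + ", older than " + minimum);
        }
    }
}
```

Running this inside the container image confirms which JDK the pod really picked up, which is useful when a base image update is expected to have moved the runtime from 17.0.8 to 17.0.9.
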