mmtk / mmtk-openjdk

OpenJDK binding for MMTk
Other
31 stars 33 forks source link

Deadlock in safepoint code #54

Open qinsoon opened 3 years ago

qinsoon commented 3 years ago

This was first observed in ef24f49dd3979c6914805f7061e21ae29f886c7e when running with MarkSweep along with the RustMMTkProbe:

MMTK_PLAN=MarkSweep build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -XX:+UseThirdPartyHeap -Xms128M -Xmx128M -Dprobes=RustMMTk -Djava.library.path=/home/yilin/Code/ci-perf-kit/probes/rust_mmtk -cp /home/yilin/Code/ci-perf-kit/probes/probes.jar:/home/yilin/Code/ci-perf-kit/probes/rust_mmtk/java:/usr/share/benchmarks/dacapo/dacapo-9.12-bach.jar Harness -n 2 -c probe.DacapoBachCallback fop

The bug is transient, it may take a few runs to reproduce it. This appears to be a failed assertion, thus only shows in debug builds. It seems unrelated with finalization (I tried to remove the finalization code, but the bug persisted). Wenyu also observed the bug when he worked on immix (along with RustMMTkProbe). I cannot reproduce the bug with SemiSpace.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/kunals/git/mmtk-openjdk/repos/openjdk/src/hotspot/share/runtime/safepoint.cpp:893), pid=10742, tid=10743
#  fatal error: Deadlock in safepoint code.  Should have called back to the VM before blocking.
#
# JRE version: OpenJDK Runtime Environment (11.0) (fastdebug build 11-internal+0-adhoc.kunals.openjdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 11-internal+0-adhoc.kunals.openjdk, mixed mode, tiered, third-party gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/kunals/git/probes/core.10742)
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -XX:+UseThirdPartyHeap -Xms128M -Xmx128M -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/probes Harness -n 2 -c probe.DacapoBachCallback fop

Host: rat, AMD Ryzen 9 3900X 12-Core Processor, 24 cores, 31G, Ubuntu 18.04.5 LTS
Time: Mon Mar 15 04:18:29 2021 AEDT elapsed time: 37 seconds (0d 0h 0m 37s)

---------------  T H R E A D  ---------------

Current thread (0x00007f515801a800):  JavaThread "main" [_thread_in_native_trans, id=10743, stack(0x00007f5163b1f000,0x00007f5163c20000)]

Stack: [0x00007f5163b1f000,0x00007f5163c20000],  sp=0x00007f5163c1cf60,  free space=1015k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1915a54]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x314
V  [libjvm.so+0x191693f]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f
V  [libjvm.so+0xb707ce]  report_fatal(char const*, int, char const*, ...)+0x12e
V  [libjvm.so+0x16d0657]  SafepointSynchronize::block(JavaThread*)+0x2d7
V  [libjvm.so+0x1849d3b]  JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*)+0x41b
V  [libjvm.so+0x1854e01]  JavaThread::check_special_condition_for_native_trans(JavaThread*)+0x11
j  probe.RustMMTkProbe.begin_native(Ljava/lang/String;IZJ)V+0
j  probe.RustMMTkProbe.begin(Ljava/lang/String;IZ)V+15
j  probe.ProbeMux.begin(Ljava/lang/String;Z)V+34
j  probe.DacapoBachCallback.start(Ljava/lang/String;)V+5
j  org.dacapo.harness.Benchmark.run(Lorg/dacapo/harness/Callback;Ljava/lang/String;)Z+69
j  org.dacapo.harness.TestHarness.runBenchmark(Ljava/io/File;Ljava/lang/String;Lorg/dacapo/harness/TestHarness;)V+100
j  org.dacapo.harness.TestHarness.main([Ljava/lang/String;)V+665
j  Harness.main([Ljava/lang/String;)V+1
v  ~StubRoutines::call_stub
V  [libjvm.so+0xefd32a]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x88a
V  [libjvm.so+0xfe91a0]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) [clone .isra.94] [clone .constprop.225]+0x250
V  [libjvm.so+0x100e63d]  jni_CallStaticVoidMethod+0x20d
C  [libjli.so+0x4e98]  JavaMain+0xc08

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  probe.RustMMTkProbe.begin_native(Ljava/lang/String;IZJ)V+0
j  probe.RustMMTkProbe.begin(Ljava/lang/String;IZ)V+15
j  probe.ProbeMux.begin(Ljava/lang/String;Z)V+34
j  probe.DacapoBachCallback.start(Ljava/lang/String;)V+5
j  org.dacapo.harness.Benchmark.run(Lorg/dacapo/harness/Callback;Ljava/lang/String;)Z+69
j  org.dacapo.harness.TestHarness.runBenchmark(Ljava/io/File;Ljava/lang/String;Lorg/dacapo/harness/TestHarness;)V+100
j  org.dacapo.harness.TestHarness.main([Ljava/lang/String;)V+665
j  Harness.main([Ljava/lang/String;)V+1
v  ~StubRoutines::call_stub
qinsoon commented 3 years ago

The bug won't show if we do not use the RustMMTkProbe. Currently I am not able to reproduce this issue for other plans (SemiSpace or GenCopy). It only shows for MarkSweep.

qinsoon commented 2 years ago

According to the Zulip message here, this bug was also seen on GenCopy.

Hi, I got this error message in the log files when running the running-ng scripts with GenCopy and pmd. The plotty data seems fine. I'm not too sure what it was caused by, but the command line for it was: mkdir -p PLOTTY_WORKAROUND; timedrun; MMTK_PLAN=GenCopy MMTK_MAX_NURSERY=33554432 MMTK_MIN_NURSERY=33554432 /home/claireh/mmtk-openjdk/repos/openjdk/build_no_barrier/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Djava.library.path=/home/claireh/probes -Dprobes=RustMMTk -Xms12249M -Xmx12249M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/claireh/probes:/home/claireh/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 5 -s default pmd

qinsoon commented 2 years ago

This is possibly fixed by https://github.com/mmtk/mmtk-openjdk/pull/157.