ibmruntimes / Semeru-Runtimes

Issue repo for all things IBM Semeru Runtimes

Remote debugging stops with AGENT_ERROR_INVALID_THREAD(203) #80

Open georgleber opened 4 weeks ago

georgleber commented 4 weeks ago

We are running Spring Boot in a Docker container. Since upgrading from ibm-semeru-runtimes:open-17-jre to ibm-semeru-runtimes:open-21-jre, the remote debugger disconnects from the remote process when we use "Step Over".

We are running the container with:

environment:
  - JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:9986
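For reference, the JAVA_TOOL_OPTIONS value above loads the JDWP agent with four sub-options. The sketch below splits that option string into key/value pairs and notes in comments what each standard sub-option does. The class name `JdwpOptions` is hypothetical, and the parser only handles simple `key=value` lists like this one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JdwpOptions {
    // Splits "-agentlib:jdwp=k1=v1,k2=v2,..." into an ordered key/value map.
    static Map<String, String> parse(String agentlib) {
        String prefix = "-agentlib:jdwp=";
        if (!agentlib.startsWith(prefix)) {
            throw new IllegalArgumentException("not a jdwp agent option: " + agentlib);
        }
        Map<String, String> opts = new LinkedHashMap<>();
        for (String pair : agentlib.substring(prefix.length()).split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            // split on the first '=' only, so values such as "*:9986" stay intact
            opts.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> o = parse(
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:9986");
        // transport=dt_socket: debugger and agent talk over a TCP socket
        // server=y: the JVM listens and the debugger attaches to it
        // suspend=n: the application starts without waiting for a debugger
        // address=*:9986: listen on port 9986 on all interfaces ("*" is accepted since JDK 9),
        //                 so the port is reachable from outside the container
        o.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```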

The error in docker logs says:

2024-06-21T08:18:34.417Z INFO 1 --- [rechner-api-v2] [omcat-handler-0] o.a.c.c.C.[.[localhost].[/rechner-api] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-06-21T08:18:34.418398374Z 2024-06-21T08:18:34.418Z INFO 1 --- [rechner-api-v2] [omcat-handler-0] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2024-06-21T08:18:34.421827787Z 2024-06-21T08:18:34.421Z INFO 1 --- [rechner-api-v2] [omcat-handler-0] o.s.web.servlet.DispatcherServlet : Completed initialization in 3 ms
2024-06-21T08:18:36.057385578Z JDWP exit error AGENT_ERROR_INVALID_THREAD(203): missing entry in running thread table [src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c:1116]
2024-06-21T08:18:36.057546544Z
2024-06-21T08:18:36.057600559Z Fatal error: JDWP missing entry in running thread table, jvmtiError=AGENT_ERROR_INVALID_THREAD(203)
2024-06-21T08:18:36.769612373Z Listening for transport dt_socket at address: 9986

When we switch to another OpenJDK distribution (tested with eclipse-temurin-21-jre), debugging works as usual.

pshipton commented 3 weeks ago

@tajila fyi

pshipton commented 3 weeks ago

Opened https://github.com/eclipse-openj9/openj9/issues/19759 for tracking.

tajila commented 3 weeks ago

@babsingh Can you please take a look

babsingh commented 3 weeks ago

@georgleber Can you please provide the instructions to reproduce the failure?

georgleber commented 3 weeks ago

The problem might only affect IntelliJ; I'm not sure about other IDEs.

I have created a simple project to demonstrate the problem: https://github.com/georgleber/jdk21-virtual-threads-semeru-bug

Start the application via docker compose, then attach a "Remote JVM Debug" configuration (port 9986) to the container, set breakpoints, e.g. on lines 24 and 30, and open http://127.0.0.1:8181/async-test/5 in a browser.

The first run is fine. But if you reload the page and use "Step Over" in the debugger, the Docker container exits.
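The reproducer's repository name suggests that the endpoint exercises virtual threads. As a hypothetical, framework-free stand-in (not the actual controller code), the program below runs n short-lived tasks, one virtual thread per task, so each call starts and terminates several virtual threads. Running it with the same `-agentlib:jdwp=...` option and stepping over a breakpoint inside the task might be a quicker way to check a given JVM, but that is an untested assumption, not a confirmed reproducer. The class name `VirtualThreadFanOut` is hypothetical; JDK 21 is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadFanOut {
    // Runs n short-lived tasks, each on its own virtual thread, and sums their ids.
    // Every virtual thread terminates as soon as its task returns.
    static int fanOut(int n) {
        List<Future<Integer>> futures = new ArrayList<>();
        // close() at the end of the try block waits for all submitted tasks to finish
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                final int id = i;
                futures.add(executor.submit(() -> {
                    if (!Thread.currentThread().isVirtual()) {
                        throw new IllegalStateException("expected a virtual thread");
                    }
                    Thread.sleep(10); // a breakpoint here is hit on a virtual thread
                    return id;
                }));
            }
        }
        int sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.resultNow(); // all tasks are complete at this point
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        System.out.println("sum = " + fanOut(n));
    }
}
```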

babsingh commented 3 weeks ago

Reproducing the failure

I was able to reproduce the failure using async-test, jdwp and jdb, without a container or an IDE.

# Step 1: Terminal 1
$JAVA_HOME/bin/java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:9986 -Xint -jar ./build/libs/async-test.jar

# Step 2: Terminal 2
$JAVA_HOME/bin/jdb -attach 9986

jdb> stop at de.aracom.AyncTestController:24
jdb> stop at de.aracom.AyncTestController:30

# Step 3: Open http://127.0.0.1:8080/async-test/5 in a browser or through links2 in another terminal
# http://127.0.0.1:8080/async-test/5 might need to be loaded twice.

Existing details on the bug

There is an OpenJDK bug for this failure: JDK-8305209.

The JDWP library was updated to address this bug (see the fix for JDK-8305209).

OpenJ9 still fails even with this fix. libjdwp/threadControl.c maintains a thread list, and that list contains a virtual thread object that has already terminated. The presence of this terminated thread in the thread list causes the error.

The JDWP code base is quite large. I briefly explored JDWP's native library, and it appears to use the JVMTI VirtualThreadStart and VirtualThreadEnd events to add threads to and remove threads from its list. In the extensions repo, the OpenJDK tests for these JVMTI events pass.
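A heavily simplified model of the bookkeeping described above may help: a table filled on VirtualThreadStart and emptied on VirtualThreadEnd, plus a separately kept list of threads that a later operation walks. All names here are hypothetical and this is not the libjdwp code; it only illustrates how a terminated thread that lingers in a list leads to a lookup failure shaped like the reported "missing entry in running thread table" error.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model; NOT the libjdwp implementation.
public class RunningThreadTable {
    private final Set<Long> running = ConcurrentHashMap.newKeySet();

    // Analogous to handling a JVMTI VirtualThreadStart event: register the thread.
    void onVirtualThreadStart(long threadId) { running.add(threadId); }

    // Analogous to handling a JVMTI VirtualThreadEnd event: forget the thread.
    void onVirtualThreadEnd(long threadId) { running.remove(threadId); }

    boolean isRunning(long threadId) { return running.contains(threadId); }

    // Hypothetical operation over a separately kept list (e.g. threads to resume after a step).
    // It expects every listed thread to still have an entry in the running table.
    void resumeAll(List<Long> threadIds) {
        for (long id : threadIds) {
            if (!running.contains(id)) {
                throw new IllegalStateException(
                    "missing entry in running thread table for thread " + id);
            }
        }
    }

    public static void main(String[] args) {
        RunningThreadTable table = new RunningThreadTable();
        table.onVirtualThreadStart(1L);
        table.onVirtualThreadStart(2L);
        List<Long> toResume = List.of(1L, 2L); // captured while both threads were alive
        table.onVirtualThreadEnd(2L);          // thread 2 terminates; the list is not updated
        try {
            table.resumeAll(toResume);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```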

I will need to explore JDWP's native library in more detail to investigate further. Before I pursue this path, @tajila is there a team or member experienced in JDWP who can provide more insights on how to investigate this failure?

tajila commented 2 weeks ago

> I will need to explore JDWP's native library in more detail to investigate further. Before I pursue this path, @tajila is there a team or member experienced in JDWP who can provide more insights on how to investigate this failure?

@JasonFengJ9 has recently been doing some work in this area.

thallium commented 1 week ago

@babsingh I'll investigate this issue.

babsingh commented 1 day ago

https://github.com/eclipse-openj9/openj9/pull/19855 has been merged to resolve this failure.

@georgleber Would you like a test-JDK with the fix for verification in your environment? If so, let us know which JDK version and platform (e.g. JDK21 Linux x86) is used for the Docker containers.

georgleber commented 20 hours ago

@babsingh Thank you and @thallium for fixing this bug. I would like to test it. We are using ibm-semeru-runtimes:open-21-jre as Docker base image on a Linux x86 system.

tajila commented 16 hours ago

@georgleber Unfortunately, this change missed the deadline for our next (Q3) release (late July, early August), which is when we update our container images. The next release will be JDK23 in September, or the Q4 release in October.

If you don't mind making some changes to the Dockerfile, you could replace the BINARY_URL= value and the ESUM with those of one of our linux_x86-64 nightly builds, e.g. https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK21_x86-64_linux_Nightly/245/OpenJ9-JDK21-x86-64_linux-20240717-225817.tar.gz, and test it that way.
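Replacing BINARY_URL also means replacing ESUM. Assuming ESUM holds the SHA-256 checksum of the tarball that the Dockerfile verifies before unpacking, it can be computed after downloading the nightly build, for example with `sha256sum`, or with the small Java helper below. The class name `Sha256Sum` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class Sha256Sum {
    // Streams the input through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            md.update(buf, 0, n);
        }
        return HexFormat.of().formatHex(md.digest());
    }

    // Usage: java Sha256Sum.java OpenJ9-JDK21-x86-64_linux-<stamp>.tar.gz
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(sha256Hex(in));
        }
    }
}
```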

babsingh commented 15 hours ago

> https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK21_x86-64_linux_Nightly/245/OpenJ9-JDK21-x86-64_linux-20240717-225817.tar.gz

The above nightly build is from 17 Jul 24, but the fix was merged on 18 Jul 24. A nightly build from after 19 Jul 24 should include the fix.
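The build date is encoded in the nightly file name, so the check described above can be done mechanically. The sketch below extracts the yyyyMMdd stamp and, reading "after 19 Jul 24" conservatively, treats only builds dated strictly after 2024-07-19 as expected to contain the fix. The naming pattern is inferred from the single URL in this thread, and the class name `NightlyBuildDate` is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NightlyBuildDate {
    // Matches the "-yyyyMMdd-HHmmss.tar.gz" suffix of names like
    // OpenJ9-JDK21-x86-64_linux-20240717-225817.tar.gz (pattern inferred, not documented)
    private static final Pattern STAMP = Pattern.compile("-(\\d{8})-\\d{6}\\.tar\\.gz$");

    static LocalDate buildDate(String fileNameOrUrl) {
        Matcher m = STAMP.matcher(fileNameOrUrl);
        if (!m.find()) {
            throw new IllegalArgumentException("no build stamp in " + fileNameOrUrl);
        }
        return LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Conservative reading of the comment above: only builds dated after 19 Jul 2024.
    static boolean expectedToContainFix(String fileNameOrUrl) {
        return buildDate(fileNameOrUrl).isAfter(LocalDate.of(2024, 7, 19));
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
            : "OpenJ9-JDK21-x86-64_linux-20240717-225817.tar.gz";
        // prints: 2024-07-17 expected to contain fix: false
        System.out.println(buildDate(name) + " expected to contain fix: "
            + expectedToContainFix(name));
    }
}
```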