helidon-io / helidon

Java libraries for writing microservices
https://helidon.io
Apache License 2.0
3.44k stars 562 forks source link

4.x: io.helidon.tests.functional.requestscope.TenantTest intermitentally hangs in validate workflow #8813

Closed barchetta closed 1 month ago

barchetta commented 1 month ago

Environment Details


Problem Description

Intermittently the build job in the validate workflow times out. I've seen this a couple times and each time the build appears to be stuck in:

19:45:50,884 [INFO] --< io.helidon.tests.functional:helidon-tests-functional-request-scope >--
19:45:50,884 [INFO] Building Helidon Tests Functional Request Scope and Fault Tolerance 4.0.0-SNAPSHOT [443/515]
19:45:50,884 [INFO]   from tests/functional/request-scope/pom.xml
19:45:50,884 [INFO] --------------------------------[ jar ]---------------------------------
19:45:50,890 [INFO] 
19:45:50,890 [INFO] --- maven-enforcer-plugin:3.3.0:enforce (enforce-dependencies) @ helidon-tests-functional-request-scope ---
19:45:50,894 [INFO] Rule 0: org.apache.maven.enforcer.rules.dependency.DependencyConvergence passed
19:45:50,894 [INFO] 
19:45:50,894 [INFO] --- directory-maven-plugin:1.0:highest-basedir (generate-top-parent-basedir) @ helidon-tests-functional-request-scope ---
19:45:50,907 [INFO] Highest basedir set to: /home/runner/work/helidon/helidon
19:45:50,907 [INFO] 
19:45:50,907 [INFO] --- buildnumber-maven-plugin:3.1.0:create (default) @ helidon-tests-functional-request-scope ---
19:45:50,907 [INFO] 
19:45:50,907 [INFO] --- maven-resources-plugin:3.3.1:copy-resources (copy-license) @ helidon-tests-functional-request-scope ---
19:45:50,908 [INFO] Copying 2 resources from ../../../licensing/templates to target/classes/META-INF
19:45:50,909 [INFO] 
19:45:50,909 [INFO] --- maven-resources-plugin:3.3.1:resources (default-resources) @ helidon-tests-functional-request-scope ---
19:45:50,910 [INFO] Copying 2 resources from src/main/resources to target/classes
19:45:50,910 [INFO] 
19:45:50,910 [INFO] --- maven-compiler-plugin:3.11.0:compile (default-compile) @ helidon-tests-functional-request-scope ---
19:45:50,911 [INFO] Changes detected - recompiling the module! :dependency
19:45:50,911 [INFO] Compiling 19 source files with javac [debug release 21] to target/classes
19:45:51,013 [INFO] 
19:45:51,014 [INFO] --- helidon-services-plugin:4.0.6:services (generate-meta-inf-services) @ helidon-tests-functional-request-scope ---
Skipping execution.
19:45:51,015 [INFO] 
19:45:51,015 [INFO] --- maven-resources-plugin:3.3.1:testResources (default-testResources) @ helidon-tests-functional-request-scope ---
19:45:51,016 [INFO] Copying 1 resource from src/test/resources to target/test-classes
19:45:51,017 [INFO] 
19:45:51,017 [INFO] --- maven-compiler-plugin:3.11.0:testCompile (default-testCompile) @ helidon-tests-functional-request-scope ---
19:45:51,019 [INFO] Changes detected - recompiling the module! :dependency
19:45:51,019 [INFO] Compiling 1 source file with javac [debug release 21] to target/test-classes
19:45:51,112 [INFO] 
19:45:51,112 [INFO] --- maven-surefire-plugin:3.1.0:test (default-test) @ helidon-tests-functional-request-scope ---
19:45:51,115 [INFO] Surefire report directory: /home/runner/work/helidon/helidon/tests/functional/request-scope/target/surefire-reports
19:45:51,115 [INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
19:45:51,122 [INFO] 
19:45:51,122 [INFO] -------------------------------------------------------
19:45:51,122 [INFO]  T E S T S
19:45:51,122 [INFO] -------------------------------------------------------
19:45:51,633 [INFO] Running io.helidon.tests.functional.requestscope.TenantTest
Error: The operation was canceled.
jbescos commented 1 month ago

Adding thread dumps for the record when issue happens, although it does not show the issue: dump.txt

The issue is only happening in TenantTest#test2. It seems that is invoking one method Bean2#test that is supposed to fail and execute a fallbackMethod.

tomas-langer commented 1 month ago

In my code, this was the relevant part of thread dump I could find:

'   - parking to wait for  <0x00000000c1923010> (a java.util.concurrent.CompletableFuture$Signaller)'.
'   at java.util.concurrent.locks.LockSupport.park(java.base@21/LockSupport.java:221)'.
'   at java.util.concurrent.CompletableFuture$Signaller.block(java.base@21/CompletableFuture.java:1864)'.
'   at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21/ForkJoinPool.java:3780)'.
'   at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21/ForkJoinPool.java:3725)'.
'   at java.util.concurrent.CompletableFuture.waitingGet(java.base@21/CompletableFuture.java:1898)'.
'   at java.util.concurrent.CompletableFuture.join(java.base@21/CompletableFuture.java:2117)'.
'   at io.helidon.tests.functional.requestscope.TenantTest.asyncCalls(TenantTest.java:65)'.
'   at io.helidon.tests.functional.requestscope.TenantTest.test2(TenantTest.java:50)'.
'   at io.helidon.tests.functional.requestscope.TenantTest$Proxy$_$$_WeldClientProxy.test2(Unknown Source)'.

Except for the listener, this was the only other Helidon code in the stack trace

jbescos commented 1 month ago

In my code, this was the relevant part of thread dump I could find:

' - parking to wait for  <0x00000000c1923010> (a java.util.concurrent.CompletableFuture$Signaller)'.
' at java.util.concurrent.locks.LockSupport.park(java.base@21/LockSupport.java:221)'.
' at java.util.concurrent.CompletableFuture$Signaller.block(java.base@21/CompletableFuture.java:1864)'.
' at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21/ForkJoinPool.java:3780)'.
' at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21/ForkJoinPool.java:3725)'.
' at java.util.concurrent.CompletableFuture.waitingGet(java.base@21/CompletableFuture.java:1898)'.
' at java.util.concurrent.CompletableFuture.join(java.base@21/CompletableFuture.java:2117)'.
' at io.helidon.tests.functional.requestscope.TenantTest.asyncCalls(TenantTest.java:65)'.
' at io.helidon.tests.functional.requestscope.TenantTest.test2(TenantTest.java:50)'.
' at io.helidon.tests.functional.requestscope.TenantTest$Proxy$_$$_WeldClientProxy.test2(Unknown Source)'.

Except for the listener, this was the only other Helidon code in the stack trace

I had the same thing in my thread dump. I guess the fault tolerance has a scheduler that is not running, or it run and it didn't commit the response.