quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.35k stars 2.56k forks source link

Pinned thread when selecting an instance from Arc #41314

Open vsevel opened 2 weeks ago

vsevel commented 2 weeks ago

Describe the bug

I detected the following thread pinning:

Thread[#54,ForkJoinPool-1-worker-1,5,CarrierThreads]
java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:393)
java.base/java.lang.VirtualThread.park(VirtualThread.java:582)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
io.quarkus.arc.impl.LazyValue.get(LazyValue.java:29)
io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:69)
io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:49)
io.quarkus.arc.impl.ComputingCache.getValue(ComputingCache.java:40)
io.quarkus.arc.impl.ArcContainerImpl.getResolvedBeans(ArcContainerImpl.java:970)
io.quarkus.arc.impl.Instances.resolveBeans(Instances.java:41)
io.quarkus.arc.impl.Instances.resolveBeans(Instances.java:36)
io.quarkus.arc.impl.InstanceImpl.resolve(InstanceImpl.java:339)
io.quarkus.arc.impl.InstanceImpl.<init>(InstanceImpl.java:169)
io.quarkus.arc.impl.InstanceImpl.child(InstanceImpl.java:135)
io.quarkus.arc.impl.InstanceImpl.select(InstanceImpl.java:204)
io.quarkus.arc.impl.InstanceImpl.select(InstanceImpl.java:41)
io.quarkus.arc.impl.ArcCDIProvider$ArcCDI.select(ArcCDIProvider.java:51)
io.smallrye.faulttolerance.CdiFaultToleranceSpi.lazyDependencies(CdiFaultToleranceSpi.java:76)
io.smallrye.faulttolerance.core.apiimpl.FaultToleranceImpl$BuilderImpl.lambda$build$0(FaultToleranceImpl.java:192)
io.smallrye.faulttolerance.core.apiimpl.LazyFaultTolerance.call(LazyFaultTolerance.java:28) <== monitors:1
io.smallrye.faulttolerance.FaultToleranceInterceptor.preconfiguredFlow(FaultToleranceInterceptor.java:206)
io.smallrye.faulttolerance.FaultToleranceInterceptor.intercept(FaultToleranceInterceptor.java:176)
io.smallrye.faulttolerance.FaultToleranceInterceptor_Bean.intercept(Unknown Source)
io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
... business code
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
... business code
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
... business code
io.smallrye.reactive.messaging.providers.AbstractMediator.lambda$invokeBlocking$15(AbstractMediator.java:190)
io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromDeferredSupplier.subscribe(UniCreateFromDeferredSupplier.java:25)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
io.quarkus.virtual.threads.ContextPreservingExecutorService$ContextPreservingRunnable.run(ContextPreservingExecutorService.java:45)
java.base/java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(ThreadPerTaskExecutor.java:314)
java.base/java.lang.VirtualThread.run(VirtualThread.java:309)

we do not believe we are doing any native call in our app. similar issue to https://github.com/quarkusio/quarkus/issues/41313 how can we investigate this?

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

quarkus-bot[bot] commented 2 weeks ago

/cc @Ladicek (arc), @manovotn (arc), @mkouba (arc)

mkouba commented 2 weeks ago

Hm, I thought that java.util.concurrent.locks.ReentrantLock is the recommended way to replace the synchronized blocks in order to avoid pinning :thinking:.

CC @mariofusco @cescoffier @xstefank

vsevel commented 2 weeks ago

I believe so as well. I do not know if there is synchronized or a wait hidden in the stack. would we see it in this stacktrace? I suppose so according to PinnedThreadPrinter.printStackTrace():

    private static void printStackTrace(List<LiveStackFrame> stack,
                                        PrintStream out,
                                        boolean printAll) {
        out.println(Thread.currentThread());
        for (LiveStackFrame frame : stack) {
            var ste = frame.toStackTraceElement();
            int monitorCount = frame.getMonitors().length;
            if (monitorCount > 0) {
                out.format("    %s <== monitors:%d%n", ste, monitorCount);
            } else if (frame.isNativeMethod() || printAll) {
                out.format("    %s%n", ste);
            }
        }
    }
xstefank commented 2 weeks ago

Most likely there is. ReentrantLock is working.

vsevel commented 2 weeks ago

Most likely there is.

do you know how to chase it?

xstefank commented 1 week ago

@vsevel I don't think there is a general way how to chase it. Can you provide me with a reproducer? Just to be sure there are no monitors in your business code? I can't reproduce it.

mariofusco commented 1 week ago

@vsevel I could also give a look at this, but I agree with @xstefank on the fact that we need a reproducer.

vsevel commented 1 week ago

it is unlikely that I will be able to give a reproducer. I am totally fine if at some point we close it with "cannot reproduce". I still wanted to create the issue so that we have a place to provide as much context as possible, and that is something that can be searched by others that experience the same issue.

vsevel commented 1 week ago

a bit more information. I looked at the app that was experiencing that particular issue, with that exact stacktrace. it all happened on the same day. and in 2 occasions I looked at the error occurred just after startup. and what is particularly disturbing is that in these 2 cases, the application experienced multiple exceptions and timeouts, and eventually killed shortly after because of the SmallRye Reactive Messaging - liveness check being down. one time it was because of org.jboss.resteasy.reactive.ClientWebApplicationException: Received: ', status code 401' when invoking ..., and the other time it was {"name":"SmallRye Reactive Messaging - liveness check","status":"DOWN","data":{"x":"[OK]","x":"[KO] - Multiple exceptions caught:\n\t[Exception 0] org.jboss.resteasy.reactive.ClientWebApplicationException: Received: 'Bad Gateway, status code 502' when invoking:. I do not know what is causing the other.

franz1981 commented 1 week ago

I see io.smallrye.faulttolerance.core.apiimpl.LazyFaultTolerance.call(LazyFaultTolerance.java:28) <== monitors:1

Which have a monitor there?