quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.82k stars 2.69k forks source link

ARJUNA012095: Abort of action id <ID> invoked while multiple threads active within it. #38913

Open parasjain27031994 opened 8 months ago

parasjain27031994 commented 8 months ago

Describe the bug

I have a Quarkus Application deployed as a Native executable on AWS Lambda. For some executions, the lambda time out after being in hung states for 900 seconds.

From the logs I see below traces from transaction manager

ARJUNA012117: TransactionReaper::check processing TX 0:ffffa9fe3dd5:ac33:65d4db9c:6 in state  RUN

ARJUNA012095: Abort of action id 0:ffffa9fe3dd5:ac33:65d4db9c:6 invoked while multiple threads active within it.

ARJUNA012381: Action id 0:ffffa9fe3dd5:ac33:65d4db9c:6 completed with multiple threads - thread Lambda Thread (NORMAL) was in progress with java.base@17.0.9/sun.nio.ch.SocketDispatcher.read0(Native Method)
java.base@17.0.9/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:47)
java.base@17.0.9/sun.nio.ch.NioSocketImpl.tryRead(NioSocketImpl.java:266)
java.base@17.0.9/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:317)
java.base@17.0.9/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
java.base@17.0.9/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
java.base@17.0.9/java.net.Socket$SocketInputStream.read(Socket.java:966)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1197)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1183)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1197)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1183)
java.base@17.0.9/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484)
java.base@17.0.9/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:478)
java.base@17.0.9/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
java.base@17.0.9/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1465)
java.base@17.0.9/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1069)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1197)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1183)
com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2155)
com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6830)
com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:8075)
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:646)
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:567)
com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7675)
com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4137)
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:272)
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:246)
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:485)
io.agroal.pool.wrapper.PreparedStatementWrapper.executeQuery(PreparedStatementWrapper.java:78)
org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.executeQuery(DeferredResultSetAccess.java:239)
org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.getResultSet(DeferredResultSetAccess.java:163)
org.hibernate.sql.results.jdbc.internal.JdbcValuesResultSetImpl.advanceNext(JdbcValuesResultSetImpl.java:254)
org.hibernate.sql.results.jdbc.internal.JdbcValuesResultSetImpl.processNext(JdbcValuesResultSetImpl.java:134)
org.hibernate.sql.results.jdbc.internal.AbstractJdbcValues.next(AbstractJdbcValues.java:19)
org.hibernate.sql.results.internal.RowProcessingStateStandardImpl.next(RowProcessingStateStandardImpl.java:66)
org.hibernate.sql.results.spi.ListResultsConsumer.consume(ListResultsConsumer.java:198)
org.hibernate.sql.results.spi.ListResultsConsumer.consume(ListResultsConsumer.java:33)
org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.doExecuteQuery(JdbcSelectExecutorStandardImpl.java:361)
org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.executeQuery(JdbcSelectExecutorStandardImpl.java:168)
org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.list(JdbcSelectExecutorStandardImpl.java:93)
org.hibernate.sql.exec.spi.JdbcSelectExecutor.list(JdbcSelectExecutor.java:31)
org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.lambda$new$0(ConcreteSqmSelectQueryPlan.java:110)
org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.withCacheableSqmInterpretation(ConcreteSqmSelectQueryPlan.java:303)
org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.performList(ConcreteSqmSelectQueryPlan.java:244)
org.hibernate.query.sqm.internal.QuerySqmImpl.doList(QuerySqmImpl.java:518)
org.hibernate.query.spi.AbstractSelectionQuery.list(AbstractSelectionQuery.java:367)
org.hibernate.query.Query.getResultList(Query.java:119)
io.quarkus.hibernate.orm.panache.common.runtime.CommonPanacheQueryImpl.list(CommonPanacheQueryImpl.java:280)
io.quarkus.hibernate.orm.panache.runtime.PanacheQueryImpl.list(PanacheQueryImpl.java:149)
abc.lambda.repository.DocReasonRepository_223c867134c84e0af66c676dd61fd7991ccf7a9eImpl.findAllByIdIn(Unknown Source)
abc.lambda.repository.DocReasonRepository_223c867134c84e0af66c676dd61fd7991ccf7a9eImpl_ClientProxy.findAllByIdIn(Unknown Source)
abc.lambda.service.TopicsAndReasonService.getExistingDocReasonEntityMap(TopicsAndReasonService.java:63)
abc.lambda.service.TopicsAndReasonService_Subclass.getExistingDocReasonEntityMap$$superforward(Unknown Source)
abc.lambda.service.TopicsAndReasonService_Subclass$$function$$1.apply(Unknown Source)
io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:73)
io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:62)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:136)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.invokeInOurTx(TransactionalInterceptorBase.java:107)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.doIntercept(TransactionalInterceptorRequired.java:38)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.intercept(TransactionalInterceptorBase.java:61)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired.intercept(TransactionalInterceptorRequired.java:32)
io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorRequired_Bean.intercept(Unknown Source)
io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
abc.lambda.service.TopicsAndReasonService_Subclass.getExistingDocReasonEntityMap(Unknown Source)
abc.lambda.service.TopicsAndReasonService.computeTopicsOrigin(TopicsAndReasonService.java:30)
abc.lambda.service.TopicsAndReasonService_ClientProxy.computeTopicsOrigin(Unknown Source)
abc.lambda.service.InternalTagProcessor.doInternalTagging(InternalTagProcessor.java:48)
abc.lambda.service.InternalTagProcessor_ClientProxy.doInternalTagging(Unknown Source)
abc.lambda.Handler.handleRequest(Handler.java:53)
abc.lambda.Handler.handleRequest(Handler.java:25)
io.quarkus.amazon.lambda.runtime.AmazonLambdaRecorder$1.processRequest(AmazonLambdaRecorder.java:167)
io.quarkus.amazon.lambda.runtime.AbstractLambdaPollLoop$1.run(AbstractLambdaPollLoop.java:142)
java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:807)
org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:210)
ARJUNA012108: CheckedAction::check - atomic action 0:ffffa9fe3dd5:ac33:65d4db9c:6 aborting with 1 threads active!
Datasource '<default>': JDBC resources leaked: 0 ResultSet(s) and 1 Statement(s)
ARJUNA012117: TransactionReaper::check processing TX 0:ffffa9fe3dd5:ac33:65d4db9c:6 in state  CANCEL
ARJUNA012378: ReaperElement appears to be wedged: org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.headers.Pthread.pthread_cond_wait(Pthread.java)
org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park0(PosixPlatformThreads.java:361)
org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park(PosixPlatformThreads.java:341)
org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.parkCurrentPlatformOrCarrierThread(PlatformThreads.java:907)
java.base@17.0.9/jdk.internal.misc.Unsafe.park(Unsafe.java:56)
java.base@17.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)
java.base@17.0.9/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base@17.0.9/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4052)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:4268)
com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:4474)
io.agroal.pool.ConnectionHandler.transactionRollback(ConnectionHandler.java:368)
io.agroal.narayana.LocalXAResource.rollback(LocalXAResource.java:85)
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelAbort(XAResourceRecord.java:338)
com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3112)
com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:3091)
com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1709)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:102)
com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:191)
com.arjuna.ats.arjuna.coordinator.TransactionReaper.doWork(TransactionReaper.java:407)
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:63)
org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:807)
org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:210)

ARJUNA012117: TransactionReaper::check processing TX 0:ffffa9fe3dd5:ac33:65d4db9c:6 in state  CANCEL_INTERRUPTED

ARJUNA012120: TransactionReaper::check worker Thread[Transaction Reaper Worker 0,5,main] not responding to interrupt when cancelling TX 0:ffffa9fe3dd5:ac33:65d4db9c:6 -- worker marked as zombie and TX scheduled for mark-as-rollback

The above exceptions were printed as warning and resulted in lambda being in hung state and eventually time out. I am trying understand the cause of failure, however I believe the exception should have been thrown rather than being swallowed so the code could handle the exception in reasonable way.

Expected behavior

Exception should be thrown.

Actual behavior

Exception is swallowed.

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

17

Quarkus version or git rev

3.5.2

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

quarkus-bot[bot] commented 8 months ago

/cc @mmusgrov (narayana)

mmusgrov commented 8 months ago

The transaction has timed out and the transaction reaper is trying to roll it back but it cannot do that because the owning thread is blocked in doExecutePreparedStatement. We detect that and report it using the code ARJUNA012381 (thread Lambda Thread was in progress with java.base@17.0.9/sun.nio.ch.SocketDispatcher.read0(Native Method)).

The transaction timeout handing thread is internal to the transaction manager so we cannot report it to the application using an exception.

To fix it you should tune your database operations or extend the transaction timeout.

Possible reasons for long running database operations include: