brunomascijp closed this issue 10 months ago
You're running an older SDK version. I believe this issue was fixed in PR https://github.com/temporalio/sdk-java/pull/1795. Can you please upgrade to the latest Java SDK release, v1.22.0?
Will try, thanks!
Closing since this is not an SDK issue. Feel free to ask general questions on our forum or community Slack.
Expected Behavior
Workflow executions should make progress, retrying, failing or successfully completing steps.
Actual Behavior
I have some executions that got stuck for hours after the following exception, and the state was Running on all of them. We restarted all the workers, and the orchestrator seemed to be working well.
[Workflow Executor taskQueue="prod", namespace="ns": 77] [] i.temporal.internal.worker.PollerOptions: uncaught exception
java.lang.RuntimeException: Failure processing workflow task. WorkflowId=5b38, RunId=5c9cbad8-8a64-4a84-81bd-64d02474a560, Attempt=473
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:327)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.wrapFailure(WorkflowWorker.java:188)
    at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:98)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.temporal.internal.statemachines.InternalWorkflowTaskException: Failure handling event 28 of type 'EVENT_TYPE_WORKFLOW_TASK_STARTED' during execution. {WorkflowTaskStartedEventId=28, CurrentStartedEventId=28}
    at io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:257)
    at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:236)
    at io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:208)
    at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:208)
    at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:192)
    at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:147)
    at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:132)
    at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:336)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:246)
    at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188)
    at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)
    ... 3 common frames omitted
Caused by: java.lang.RuntimeException: WorkflowTask: failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]
    at io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:152)
    at io.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:102)
    at io.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:68)
    at io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:277)
    at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:234)
    ... 13 common frames omitted
Caused by: java.lang.NullPointerException: stackTrace[15]
    at java.base/java.lang.Throwable.setStackTrace(Throwable.java:879)
    at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:85)
    at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)
    at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)
    at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)
    at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)
    at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)
    at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)
    at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)
    at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)
    at io.temporal.internal.sync.SyncWorkflowContext$ActivityCallback.lambda$invoke$0(SyncWorkflowContext.java:292)
    at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102)
    at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:106)
    at io.temporal.worker.ActiveThreadReportingExecutor.lambda$submit$0(ActiveThreadReportingExecutor.java:53)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    ... 3 common frames omitted
In particular, all stuck executions were in the WorkflowTaskFailed state and, after waiting a few hours, we decided to terminate them:
{
  "message": "Failure handling event 25 of type 'EVENT_TYPE_WORKFLOW_TASK_STARTED' during execution. {WorkflowTaskStartedEventId=25, CurrentStartedEventId=25}",
  "source": "JavaSDK",
  "stackTrace": "io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:257)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:236)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:208)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:208)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:192)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:147)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:132)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:336)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:246)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\njava.base/java.lang.Thread.run(Thread.java:833)\n",
  "cause": {
    "message": "WorkflowTask: failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]",
    "source": "JavaSDK",
    "stackTrace": "io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:152)\nio.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:102)\nio.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:68)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:277)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:234)\nio.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:208)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:208)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:192)\nio.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:147)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:132)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:336)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:246)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\njava.base/java.lang.Thread.run(Thread.java:833)\n",
    "cause": {
      "message": "stackTrace[15]",
      "source": "JavaSDK",
      "stackTrace": "java.base/java.lang.Throwable.setStackTrace(Throwable.java:879)\nio.temporal.failure.FailureConverter.failureToException(FailureConverter.java:85)\nio.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)\nio.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)\nio.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)\nio.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)\nio.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)\nio.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)\nio.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)\nio.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)\nio.temporal.internal.sync.SyncWorkflowContext$ActivityCallback.lambda$invoke$0(SyncWorkflowContext.java:292)\nio.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102)\nio.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:106)\nio.temporal.worker.ActiveThreadReportingExecutor.lambda$submit$0(ActiveThreadReportingExecutor.java:53)\njava.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)\njava.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\njava.base/java.lang.Thread.run(Thread.java:833)\n",
      "cause": null,
      "applicationFailureInfo": {
        "type": "java.lang.NullPointerException",
        "nonRetryable": false,
        "details": null
      }
    },
    "applicationFailureInfo": {
      "type": "java.lang.RuntimeException",
      "nonRetryable": false,
      "details": null
    }
  },
  "applicationFailureInfo": {
    "type": "io.temporal.internal.statemachines.InternalWorkflowTaskException",
    "nonRetryable": false,
    "details": null
  }
}
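For context on the innermost cause: the message `stackTrace[15]` has the form the JDK's `Throwable.setStackTrace` uses when the array it receives contains a null element at that index. The sketch below reproduces only that JDK behavior in isolation. The class name `StackTraceNullDemo`, the helper `nullElementMessage`, and the placeholder frame values are hypothetical, and it makes no claim about how `FailureConverter` actually ends up with a null entry.

```java
import java.util.Arrays;

public class StackTraceNullDemo {

    // Builds a stack trace array of the given length with one null slot,
    // then returns the message of the exception thrown by setStackTrace.
    static String nullElementMessage(int length, int nullIndex) {
        StackTraceElement[] frames = new StackTraceElement[length];
        // Placeholder frames; the values are illustrative only.
        Arrays.fill(frames, new StackTraceElement("example.Clazz", "method", "Clazz.java", 1));
        frames[nullIndex] = null; // stands in for a frame that is missing from the array

        try {
            new RuntimeException("demo").setStackTrace(frames);
            return null; // not reached when a null element is present
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "stackTrace[15]"
        System.out.println(nullElementMessage(16, 15));
    }
}
```

If this is indeed what happened here, index 15 would point at the sixteenth frame of the failure being converted back into an exception, which may help narrow down which recorded stack trace triggers the error.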
Steps to Reproduce the Problem
Not enough information
Specifications