A user reported an issue on the Slack channel where the worker logs show the following error:
2019-12-05 08:41:20.384 ERROR 13176 --- [test-domain": 2] c.u.c.i.r.ReplayDecisionTaskHandler : Workflow task failure. startedEventId=18, WorkflowID=d1cb88eb-f4e2-487b-b292-089dd1fc3006, RunID=873ae132-4c6a-4c2b-b8d0-20dba2864fa8. If see continuously the workflow might be stuck.
java.lang.IllegalStateException: id=DecisionId{decisionTarget=ACTIVITY, decisionEventId=5}, transitions=[CREATED, handleDecisionTaskStartedEvent, DECISION_SENT, handleInitiatedEvent, INITIATED, handleInitiatedEvent]
at com.uber.cadence.internal.replay.DecisionStateMachineBase.failStateTransition(DecisionStateMachineBase.java:204)
at com.uber.cadence.internal.replay.DecisionStateMachineBase.handleInitiatedEvent(DecisionStateMachineBase.java:109)
at com.uber.cadence.internal.replay.DecisionsHelper.handleActivityTaskScheduled(DecisionsHelper.java:147)
at com.uber.cadence.internal.replay.ReplayDecider.processEvent(ReplayDecider.java:185)
at com.uber.cadence.internal.replay.ReplayDecider.decideImpl(ReplayDecider.java:423)
at com.uber.cadence.internal.replay.ReplayDecider.decide(ReplayDecider.java:359)
at com.uber.cadence.internal.replay.ReplayDecisionTaskHandler.processDecision(ReplayDecisionTaskHandler.java:135)
at com.uber.cadence.internal.replay.ReplayDecisionTaskHandler.handleDecisionTaskImpl(ReplayDecisionTaskHandler.java:115)
at com.uber.cadence.internal.replay.ReplayDecisionTaskHandler.handleDecisionTask(ReplayDecisionTaskHandler.java:76)
at com.uber.cadence.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:258)
at com.uber.cadence.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:230)
at com.uber.cadence.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The workflow execution history shows 2 decision task failures, and both are sticky decisions in cases where there are additional events between DecisionTaskScheduled and DecisionTaskStarted. My guess is that this pattern is tripping up the Java client's handling of sticky decisions. Both times, workflow execution recovered immediately once the decision task was scheduled back onto the global task list.
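For context on what the IllegalStateException is reporting: the `transitions` list in the error interleaves states and handler calls, and ends with a second `handleInitiatedEvent` fired while the state machine is already INITIATED, which suggests the same ActivityTaskScheduled event is being applied twice during replay. The following is a minimal hypothetical sketch (not the actual Cadence implementation) of such a state-machine guard, reproducing the exact transition log from the error:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a decision state machine that rejects an illegal
// repeat transition, mirroring the failStateTransition behavior seen in the
// stack trace. Names and structure are illustrative, not Cadence's code.
class DecisionStateMachineSketch {
    enum State { CREATED, DECISION_SENT, INITIATED }

    private State state = State.CREATED;
    private final List<String> transitions = new ArrayList<>();

    DecisionStateMachineSketch() {
        transitions.add(state.name()); // log starts with CREATED
    }

    void handleDecisionTaskStartedEvent() {
        advance(State.CREATED, State.DECISION_SENT, "handleDecisionTaskStartedEvent");
    }

    void handleInitiatedEvent() {
        advance(State.DECISION_SENT, State.INITIATED, "handleInitiatedEvent");
    }

    // Record the attempted transition, then fail if the current state does
    // not allow it -- so the failing call is the last entry in the log,
    // exactly as in the reported error message.
    private void advance(State expected, State next, String name) {
        transitions.add(name);
        if (state != expected) {
            throw new IllegalStateException("transitions=" + transitions);
        }
        state = next;
        transitions.add(state.name());
    }

    List<String> transitionLog() {
        return transitions;
    }
}
```

Replaying the same initiated event twice against this sketch yields `transitions=[CREATED, handleDecisionTaskStartedEvent, DECISION_SENT, handleInitiatedEvent, INITIATED, handleInitiatedEvent]`, matching the logged failure, which is consistent with the guess that sticky replay delivers an already-applied event a second time.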