treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.3k stars 221 forks source link

spurious TaskConflictException in WorkflowExecutor #211

Closed danielnorberg closed 8 years ago

danielnorberg commented 8 years ago

I frequently get the below exception and stack trace when running the TdIT.testRunQuery test.

016-08-15 11:02:14 +0900 [ERROR] (main) io.digdag.core.workflow.WorkflowExecutor: Enqueue error, making this task failed: StoredTask{id=2, attemptId=1, upstreams=[], updatedAt=2016-08-15T02:02:14.470Z, retryAt=Optional.absent(), stateParams={"domainKey":"498ab4a2-bda5-4b5c-a006-d9edb15b3cc1"}, parentId=Optional.of(1), fullName=+workflow+run, config={"td>":"query.sql","database":"sample_datasets"}, taskType=TaskType{}, state=ready, stateFlags=TaskStateFlags{INITIAL_TASK}}
java.lang.RuntimeException: io.digdag.spi.TaskConflictException: io.digdag.core.repository.ResourceConflictException: Resource already exists: lock of task id=2 in site id = 0 and queue id=null
    at io.digdag.core.workflow.WorkflowExecutor.lambda$enqueueTask$17(WorkflowExecutor.java:827)
    at io.digdag.core.database.DatabaseSessionStoreManager.lambda$lockTaskIfExists$15(DatabaseSessionStoreManager.java:335)
    at io.digdag.core.database.BasicDatabaseStoreManager.lambda$null$2(BasicDatabaseStoreManager.java:210)
    at org.skife.jdbi.v2.tweak.transactions.LocalTransactionHandler.inTransaction(LocalTransactionHandler.java:183)
    at org.skife.jdbi.v2.BasicHandle.inTransaction(BasicHandle.java:330)
    at io.digdag.core.database.BasicDatabaseStoreManager.lambda$transaction$3(BasicDatabaseStoreManager.java:210)
    at io.digdag.util.RetryExecutor.run(RetryExecutor.java:158)
    at io.digdag.util.RetryExecutor.runInterruptible(RetryExecutor.java:117)
    at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:206)
    at io.digdag.core.database.DatabaseSessionStoreManager.lockTaskIfExists(DatabaseSessionStoreManager.java:329)
    at io.digdag.core.workflow.WorkflowExecutor.enqueueTask(WorkflowExecutor.java:777)
    at io.digdag.core.workflow.WorkflowExecutor.enqueueReadyTasks(WorkflowExecutor.java:770)
    at io.digdag.core.workflow.WorkflowExecutor.runWhile(WorkflowExecutor.java:401)
    at io.digdag.core.workflow.WorkflowExecutor.runUntilDone(WorkflowExecutor.java:352)
    at io.digdag.core.LocalSite.runUntilDone(LocalSite.java:161)
    at io.digdag.cli.Run.run(Run.java:274)
    at io.digdag.cli.Run.run(Run.java:230)
    at io.digdag.cli.Run.main(Run.java:171)
    at io.digdag.cli.Main.cli(Main.java:139)
    at utils.TestUtils.main(TestUtils.java:99)
    at utils.TestUtils.main(TestUtils.java:89)
    at utils.TestUtils.main(TestUtils.java:82)
    at utils.TestUtils.main(TestUtils.java:72)
    at acceptance.td.TdIT.runWorkflow(TdIT.java:321)
    at acceptance.td.TdIT.testRunQuery(TdIT.java:121)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:253)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: io.digdag.spi.TaskConflictException: io.digdag.core.repository.ResourceConflictException: Resource already exists: lock of task id=2 in site id = 0 and queue id=null
    at io.digdag.core.database.DatabaseTaskQueueServer.enqueueDefaultQueueTask(DatabaseTaskQueueServer.java:136)
    at io.digdag.core.queue.QueueTaskQueueDispatcher.dispatch(QueueTaskQueueDispatcher.java:39)
    at io.digdag.core.workflow.WorkflowExecutor.lambda$enqueueTask$17(WorkflowExecutor.java:814)
    ... 55 common frames omitted
Caused by: io.digdag.core.repository.ResourceConflictException: Resource already exists: lock of task id=2 in site id = 0 and queue id=null
    at io.digdag.core.database.BasicDatabaseStoreManager.catchConflict(BasicDatabaseStoreManager.java:86)
    at io.digdag.core.database.DatabaseTaskQueueServer.lambda$enqueue$3(DatabaseTaskQueueServer.java:164)
    at io.digdag.core.database.BasicDatabaseStoreManager.lambda$transaction$4(BasicDatabaseStoreManager.java:243)
    at io.digdag.util.RetryExecutor.run(RetryExecutor.java:158)
    at io.digdag.util.RetryExecutor.runInterruptible(RetryExecutor.java:117)
    at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:235)
    at io.digdag.core.database.DatabaseTaskQueueServer.enqueue(DatabaseTaskQueueServer.java:163)
    at io.digdag.core.database.DatabaseTaskQueueServer.enqueueDefaultQueueTask(DatabaseTaskQueueServer.java:132)
    ... 57 common frames omitted
frsyuki commented 8 years ago

most likely, the cause is this: https://github.com/treasure-data/digdag/blob/568ebb119f028d3bcce12f6ee348b15398ca1e13/digdag-core/src/main/java/io/digdag/core/workflow/WorkflowExecutor.java#L812-L828