treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.31k stars 223 forks source link

Concurrent update in table "QUEUED_TASK_LOCKS": another transaction has updated or deleted the same row [90131-192] #1784

Open JPWon-1 opened 1 year ago

JPWon-1 commented 1 year ago

image

2023-01-18 15:32:25.540 +0900 [ERROR] (3073@[0:KOP]+RCMD_KEYWORD+RCMD_KEYWORD_RAW) io.digdag.core.agent.OperatorManager: Task failed with unexpected error: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL statement:
delete from queued_task_locks where id = ? and lock_agent_id = ? [50200-192] [statement:"delete from queued_task_locks where id = :taskLockId and lock_agent_id = :agentId", located:"delete from queued_task_locks where id = :taskLockId and lock_agent_id = :agentId", rewritten:"delete from queued_task_locks where id = ? and lock_agent_id = ?", arguments:{ positional:{}, named:{agentId:'8@a53e99b9c646',taskLockId:1755432}, finder:[]}]
org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL statement:
delete from queued_task_locks where id = ? and lock_agent_id = ? [50200-192] [statement:"delete from queued_task_locks where id = :taskLockId and lock_agent_id = :agentId", located:"delete from queued_task_locks where id = :taskLockId and lock_agent_id = :agentId", rewritten:"delete from queued_task_locks where id = ? and lock_agent_id = ?", arguments:{ positional:{}, named:{agentId:'8@a53e99b9c646',taskLockId:1755432}, finder:[]}]
    at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1338)
    at org.skife.jdbi.v2.Update.execute(Update.java:56)
    at org.skife.jdbi.v2.sqlobject.UpdateHandler$2.value(UpdateHandler.java:68)
    at org.skife.jdbi.v2.sqlobject.UpdateHandler.invoke(UpdateHandler.java:81)
    at org.skife.jdbi.v2.sqlobject.SqlObject.invoke(SqlObject.java:216)
    at org.skife.jdbi.v2.sqlobject.SqlObject$3.intercept(SqlObject.java:127)
    at org.skife.jdbi.v2.sqlobject.CloseInternalDoNotUseThisClass$$EnhancerByCGLIB$$c25568f0.deleteQueuedTaskLock(<generated>)
    at io.digdag.core.database.DatabaseTaskQueueServer.lambda$deleteTask0$5(DatabaseTaskQueueServer.java:231)
    at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:190)
    at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:178)
    at io.digdag.core.database.DatabaseTaskQueueServer.deleteTask0(DatabaseTaskQueueServer.java:223)
    at io.digdag.core.database.DatabaseTaskQueueServer.deleteTask(DatabaseTaskQueueServer.java:217)
    at io.digdag.core.queue.QueueTaskQueueDispatcher.taskFinished(QueueTaskQueueDispatcher.java:59)
    at io.digdag.core.workflow.WorkflowExecutor.taskSucceeded(WorkflowExecutor.java:1219)
    at io.digdag.core.agent.InProcessTaskCallbackApi.lambda$taskSucceeded$3(InProcessTaskCallbackApi.java:146)
    at io.digdag.core.database.ThreadLocalTransactionManager.begin(ThreadLocalTransactionManager.java:263)
    at io.digdag.core.database.ThreadLocalTransactionManager.begin(ThreadLocalTransactionManager.java:225)
    at io.digdag.core.agent.InProcessTaskCallbackApi.taskSucceeded(InProcessTaskCallbackApi.java:146)
    at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:317)
    at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invokeMain(DigdagTimedMethodInterceptor.java:58)
    at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invoke(DigdagTimedMethodInterceptor.java:31)
    at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:152)
    at io.digdag.core.agent.ExtractArchiveWorkspaceManager.withExtractedArchive(ExtractArchiveWorkspaceManager.java:75)
    at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:150)
    at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invokeMain(DigdagTimedMethodInterceptor.java:58)
    at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invoke(DigdagTimedMethodInterceptor.java:31)
    at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:133)
    at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invokeMain(DigdagTimedMethodInterceptor.java:58)
    at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invoke(DigdagTimedMethodInterceptor.java:31)
    at io.digdag.core.agent.MultiThreadAgent.lambda$null$0(MultiThreadAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table ; SQL statement:
delete from queued_task_locks where id = ? and lock_agent_id = ? [50200-192]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
    at org.h2.message.DbException.get(DbException.java:168)
    at org.h2.command.Command.filterConcurrentUpdate(Command.java:307)
    at org.h2.command.Command.executeUpdate(Command.java:260)
    at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:201)
    at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1327)
    ... 34 common frames omitted
Caused by: org.h2.jdbc.JdbcSQLException: Concurrent update in table "QUEUED_TASK_LOCKS": another transaction has updated or deleted the same row [90131-192]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
    at org.h2.message.DbException.get(DbException.java:168)
    at org.h2.mvstore.db.MVTable.convertException(MVTable.java:898)
    at org.h2.mvstore.db.MVSecondaryIndex.remove(MVSecondaryIndex.java:248)
    at org.h2.mvstore.db.MVTable.removeRow(MVTable.java:677)
    at org.h2.command.dml.Delete.update(Delete.java:95)
    at org.h2.command.CommandContainer.update(CommandContainer.java:98)
    at org.h2.command.Command.executeUpdate(Command.java:258)
    ... 36 common frames omitted
Caused by: java.lang.IllegalStateException: Entry is locked [1.4.192/101]
    at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:773)
    at org.h2.mvstore.db.TransactionStore$TransactionMap.set(TransactionStore.java:1031)
    at org.h2.mvstore.db.TransactionStore$TransactionMap.remove(TransactionStore.java:989)
    at org.h2.mvstore.db.MVSecondaryIndex.remove(MVSecondaryIndex.java:242)
    ... 40 common frames omitted

Digdag v0.10.4 embulk 0.9.24 h2 1.4.192/101

While I'm trying to ETL with embulk(0.9.24) this error happened sometimes. After job done, digdag trying to remove row in queued_task_locks ( I think this is about managing queue table. ) and then more than one transaction has updated or deleted the same row so that happens.

how should i solve this issue?

hiroyuki-sato commented 1 year ago

Hello, @JPWon-1

I don't know the root cause, but I recommend you use PostgreSQL instead of the H2 database.

JPWon-1 commented 1 year ago

What 's difference use PostgreSQL ? And How to change PostgreSQL instead H2 in Embulk?

hiroyuki-sato commented 1 year ago

Digdag is designed to use PostgreSQL in a production environment. PostgreSQL is the priority database. Have you ever tried digdag download

JPWon-1 commented 1 year ago

No i haven't. Can i get some example for that?

hiroyuki-sato commented 1 year ago

You can download projects using digdag download and push them to a new server. (I don't know h2 to PostgreSQL migration tool)