treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.3k stars 221 forks source link

org.h2.jdbc.JdbcSQLException: Table "QUEUED_SHARED_TASK_LOCKS" not found #26

Closed komamitsu closed 8 years ago

komamitsu commented 8 years ago

I ran into this error when running a workflow as a local mode.

2016-04-01 14:25:30 +0900 [ERROR] (local-agent-0): Uncaught exception
org.skife.jdbi.v2.exceptions.UnableToCreateStatementException: org.h2.jdbc.JdbcSQLException: Table "QUEUED_SHARED_TASK_LOCKS" not found; SQL statement:
with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null [42102-191] [statement:"with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null", located:"with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null", rewritten:"with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null", arguments:{ positional:{}, named:{}, finder:[]}]
        at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1306)
        at org.skife.jdbi.v2.Query.fold(Query.java:173)
        at org.skife.jdbi.v2.Query.list(Query.java:82)
        at org.skife.jdbi.v2.Query.list(Query.java:75)
        at io.digdag.core.database.DatabaseTaskQueueStore.lambda$lockSharedTasks$147(DatabaseTaskQueueStore.java:195)
        at io.digdag.core.database.BasicDatabaseStoreManager.lambda$transaction$54(BasicDatabaseStoreManager.java:192)
        at org.skife.jdbi.v2.tweak.transactions.LocalTransactionHandler.inTransaction(LocalTransactionHandler.java:183)
        at org.skife.jdbi.v2.BasicHandle.inTransaction(BasicHandle.java:330)
        at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:192)
        at io.digdag.core.database.DatabaseTaskQueueStore.lockSharedTasks(DatabaseTaskQueueStore.java:176)
        at io.digdag.core.database.DatabaseTaskQueueFactory$DatabaseTaskQueue.lockSharedTasks(DatabaseTaskQueueFactory.java:87)
        at io.digdag.core.agent.LocalAgent.run(LocalAgent.java:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: Table "QUEUED_SHARED_TASK_LOCKS" not found; SQL statement:
with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null [42102-191]
        at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
        at org.h2.message.DbException.get(DbException.java:179)
        at org.h2.message.DbException.get(DbException.java:155)
        at org.h2.command.Parser.readTableOrView(Parser.java:5349)
        at org.h2.command.Parser.readTableFilter(Parser.java:1245)
        at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1884)
        at org.h2.command.Parser.parseSelectSimple(Parser.java:2032)
        at org.h2.command.Parser.parseSelectSub(Parser.java:1878)
        at org.h2.command.Parser.parseSelectUnion(Parser.java:1699)
        at org.h2.command.Parser.parseSelectSub(Parser.java:1874)
        at org.h2.command.Parser.parseSelectUnion(Parser.java:1699)
        at org.h2.command.Parser.parseSelect(Parser.java:1687)
        at org.h2.command.Parser.parseWith(Parser.java:4745)
        at org.h2.command.Parser.parsePrepared(Parser.java:479)
        at org.h2.command.Parser.parse(Parser.java:315)
        at org.h2.command.Parser.parse(Parser.java:287)
        at org.h2.command.Parser.prepareCommand(Parser.java:252)
        at org.h2.engine.Session.prepareLocal(Session.java:560)
        at org.h2.engine.Session.prepareCommand(Session.java:501)
        at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1188)
        at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:73)
        at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:276)
        at com.zaxxer.hikari.pool.ProxyConnection.prepareStatement(ProxyConnection.java:308)
        at com.zaxxer.hikari.pool.HikariProxyConnection.prepareStatement(HikariProxyConnection.java)
        at org.skife.jdbi.v2.DefaultStatementBuilder.create(DefaultStatementBuilder.java:54)
        at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1302)
        ... 16 common frames omitted

This is the workflow:

run: +main

+main:
  +step1:
    sh>: ./tasks/bin/enc-tool -e development-ec2 -a 1 -c ENCRYPT -t 200
frsyuki commented 8 years ago

hm.....I checked code but couldn't find the cause. To mitigate issue and make further investigation possible, BasicDatabaseStoreManager should retry a transaction if it failed with unexpected exception such as org.skife.jdbi.v2.exceptions.UnableToCreateStatementException.

frsyuki commented 8 years ago

I added retrying to all database transactions: afd4590611552c6ff6aa25e4b30d1049e144f60b If this issue is not deterministic, retrying can recover the problem. But otherwise, please reopen this ticket.