treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

LocalAgent exception spam during test executor shutdown #159

Closed: danielnorberg closed this issue 8 years ago

danielnorberg commented 8 years ago

@frsyuki Didn't we see/fix this earlier?

...
Gradle Test Executor 4 finished executing tests.
2016-06-30 03:08:57 +0000 [ERROR] (local-agent-0) io.digdag.core.agent.LocalAgent: Uncaught exception
org.skife.jdbi.v2.exceptions.UnableToCreateStatementException: org.h2.jdbc.JdbcSQLException: Table "QUEUED_SHARED_TASK_LOCKS" not found; SQL statement:
with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null [42102-191] [statement:"with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null", located:"with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null", rewritten:"with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null", arguments:{ positional:{}, named:{}, finder:[]}]
    at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1306)
    at org.skife.jdbi.v2.Query.fold(Query.java:173)
    at org.skife.jdbi.v2.Query.list(Query.java:82)
    at org.skife.jdbi.v2.Query.list(Query.java:75)
    at io.digdag.core.database.DatabaseTaskQueueStore.lambda$lockSharedTasks$4(DatabaseTaskQueueStore.java:195)
    at io.digdag.core.database.BasicDatabaseStoreManager.lambda$null$2(BasicDatabaseStoreManager.java:210)
    at org.skife.jdbi.v2.tweak.transactions.LocalTransactionHandler.inTransaction(LocalTransactionHandler.java:183)
    at org.skife.jdbi.v2.BasicHandle.inTransaction(BasicHandle.java:330)
    at io.digdag.core.database.BasicDatabaseStoreManager.lambda$transaction$3(BasicDatabaseStoreManager.java:210)
    at io.digdag.util.RetryExecutor.run(RetryExecutor.java:158)
    at io.digdag.util.RetryExecutor.runInterruptible(RetryExecutor.java:117)
    at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:206)
    at io.digdag.core.database.DatabaseTaskQueueStore.lockSharedTasks(DatabaseTaskQueueStore.java:176)
    at io.digdag.core.database.DatabaseTaskQueueFactory$DatabaseTaskQueue.lockSharedTasks(DatabaseTaskQueueFactory.java:87)
    at io.digdag.core.agent.LocalAgent.run(LocalAgent.java:66)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: Table "QUEUED_SHARED_TASK_LOCKS" not found; SQL statement:
with recursive t (queue_id) as ((select queue_id from queued_shared_task_locks where hold_expire_time is null order by queue_id limit 1) union all select (select queue_id from queued_shared_task_locks where hold_expire_time is null and queue_id > t.queue_id order by queue_id limit 1) from t where t.queue_id is not null) select queue_id as id from t where queue_id is not null [42102-191]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
    at org.h2.message.DbException.get(DbException.java:179)
    at org.h2.message.DbException.get(DbException.java:155)
    at org.h2.command.Parser.readTableOrView(Parser.java:5349)
    at org.h2.command.Parser.readTableFilter(Parser.java:1245)
    at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1884)
    at org.h2.command.Parser.parseSelectSimple(Parser.java:2032)
    at org.h2.command.Parser.parseSelectSub(Parser.java:1878)
    at org.h2.command.Parser.parseSelectUnion(Parser.java:1699)
    at org.h2.command.Parser.parseSelectSub(Parser.java:1874)
    at org.h2.command.Parser.parseSelectUnion(Parser.java:1699)
    at org.h2.command.Parser.parseSelect(Parser.java:1687)
    at org.h2.command.Parser.parseWith(Parser.java:4745)
    at org.h2.command.Parser.parsePrepared(Parser.java:479)
    at org.h2.command.Parser.parse(Parser.java:315)
    at org.h2.command.Parser.parse(Parser.java:287)
    at org.h2.command.Parser.prepareCommand(Parser.java:252)
    at org.h2.engine.Session.prepareLocal(Session.java:560)
    at org.h2.engine.Session.prepareCommand(Session.java:501)
    at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1188)
    at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:73)
    at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:276)
    at org.skife.jdbi.v2.DefaultStatementBuilder.create(DefaultStatementBuilder.java:54)
    at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1302)
...
frsyuki commented 8 years ago

I think it still happens because LocalAgent is not stopped explicitly. Guice closes the H2 database at @PreDestroy, but the local agent's thread can still access H2 after that. Previously this error happened even more often, because H2 was closed before @PreDestroy ran.
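
For context, a minimal sketch of the ordering frsyuki describes, assuming a hypothetical wrapper around an ExecutorService; the class and method names here (LocalAgentSketch, start, stop, pollOnce) are illustrative and not digdag's actual API. The idea is to stop the polling thread explicitly, and to stay quiet about in-flight failures once shutdown has begun, before Guice's @PreDestroy hook closes the H2-backed store:

// Illustrative sketch only; names are hypothetical, not digdag's actual API.
// The point is ordering: stop the polling thread before the database closes.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LocalAgentSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile boolean stopped = false;

    void start(Runnable pollOnce) {
        executor.submit(() -> {
            while (!stopped) {
                try {
                    pollOnce.run();          // e.g. acquiring shared task locks from the queue store
                } catch (Throwable t) {
                    if (stopped) {
                        return;              // shutting down; don't log spurious "table not found" errors
                    }
                    t.printStackTrace();     // real code would log and back off
                }
            }
        });
    }

    // Call this before the database is closed (i.e. before @PreDestroy runs on the H2 store).
    void stop() throws InterruptedException {
        stopped = true;
        executor.shutdownNow();              // interrupt the polling thread
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}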

danielnorberg commented 8 years ago

Yeah, I think there are some TODOs in the code regarding LocalAgent shutdown.
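
As a usage note, the same ordering applied to a test teardown might look like the following sketch, again with hypothetical names rather than the actual digdag test harness:

// Hypothetical teardown ordering for a test that boots an embedded setup
// with an H2-backed queue; field and class names are illustrative only.
public class LocalAgentShutdownExampleTest {
    private LocalAgentSketch agent;      // from the sketch above
    private AutoCloseable embeddedDb;    // stands in for the H2-backed store

    @org.junit.After
    public void tearDown() throws Exception {
        agent.stop();        // stop the polling thread first
        embeddedDb.close();  // only then close the database
    }
}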