metabase / metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:
https://metabase.com
Other
38.86k stars 5.16k forks source link

I/O error during migration does not cleanly abort migration #50212

Open devurandom opened 3 days ago

devurandom commented 3 days ago

Describe the bug

A fresh instance on v1.51.2-X01 encountered "org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend." on its first start. On subsequent starts it would fail with "Database has migration lock; cannot run migrations."

To Reproduce

I did not reproduce it, but expect the following to show the issue:

  1. Start Metabase v51.2
  2. While it is running migrations, induce network errors, causing a migration to fail
  3. Wait for Metabase to shut down
  4. Start it again
  5. Observe startup failure and "Database has migration lock; cannot run migrations."

Expected behavior

Upon encountering I/O errors during migrations, either:

Logs

First exception:

liquibase.exception.DatabaseException: An I/O error occurred while sending to the backend. [Failed SQL: (0) ALTER TABLE query_action ADD CONSTRAINT pk_query_action PRIMARY KEY (action_id)]
    at liquibase.executor.jvm.JdbcExecutor$ExecuteStatementCallback.doInStatement(JdbcExecutor.java:470)
    at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:77)
    at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:179)
    at liquibase.executor.AbstractExecutor.execute(AbstractExecutor.java:141)
    at liquibase.database.AbstractJdbcDatabase.executeStatements(AbstractJdbcDatabase.java:1285)
    at liquibase.changelog.ChangeSet.execute(ChangeSet.java:755)
    at liquibase.changelog.visitor.UpdateVisitor.executeAcceptedChange(UpdateVisitor.java:119)
    at liquibase.changelog.visitor.UpdateVisitor.visit(UpdateVisitor.java:68)
    at liquibase.changelog.ChangeLogIterator$2.lambda$run$0(ChangeLogIterator.java:133)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at liquibase.changelog.ChangeLogIterator$2.run(ChangeLogIterator.java:122)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at liquibase.Scope.child(Scope.java:252)
    at liquibase.Scope.child(Scope.java:256)
    at liquibase.changelog.ChangeLogIterator.run(ChangeLogIterator.java:89)
    at liquibase.command.core.AbstractUpdateCommandStep.lambda$run$0(AbstractUpdateCommandStep.java:110)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at liquibase.command.core.AbstractUpdateCommandStep.run(AbstractUpdateCommandStep.java:108)
    at liquibase.command.core.UpdateCommandStep.run(UpdateCommandStep.java:105)
    at liquibase.command.CommandScope.execute(CommandScope.java:217)
    at liquibase.Liquibase.lambda$update$0(Liquibase.java:245)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at liquibase.Liquibase.runInScope(Liquibase.java:1419)
    at liquibase.Liquibase.update(Liquibase.java:234)
    at liquibase.Liquibase.update(Liquibase.java:212)
    at liquibase.Liquibase.update(Liquibase.java:194)
    at metabase.db.liquibase$migrate_up_if_needed_BANG_$fn__45488.invoke(liquibase.clj:371)
    at metabase.db.liquibase$run_in_scope_locked$reify__45484.run(liquibase.clj:336)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at metabase.db.liquibase$run_in_scope_locked.invokeStatic(liquibase.clj:329)
    at metabase.db.liquibase$run_in_scope_locked.invoke(liquibase.clj:312)
    at metabase.db.liquibase$migrate_up_if_needed_BANG_.invokeStatic(liquibase.clj:360)
    at metabase.db.liquibase$migrate_up_if_needed_BANG_.invoke(liquibase.clj:353)
    at metabase.db.setup$migrate_BANG_55501__55502$fn__55503.invoke(setup.clj:84)
    at metabase.db.liquibase$do_with_liquibase45418__45419$f_STAR___45420.invoke(liquibase.clj:135)
    at metabase.db.liquibase$do_with_liquibase45418__45419.invokeStatic(liquibase.clj:138)
    at metabase.db.liquibase$do_with_liquibase45418__45419.invoke(liquibase.clj:126)
    at metabase.db.setup$migrate_BANG_55501__55502.invokeStatic(setup.clj:73)
    at metabase.db.setup$migrate_BANG_55501__55502.doInvoke(setup.clj:55)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at metabase.db.setup$run_schema_migrations_BANG_55528__55529.invokeStatic(setup.clj:149)
    at metabase.db.setup$run_schema_migrations_BANG_55528__55529.invoke(setup.clj:144)
    at metabase.db.setup$setup_db_BANG_55535__55536$fn__55539$fn__55540.invoke(setup.clj:167)
    at metabase.util.jvm$do_with_us_locale.invokeStatic(jvm.clj:238)
    at metabase.util.jvm$do_with_us_locale.invoke(jvm.clj:224)
    at metabase.db.setup$setup_db_BANG_55535__55536$fn__55539.invoke(setup.clj:161)
    at metabase.db.setup$setup_db_BANG_55535__55536.invokeStatic(setup.clj:160)
    at metabase.db.setup$setup_db_BANG_55535__55536.invoke(setup.clj:153)
    at metabase.db$setup_db_BANG_$fn__55564.invoke(db.clj:86)
    at metabase.db$setup_db_BANG_.invokeStatic(db.clj:81)
    at metabase.db$setup_db_BANG_.doInvoke(db.clj:68)
    at clojure.lang.RestFn.invoke(RestFn.java:424)
    at metabase.core$init_BANG__STAR_.invokeStatic(core.clj:121)
    at metabase.core$init_BANG__STAR_.invoke(core.clj:102)
    at metabase.core$init_BANG_.invokeStatic(core.clj:181)
    at metabase.core$init_BANG_.invoke(core.clj:176)
    at metabase.core$start_normally.invokeStatic(core.clj:193)
    at metabase.core$start_normally.invoke(core.clj:187)
    at metabase.core$entrypoint.invokeStatic(core.clj:226)
    at metabase.core$entrypoint.doInvoke(core.clj:220)
    at clojure.lang.RestFn.invoke(RestFn.java:400)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at clojure.lang.Var.applyTo(Var.java:707)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$apply.invoke(core.clj:662)
    at metabase.bootstrap$_main.invokeStatic(bootstrap.clj:31)
    at metabase.bootstrap$_main.doInvoke(bootstrap.clj:28)
    at clojure.lang.RestFn.invoke(RestFn.java:400)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at metabase.bootstrap.main(Unknown Source)
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:398)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:502)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:419)
    at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:341)
    at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:326)
    at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:302)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:297)
    at com.mchange.v2.c3p0.impl.NewProxyStatement.execute(NewProxyStatement.java:75)
    at liquibase.executor.jvm.JdbcExecutor$ExecuteStatementCallback.doInStatement(JdbcExecutor.java:464)
    ... 86 more
Caused by: java.io.EOFException
    at org.postgresql.core.PGStream.receiveChar(PGStream.java:469)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2166)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:371)
    ... 94 more

Then, on the same run:

liquibase.exception.LockException: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: This connection has been closed.
    at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:407)
    at liquibase.command.core.AbstractUpdateCommandStep.run(AbstractUpdateCommandStep.java:139)
    at liquibase.command.core.UpdateCommandStep.run(UpdateCommandStep.java:105)
    at liquibase.command.CommandScope.execute(CommandScope.java:217)
    at liquibase.Liquibase.lambda$update$0(Liquibase.java:245)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at liquibase.Liquibase.runInScope(Liquibase.java:1419)
    at liquibase.Liquibase.update(Liquibase.java:234)
    at liquibase.Liquibase.update(Liquibase.java:212)
    at liquibase.Liquibase.update(Liquibase.java:194)
    at metabase.db.liquibase$migrate_up_if_needed_BANG_$fn__45488.invoke(liquibase.clj:371)
    at metabase.db.liquibase$run_in_scope_locked$reify__45484.run(liquibase.clj:336)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at metabase.db.liquibase$run_in_scope_locked.invokeStatic(liquibase.clj:329)
    at metabase.db.liquibase$run_in_scope_locked.invoke(liquibase.clj:312)
    at metabase.db.liquibase$migrate_up_if_needed_BANG_.invokeStatic(liquibase.clj:360)
    at metabase.db.liquibase$migrate_up_if_needed_BANG_.invoke(liquibase.clj:353)
    at metabase.db.setup$migrate_BANG_55501__55502$fn__55503.invoke(setup.clj:84)
    at metabase.db.liquibase$do_with_liquibase45418__45419$f_STAR___45420.invoke(liquibase.clj:135)
    at metabase.db.liquibase$do_with_liquibase45418__45419.invokeStatic(liquibase.clj:138)
    at metabase.db.liquibase$do_with_liquibase45418__45419.invoke(liquibase.clj:126)
    at metabase.db.setup$migrate_BANG_55501__55502.invokeStatic(setup.clj:73)
    at metabase.db.setup$migrate_BANG_55501__55502.doInvoke(setup.clj:55)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at metabase.db.setup$run_schema_migrations_BANG_55528__55529.invokeStatic(setup.clj:149)
    at metabase.db.setup$run_schema_migrations_BANG_55528__55529.invoke(setup.clj:144)
    at metabase.db.setup$setup_db_BANG_55535__55536$fn__55539$fn__55540.invoke(setup.clj:167)
    at metabase.util.jvm$do_with_us_locale.invokeStatic(jvm.clj:238)
    at metabase.util.jvm$do_with_us_locale.invoke(jvm.clj:224)
    at metabase.db.setup$setup_db_BANG_55535__55536$fn__55539.invoke(setup.clj:161)
    at metabase.db.setup$setup_db_BANG_55535__55536.invokeStatic(setup.clj:160)
    at metabase.db.setup$setup_db_BANG_55535__55536.invoke(setup.clj:153)
    at metabase.db$setup_db_BANG_$fn__55564.invoke(db.clj:86)
    at metabase.db$setup_db_BANG_.invokeStatic(db.clj:81)
    at metabase.db$setup_db_BANG_.doInvoke(db.clj:68)
    at clojure.lang.RestFn.invoke(RestFn.java:424)
    at metabase.core$init_BANG__STAR_.invokeStatic(core.clj:121)
    at metabase.core$init_BANG__STAR_.invoke(core.clj:102)
    at metabase.core$init_BANG_.invokeStatic(core.clj:181)
    at metabase.core$init_BANG_.invoke(core.clj:176)
    at metabase.core$start_normally.invokeStatic(core.clj:193)
    at metabase.core$start_normally.invoke(core.clj:187)
    at metabase.core$entrypoint.invokeStatic(core.clj:226)
    at metabase.core$entrypoint.doInvoke(core.clj:220)
    at clojure.lang.RestFn.invoke(RestFn.java:400)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at clojure.lang.Var.applyTo(Var.java:707)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$apply.invoke(core.clj:662)
    at metabase.bootstrap$_main.invokeStatic(bootstrap.clj:31)
    at metabase.bootstrap$_main.doInvoke(bootstrap.clj:28)
    at clojure.lang.RestFn.invoke(RestFn.java:400)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at metabase.bootstrap.main(Unknown Source)
Caused by: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: This connection has been closed.
    at liquibase.database.jvm.JdbcConnection.rollback(JdbcConnection.java:464)
    at liquibase.database.AbstractJdbcDatabase.rollback(AbstractJdbcDatabase.java:1203)
    at liquibase.database.core.PostgresDatabase.rollback(PostgresDatabase.java:401)
    at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:370)
    ... 61 more
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:1009)
    at org.postgresql.jdbc.PgConnection.getAutoCommit(PgConnection.java:969)
    at com.mchange.v2.c3p0.impl.NewProxyConnection.getAutoCommit(NewProxyConnection.java:1227)
    at liquibase.database.jvm.JdbcConnection.rollback(JdbcConnection.java:460)
    ... 64 more

On next start:

liquibase.exception.LockException: Database has migration lock; cannot run migrations. You can force-release these locks by running `java -jar metabase.jar migrate release-locks`.
    at metabase.db.liquibase$wait_for_migration_lock$fn__45449.invoke(liquibase.clj:233)
    at metabase.util.jvm$do_with_auto_retries.invokeStatic(jvm.clj:174)
    at metabase.util.jvm$do_with_auto_retries.invoke(jvm.clj:164)
    at metabase.util.jvm$do_with_auto_retries.invokeStatic(jvm.clj:181)
    at metabase.util.jvm$do_with_auto_retries.invoke(jvm.clj:164)
    at metabase.util.jvm$do_with_auto_retries.invokeStatic(jvm.clj:181)
    at metabase.util.jvm$do_with_auto_retries.invoke(jvm.clj:164)
    at metabase.util.jvm$do_with_auto_retries.invokeStatic(jvm.clj:181)
    at metabase.util.jvm$do_with_auto_retries.invoke(jvm.clj:164)
    at metabase.util.jvm$do_with_auto_retries.invokeStatic(jvm.clj:181)
    at metabase.util.jvm$do_with_auto_retries.invoke(jvm.clj:164)
    at metabase.util.jvm$do_with_auto_retries.invokeStatic(jvm.clj:181)
    at metabase.util.jvm$do_with_auto_retries.invoke(jvm.clj:164)
    at metabase.db.liquibase$wait_for_migration_lock.invokeStatic(liquibase.clj:227)
    at metabase.db.liquibase$wait_for_migration_lock.invoke(liquibase.clj:222)
    at metabase.db.liquibase$run_in_scope_locked$reify__45484.run(liquibase.clj:333)
    at liquibase.Scope.lambda$child$0(Scope.java:186)
    at liquibase.Scope.child(Scope.java:195)
    at liquibase.Scope.child(Scope.java:185)
    at liquibase.Scope.child(Scope.java:164)
    at metabase.db.liquibase$run_in_scope_locked.invokeStatic(liquibase.clj:329)
    at metabase.db.liquibase$run_in_scope_locked.invoke(liquibase.clj:312)
    at metabase.db.liquibase$consolidate_liquibase_changesets_BANG_45562__45563.invokeStatic(liquibase.clj:464)
    at metabase.db.liquibase$consolidate_liquibase_changesets_BANG_45562__45563.invoke(liquibase.clj:443)
    at metabase.db.setup$migrate_BANG_55501__55502$fn__55503.invoke(setup.clj:80)
    at metabase.db.liquibase$do_with_liquibase45418__45419$f_STAR___45420.invoke(liquibase.clj:135)
    at metabase.db.liquibase$do_with_liquibase45418__45419.invokeStatic(liquibase.clj:138)
    at metabase.db.liquibase$do_with_liquibase45418__45419.invoke(liquibase.clj:126)
    at metabase.db.setup$migrate_BANG_55501__55502.invokeStatic(setup.clj:73)
    at metabase.db.setup$migrate_BANG_55501__55502.doInvoke(setup.clj:55)
    at clojure.lang.RestFn.invoke(RestFn.java:428)
    at metabase.db.setup$run_schema_migrations_BANG_55528__55529.invokeStatic(setup.clj:149)
    at metabase.db.setup$run_schema_migrations_BANG_55528__55529.invoke(setup.clj:144)
    at metabase.db.setup$setup_db_BANG_55535__55536$fn__55539$fn__55540.invoke(setup.clj:167)
    at metabase.util.jvm$do_with_us_locale.invokeStatic(jvm.clj:238)
    at metabase.util.jvm$do_with_us_locale.invoke(jvm.clj:224)
    at metabase.db.setup$setup_db_BANG_55535__55536$fn__55539.invoke(setup.clj:161)
    at metabase.db.setup$setup_db_BANG_55535__55536.invokeStatic(setup.clj:160)
    at metabase.db.setup$setup_db_BANG_55535__55536.invoke(setup.clj:153)
    at metabase.db$setup_db_BANG_$fn__55564.invoke(db.clj:86)
    at metabase.db$setup_db_BANG_.invokeStatic(db.clj:81)
    at metabase.db$setup_db_BANG_.doInvoke(db.clj:68)
    at clojure.lang.RestFn.invoke(RestFn.java:424)
    at metabase.core$init_BANG__STAR_.invokeStatic(core.clj:121)
    at metabase.core$init_BANG__STAR_.invoke(core.clj:102)
    at metabase.core$init_BANG_.invokeStatic(core.clj:181)
    at metabase.core$init_BANG_.invoke(core.clj:176)
    at metabase.core$start_normally.invokeStatic(core.clj:193)
    at metabase.core$start_normally.invoke(core.clj:187)
    at metabase.core$entrypoint.invokeStatic(core.clj:226)
    at metabase.core$entrypoint.doInvoke(core.clj:220)
    at clojure.lang.RestFn.invoke(RestFn.java:400)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at clojure.lang.Var.applyTo(Var.java:707)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$apply.invoke(core.clj:662)
    at metabase.bootstrap$_main.invokeStatic(bootstrap.clj:31)
    at metabase.bootstrap$_main.doInvoke(bootstrap.clj:28)
    at clojure.lang.RestFn.invoke(RestFn.java:400)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.RestFn.applyTo(RestFn.java:135)
    at metabase.bootstrap.main(Unknown Source)

Information about your Metabase installation

Metabase v1.51.2 on Linux

Severity

can cause downtime, but easy workaround

Additional context

https://metaboat.slack.com/archives/C013VC33N95/p1732025675282919

Similar to: https://github.com/metabase/metabase/issues/37866

Different issue, not the problem I ran into, but similar cause: https://github.com/metabase/metabase/issues/30360

crisptrutski commented 3 days ago

I'm nervous about automatically releasing the locks - we don't know what's currently running on the database, and we don't want to risk corrupting it.