Transaction troubles in v3 with Postgres

bluekeyes commented 7 years ago

In #252, the reasoning for removing automatic transactions in v3 is explained. The suggestion is that users should wrap multiple statements into transactions as necessary, but I'm not sure it's possible to do this correctly, at least with Postgres as currently implemented.

Consider a migration file like this:

BEGIN;
ALTER TABLE projects ADD COLUMN disabled BOOLEAN DEFAULT false;
UPDATE projects SET disabled = true WHERE name LIKE 'legacy-%';
COMMIT;

This works fine, provided all the statements succeed. If any statement fails, Postgres does not automatically rollback the current transaction. As a user, I can't issue the ROLLBACK command because execution of my statements stops as soon as the error occurs.

Then the problem gets worse because Postgres ignores commands issued in a transaction after a failure. When the connection that started the transaction is returned to the pool, there's a good chance it will be reused to release the lock... which will not happen because the command is ignored, e.g.:

(details: pq: column "disabled" of relation "projects" already exists) and pq: current transaction is aborted, commands ignored until end of transaction block in line 0: SELECT pg_advisory_unlock($1)

Now the migration table is locked and there's an invalid connection in the connection pool until the process is terminated. This is probably fine if running from the command line, but could be annoying if running migrations automatically in a server or other process using the library interface.

I suspect this could also prevent the migration from being marked as dirty after the error.

As far as I can tell, there are a few options to get around this:

Stick to single command migrations; this may not always be possible
Run multi-statement migrations without transactions; this may not always be safe
Account for this possibility in your migration protocol and make sure the migration process always exits on errors, closing all connections
Write all migrations in PL/pgSQL, which I think will let you catch and respond to errors; this seems like a significant investment to write "simple" migrations

Is there a way to correctly use transactions with the current Postgres implementation? Does this problem affect any other database backends? Maybe the decision on whether to wrap migrations in transactions should be left to the individual drivers?

mattes commented 7 years ago

Thanks for the detailed write-up. This is a problem.

Do you think an onError function that each driver executes in case of an error would fix this? In case of postgres the func would issue a ROLLBACK.

bluekeyes commented 7 years ago

While it would probably work most of the time, I don't think we can guarantee that the onError function will always rollback the transaction, due to the way database/sql implements connection pooling.

I believe the only way to ensure that multiple commands are executed on the same connection is to use the sql.Tx type. Otherwise, each command acquires a connection from the pool, which could be different from the one used to execute the migration.

What if this was a driver option that could be passed via the postgres.Config struct or as a connection parameter? Some like:

type Config struct {
    MigrationsTable string
    DatabaseName    string
    UseTransactions bool
}

Then the Postgres driver would call db.Begin() in the implementation of Run and handle the commit or rollback automatically.

I guess this could lead to migrations being run without transactions, if a user forgets to provide the option in the connection string when calling the CLI. Maybe that's acceptable?

mattes commented 7 years ago

This is a tough one. https://github.com/mattes/migrate/issues/13 explains why no transaction (sql.Tx) is enforced by the driver.

I guess this could lead to migrations being run without transactions, if a user forgets to provide the option in the connection string when calling the CLI. Maybe that's acceptable?

I don't think this is doable. People use migrate as lib, too. Meaning no user interaction.

Really unsure what the best approach is. Happy to hear more feedback and more thoughts.

bluekeyes commented 7 years ago

I looked through the database/sql API again and I was wrong earlier about sql.Tx being the only way to control connection usage. There's also sql.Conn, which sounds like it does what's needed. Here's a (completely untested) version of the Run method using this and issuing a ROLLBACK on error:

func (p *Postgres) Run(migration io.Reader) error {
    migr, err := ioutil.ReadAll(migration)
    if err != nil {
        return err
    }

    // get connection
    ctx := context.Background()
    c, err := p.db.Conn(ctx)
    if err != nil {
        return err
    }
    defer c.Close()

    // run migration
    query := string(migr[:])
    if _, err := c.ExecContext(ctx, query); err != nil {
        // attempt to rollback the current transaction, if any
        // this generates a warning if no transaction is active, but has no effect otherwise
        // TODO: handle or report a rollback failure
        c.ExecContext(ctx, "ROLLBACK")

        // TODO: cast to postgress error and get line number
        return database.Error{OrigErr: err, Err: "migration failed", Query: migr}
    }

    return nil
}

I think this could also work with your idea of an onError function, as long as the connection is saved in the Postgres struct between calls (and Close is called after all migrations run.)

mattes commented 7 years ago

Go 1.9 might have solved it for us: https://golang.org/doc/go1.9#database/sql

The new DB.Conn method returns the new Conn type representing an exclusive connection to the database from the connection pool. All queries run on a Conn will use the same underlying connection until Conn.Close is called to return the connection to the connection pool.

bluekeyes commented 7 years ago

Ah, I didn't realize this was a new Go 1.9 feature when I posted my last comment. What's the policy for this library on using new language features? Would that require a major version bump?

oherych commented 6 years ago

I catch the same issue in my tests. But when I used separate connection for each migrate and then close them I have stable work

func DatabaseMigrate(migrationsDir string) error {
    db, err := gorm.Open("postgres", config.Connection())
    if err != nil {
        return err
    }

    driver, err := postgres.WithInstance(db.DB(), &postgres.Config{})
    if err != nil {
        return err
    }

    m, err := migrate.NewWithDatabaseInstance("file://"+migrationsDir, config.DbName, driver)
    if err != nil {
        return err
    }

    err = m.Up()
    if err != nil && err == migrate.ErrNoChange {
        return err
    }

    se, de := m.Close()
    if se != nil {
        return se
    }
    if de != nil {
        return de
    }

    return nil
}

mattes / migrate

Transaction troubles in v3 with Postgres #274