Support magic "skip transaction" comment to support `CREATE INDEX CONCURRENTLY`

nickmyatt commented 2 weeks ago

Thanks for the great work on pgmigrate; I find it very useful.

Do you think it would be possible to support a magic comment string at the top of individual migration files to force that particular migration script to run outside a transaction? This would allow us to use CREATE INDEX CONCURRENTLY in our migration scripts.

Our current index maintenance workflow is otherwise rather error-prone as it relies on us recording the change without the CONCURRENTLY keyword, manually adding the index with the CONCURRENTLY keyword, and then finally marking the migration as applied using pgmigrate ops.

I appreciate that we might not want "non transactional" migrations to run automatically, as they could be harmful to the database, so perhaps pgmigrate should refuse to apply any non-transactional migrations by default unless some kind of --allow-non-transactional flag is passed to pgmigrate apply?

peterldowns commented 1 week ago

Hi Nick, glad you're finding pgmigrate useful and I appreciate the feature request. I recently discussed this approach in an HN thread — I'd like to excerpt what I wrote to explain why pgmgirate doesn't currently support this:

Long-running index creation is a problem for pgmigrate and anyone else doing “on-app-startup” or “before-app-deploys” migrations.

Even at moderate scale (normal webapp stuff, not megaco size) building indexes can take a long time — especially for the tables where it’s most important to have indexes.

But if you’re doing long-running index building in your migrations step, you can’t deploy a new version of your app until the migration step finishes. (Big problem for lots of reasons.)

I'll expand on this to describe two more problems with running CREATE INDEX CONCURRENTLY automatically on application startup:

Integrity: if pgmigrate added a "turn off transactions" mode, it would significantly complicate the pgmigrate model, and introduce the possibility of the transactions table not matching database state. This integrity violation would make using pgmigrate more difficult. I have previously used migration frameworks that did not require transactions, and this kind of bug (where a migration is partially applied, or not applied, and the migration framework has it marked as fully applied / fully not applied) is really difficult to diagnose and fix. I do not want to head down this path of complicated failure.
CREATE INDEX CONCURRENTLY is fundamentally a manual command. By this I mean that it can fail, does fail quite frequently in practice when run against busy tables, and when it does it requires manual intervention to fix. I do not believe that CREATE INDEX CONCURRENTLY is suitable to run as an automatic migration at application startup. Please make sure to click through that link to the postgresql docs and understand the myriad failure cases, all of which require manual intervention to resolve — simply retrying the migration (creating the index) doesn't work.

For these reasons, I recommend doing the following (quoted from the HN thread):

The way I’ve dealt with this in the past is:

the database connection used to perform migrations has a low statement timeout of 10seconds.

a long-running index creation statement gets its own migration file and is written as: “CREATE INDEX … IF NOT EXISTS”. This definition does not include the “CONCURRENTLY” directive. When migrations run on a local dev server or during tests, the table being indexed is small so this happens quickly.

Manually, before merging the migration in and deploying so that it’s applied in production, you open a psql terminal to prod and run “CREATE INDEX … CONCURRENTLY”. This may take a long time; it can even fail and need to be retried after hours of waiting. Eventually, it’s complete.

Merge your migration and deploy your app. The “CREATE INDEX … IF NOT EXISTS” migration runs and immediately succeeds because the index exists.

This is finicky and annoying, but if you're an advanced enough postgres user to be intentionally using CREATE INDEX CONCURRENTLY, and you're advanced enough to be able to recover manually if/when that index creation fails, I believe this is a reasonable amount of process to follow.

Does that make sense? I am open to changing my mind, but introducing this sort of complexity into pgmigrate has significant downsides and I am currently biased against it. Since there is a way to perform concurrent index creation with pgmigrate, albeit unweildy, I don't view this as a particularly urgent problem and I'd like to make sure that if we make any changes the downsides are fully understood.

nickmyatt commented 1 week ago

Hi Peter, thanks for the thoughtful response. The IF NOT EXISTS workaround is a good suggestion and I think we will adopt that.

FWIW we aren't using the "migrate-on-deploy" approach, but nevertheless I can see how giving up on the simple transactional model would introduce a lot of undesirable complexity. What I like most about pgmigrate is the simplicity, and I respect your stance in keeping it so.

All the best,

Nick

peterldowns commented 1 week ago

Thanks for the kind words and for following up. In case you hadn't already considered it, if you would prefer more control over the migration strategy you should take a look at flyway — other than pgmigrate I think it's the best migration tool out there :)

If you have the time, I'd really like to know how you're using pgmigrate if not as "migrate-on-deploy" (via docker container or at app startup). I haven't heard from any users and until your comment I thought I was the only one relying on it!

peterldowns / pgmigrate

Support magic "skip transaction" comment to support `CREATE INDEX CONCURRENTLY` #8