stripe / pg-schema-diff

Go library for diffing Postgres schemas and generating SQL migrations
MIT License
278 stars 20 forks source link

Allow CREATE/DROP INDEX without CONCURRENTLY #81

Open provokateurin opened 6 months ago

provokateurin commented 6 months ago

It is not always desirable to create/drop indexes concurrently, so it should be possible to switch it off.

bplunkett-stripe commented 6 months ago

Hi provokateurin!

In what scenarios would you like to not build an index concurrently? I realize it does consume more CPU than a normal index build. Is that the primary use case?

provokateurin commented 6 months ago

In the use-case we want to run all migrations in a transaction, so the index can't be built concurrently. The other problem is that concurrent indexes are not really safe. You have come back later and check if the index was actually built.

cc @geertjohan

bplunkett-stripe commented 6 months ago

I see!

I'd argue transactional DDL isn't super critical because you can always roll it back (with exceptions of deletes) via plan(oldSchema).

Concurrent index builds can't silently fail, so you'll know if a concurrent index build failed. And to get your database out of a partially migrated state, you can do plan(newSchema) again, or even plan(oldSchema). pg-schema-diff will identify invalid indexes from concurrent index build failures and REINDEX them.

mortenson commented 6 months ago

If this gets changed please make non-concurrency opt-in - concurrent index builds are really important/preferred in production to me. During development and for a small number of rows I can see how the multiple transactions are not great, though.

provokateurin commented 6 months ago

pg-schema-diff will identify invalid indexes from concurrent index build failures and REINDEX them.

We are only using pg-schema-diff for generating the migration statements, so we are not able to use the automatic reindexing.

If this gets changed please make non-concurrency opt-in

Agreed, it should be an option and not remove the existing functionality.

bplunkett-stripe commented 6 months ago

We are only using pg-schema-diff for generating the migration statements, so we are not able to use the automatic reindexing.

As in, you're persisting your migration to your codebase?

If this gets changed please make non-concurrency opt-in - concurrent index builds are really important/preferred in production to me. During development and for a small number of rows I can see how the multiple transactions are not great, though.

Our philosophy is to make migration as online-as-possible, so if we did add this option in, it absolutely would be disabled by default.

I think I'm leaning towards supporting this in the future, but right now, I want to focus on the core of the library:

What do you folks think?

provokateurin commented 6 months ago

As in, you're persisting your migration to your codebase?

Yes, the are saved as a file. The process of generating the migration and applying the migration are not the same. The migration is generated while the changes are developed and they are only applied when a new version is deployed.

bplunkett-stripe commented 6 months ago

Yes, the are saved as a file. The process of generating the migration and applying the migration are not the same. The migration is generated while the changes are developed and they are only applied when a new version is deployed.

Any thoughts on changing this process such that the migration is generated and applied when the version is deployed? This also prevents a proliferation of {x}.sql files in y our repo

provokateurin commented 6 months ago

But we want to store all migrations as sql files, so this is exactly our use-case. I don't think there is much we want to change about it. Anyway, this is getting a bit off topic. Let's just add the option to disable concurrent indexes and we are done :)

bplunkett-stripe commented 6 months ago

Totally understand! Forgive my questions! I'm just trying to understand your use case and poke its edges to see how pg-schema-diff can slot into it.

I think, to rephrase the ask a bit, is generate DDL that can be run entirely in one transaction (as opposed to just getting rid of concurrent index builds). If we do implement this type of behavior, it will be disabled by default.

provokateurin commented 6 months ago

Sorry, I didn't want to come off as rude if that was the case!

Yes, we want to be able to generate DDL that can be run in a transaction. So far we only found the concurrent indexes to be a problem, but maybe more still show up later.

bplunkett-stripe commented 6 months ago

No worries! I suspect you folks might run into issues with the "online [check|not null] constraint" addition and "online index replacement" (assuming no concurrently). I haven't really tested if that works transactionally.