Closed markhalonen closed 1 year ago
Most likely you have a long running transaction that this transaction is waiting on, yes.
a standard index build locks out writes (but not reads) on the table until it's done. -- https://www.postgresql.org/docs/15/sql-createindex.html
Although most forms of ADD table_constraint require an ACCESS EXCLUSIVE lock, ADD FOREIGN KEY requires only a SHARE ROW EXCLUSIVE lock. Note that ADD FOREIGN KEY also acquires a SHARE ROW EXCLUSIVE lock on the referenced table, in addition to the lock on the table on which the constraint is declared. -- https://www.postgresql.org/docs/15/sql-altertable.html
Here are some key lines from your migration:
-- create index: expensive
create index if not exists work_order_archived_by_user_id_idx
on steelhead.work_order(archived_by_user_id);
-- create unique index: expensive
alter table steelhead.user add constraint user_id_domain_id_unique
unique (id, domain_id);
-- Add foreign key - needs to check every single row in the table matches an
-- equivalent row on the other side - expensive
alter table steelhead.work_order add constraint work_order_archived_by_user_id_domain_id_fkey
foreign key (archived_by_user_id, domain_id) references steelhead.user(id, domain_id);
-- Another index - expensive
create index if not exists received_order_archived_by_user_id_idx
on steelhead.received_order(archived_by_user_id);
-- Another FK, expensive
alter table steelhead.received_order add constraint received_order_archived_by_user_id_domain_id_fkey
foreign key (archived_by_user_id, domain_id) references steelhead.user(id, domain_id);
Creating indexes takes time, and foreign keys take significant time to check too. You might want to create the indexes CONCURRENTLY
in one migration, wait for the indexes to complete, and then run the next migration. Alternatively, you might find that putting an explicit LOCK
statement at the top of your migration would solve the "hang":
LOCK work_order, user, received_order IN SHARE ROW EXCLUSIVE MODE;
Ok great, thank you for the insight! The real answer here is probably a way to replay production traffic in a test environment. Will close. 👍
Hey Benjie, just looking for places to look on something that we haven't been able to reproduce.
Twice now on production with <10k requests per minute, we run an innocent migration such as
and the migration hangs, and it stops all Postgres operations, and we have to restart our db and customers are impacted, etc.
Any ideas on what could be causing something like this? Long-running transactions or something?
Thanks for any ideas!