xataio / pgroll

PostgreSQL zero-downtime migrations made easy
https://www.xata.io
Apache License 2.0

Retry on `lock_timeout` errors #353

Closed: andrew-farries closed this pull request 4 months ago

andrew-farries commented 4 months ago

Retry statements and transactions that fail due to `lock_timeout` errors.

DDL operations and backfills run in a session in which `SET lock_timeout TO 'xms'` has been executed (`x` defaults to 500 and can be changed with the `--lock-timeout` parameter). If a DDL operation has to wait for a lock held by a long-running query, it gives up once the timeout expires instead of waiting indefinitely, so other queries can't pile up in the lock queue behind it.
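Because `lock_timeout` is a session setting, it only takes effect if the `SET` and the statements that follow run on the same connection. The sketch below illustrates this with Go's standard `database/sql` package; the helper names `lockTimeoutSQL` and `withLockTimeout` are hypothetical and are not pgroll's actual code, which may structure this differently.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// lockTimeoutSQL renders the session-level statement for a timeout given in
// milliseconds, e.g. 500 -> SET lock_timeout TO '500ms'.
// PostgreSQL treats 0 as "no timeout", so this hypothetical helper only
// accepts positive values.
func lockTimeoutSQL(ms int) (string, error) {
	if ms <= 0 {
		return "", fmt.Errorf("lock timeout must be positive, got %dms", ms)
	}
	return fmt.Sprintf("SET lock_timeout TO '%dms'", ms), nil
}

// withLockTimeout (hypothetical) pins one pooled connection, i.e. one
// PostgreSQL session, applies the lock timeout to it, and runs fn on that
// same connection so every statement fn executes is bounded by the timeout.
func withLockTimeout(ctx context.Context, db *sql.DB, ms int, fn func(*sql.Conn) error) error {
	stmt, err := lockTimeoutSQL(ms)
	if err != nil {
		return err
	}
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close() // runs last: returns the connection to the pool
	if _, err := conn.ExecContext(ctx, stmt); err != nil {
		return err
	}
	// Runs before Close: undo the session setting so other users of the
	// pooled connection are not affected by it.
	defer func() { _, _ = conn.ExecContext(context.Background(), "RESET lock_timeout") }()
	return fn(conn)
}
```

Pinning a `*sql.Conn` matters here: running the `SET` through `*sql.DB` directly could land on one pooled connection while the DDL runs on another, in which case the timeout would silently not apply.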

Currently, if a DDL operation or backfill batch times out while requesting a lock, the migration operation fails, forcing the user to retry it (start, rollback, or complete).

This PR retries individual statements (such as the DDL statements issued by migration operations) and transactions (used by backfills) when they fail due to a `lock_timeout` error. Retries use exponential backoff with jitter.

Fixes #171