xataio / pgroll

PostgreSQL zero-downtime migrations made easy
https://www.xata.io
Apache License 2.0
3.6k stars 67 forks source link

Can the pgroll tool ensure true Zero Downtime? #388

Closed bryancondor closed 1 month ago

bryancondor commented 1 month ago

I’ve been testing the pgroll tool and noticed that your approach involves creating additional columns on the physical table, which necessitates an ACCESS_EXCLUSIVE lock. If this lock cannot be acquired, could it lead to downtime? How does pgroll manage this scenario to mitigate potential issues?

I really appreciate your help with this—thank you in advance!

exekias commented 1 month ago

Hi @bryancondor,

Most schema operations require locking the table with ACCESS_EXCLUSIVE , there is no way around it. What pgroll does is ensuring that every time such operation is required, the lock is taken for a short time. Adding a new column is quick, so the locking time should be brief. Other operations are not that quick, for instance, modifying a column to change a constraint could potentially lock the table for a long time, as it requires a full scan. pgroll will split this operation in several steps, to ensure that the scan happens out of the operation requiring ACCESS_EXCLUSIVE, and this is the key part to achieving zero downtime.

On top of that, pgroll sets a configurable lock timeout, which defines the max time we want to lock for any operation, if it cannot be done within the defined time limit, the operation will be retried or canceled.

@andrew-farries wrote a good blog post explaining these concepts in depth: https://xata.io/blog/pgroll-internals

exekias commented 1 month ago

Hope the explanation solved your doubts, please reopen if that's not the case!