pgmigrate is a modern Postgres migrations CLI and golang library. It is designed for use by high-velocity teams who practice continuous deployment. The goal is to make migrations as simple and reliable as possible.
Please visit the ./example directory for a working example of how to use pgmigrate.
```shell
# install it
brew install peterldowns/tap/pgmigrate
```
Alternatively, visit the latest GitHub release and download the appropriate binary for your platform.
```shell
# run it
nix run github:peterldowns/pgmigrate -- --help

# install it
nix profile install --refresh github:peterldowns/pgmigrate
```
The prebuilt docker container is ghcr.io/peterldowns/pgmigrate, and each version is properly tagged. You may reference this in a kubernetes config as an init container.
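For example, an init container in a Deployment could apply migrations before the application starts. This is a hypothetical fragment; the secret name, key, and paths are illustrative, not prescribed by pgmigrate, and the `/migrations` directory must be made available to the container (baked into an image or mounted from a volume):

```yaml
# Hypothetical Kubernetes Deployment fragment (illustrative names/paths).
spec:
  template:
    spec:
      initContainers:
        - name: pgmigrate
          image: ghcr.io/peterldowns/pgmigrate:latest
          args: ["pgmigrate", "migrate"]
          env:
            - name: PGM_MIGRATIONS
              value: /migrations
            - name: PGM_DATABASE
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: connection-string
          volumeMounts:
            - name: migrations
              mountPath: /migrations
```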
To run the pgmigrate CLI:

```shell
# The default CMD is "pgmigrate", which just shows the help screen.
docker run -it --rm ghcr.io/peterldowns/pgmigrate:latest

# To actually run migrations, you'll want to make sure the container can access
# your database and migrations directory, and specify a command. To access a
# database running on the host, use `host.docker.internal` instead of
# `localhost` in the connection string:
docker run -it --rm \
  --volume $(pwd)/migrations:/migrations \
  --env PGM_MIGRATIONS=/migrations \
  --env PGM_DATABASE='postgresql://postgres:password@host.docker.internal:5433/postgres' \
  ghcr.io/peterldowns/pgmigrate:latest \
  pgmigrate plan
```
If you install via `go install`, the binary will not contain version information, so I recommend installing a different way if possible.
```shell
# run it
go run github.com/peterldowns/pgmigrate/cmd/pgmigrate@latest --help

# install it
go install github.com/peterldowns/pgmigrate/cmd/pgmigrate@latest
```
pgmigrate reads its configuration from CLI flags, environment variables, and a configuration file, in that order. pgmigrate will look in the following locations for a configuration file:

- If `--configfile <aaa>` is passed, it reads `<aaa>`.
- Otherwise, if `PGM_CONFIGFILE=<bbb>` is set, it reads `<bbb>`.
- Otherwise, if the current directory contains a `.pgmigrate.yaml` file, it reads `$(pwd)/.pgmigrate.yaml`.
- Otherwise, if the root of the current git repository contains a `.pgmigrate.yaml` file, it reads `$(git_repo_root)/.pgmigrate.yaml`.
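The precedence rule amounts to "first non-empty value wins". A small illustrative Go sketch of that resolution (not pgmigrate's actual code; the function name is hypothetical):

```go
package main

import "fmt"

// resolve returns the first non-empty value, mirroring the precedence
// described above: CLI flags override environment variables, which
// override the configuration file. Illustrative only.
func resolve(flagVal, envVal, fileVal string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// The flag wins even when the env var and config file are both set.
	fmt.Println(resolve("postgres://from-flag", "postgres://from-env", "postgres://from-file"))
	// With no flag, the env var wins over the config file.
	fmt.Println(resolve("", "postgres://from-env", "postgres://from-file"))
}
```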
Here's an example configuration file. All keys are optional; an empty file is also a valid configuration.

```yaml
# connection string to a database to manage
database: "postgres://postgres:password@localhost:5433/postgres"
# path to the folder of migration files. if this is relative,
# it is treated as relative to wherever the "pgmigrate" command
# is invoked, NOT as relative to this config file.
migrations: "./tmp/migrations"
# the name of the table to use for storing migration records. you can give
# this in the form "table" to use your database's default schema, or you can
# give this in the form "schema.table" to explicitly set the schema.
table_name: "custom_schema.custom_table"
# this key configures the "dump" command.
schema:
  # the name of the schema to dump, defaults to "public"
  name: "public"
  # the file to which to write the dump, defaults to "-" (stdout).
  # if this is relative, it is treated as relative to wherever the
  # "pgmigrate" command is invoked, NOT as relative to this config file.
  file: "./schema.sql"
  # any explicit dependencies between database objects that are
  # necessary for the dumped schema to apply successfully.
  dependencies:
    some_view: # depends on
      - some_function
      - some_table
    some_table: # depends on
      - another_table
  # any tables for which the dump should contain INSERT statements to create
  # actual data/rows. this is useful for enums or other tables full of
  # ~constants.
  data:
    - name: "%_enum" # accepts wildcards using SQL query syntax
    - name: "my_example_table" # can also be a literal
      # the columns to include; if not specified, defaults to "*"
      columns:
        - "value"
        - "comment"
      # a valid SQL order clause to use to order the rows in the INSERT
      # statement.
      order_by: "value asc"
```
The CLI ships with documentation and examples built in; please see `pgmigrate help` and `pgmigrate help <command>` for more details.
```
# pgmigrate --help
Docs: https://github.com/peterldowns/pgmigrate

Usage:
  pgmigrate [flags]
  pgmigrate [command]

Examples:
  # Preview and then apply migrations
  pgmigrate plan     # Preview which migrations would be applied
  pgmigrate migrate  # Apply any previously-unapplied migrations
  pgmigrate verify   # Verify that migrations have been applied correctly
  pgmigrate applied  # Show all previously-applied migrations

  # Dump the current schema to a file
  pgmigrate dump --out schema.sql

Migrating:
  applied     Show all previously-applied migrations
  migrate     Apply any previously-unapplied migrations
  plan        Preview which migrations would be applied
  verify      Verify that migrations have been applied correctly

Operations:
  ops         Perform manual operations on migration records
  version     Print the version of this binary

Development:
  config      Print the current configuration / settings
  dump        Dump the database schema as a single migration file
  help        Help about any command
  new         generate the name of the next migration file based on the current sequence prefix

Flags:
      --configfile string   [PGM_CONFIGFILE] a path to a configuration file
  -d, --database string     [PGM_DATABASE] a 'postgres://...' connection string
  -h, --help                help for pgmigrate
      --log-format string   [PGM_LOGFORMAT] 'text' or 'json', the log line format (default 'text')
  -m, --migrations string   [PGM_MIGRATIONS] a path to a directory containing *.sql migrations
      --table-name string   [PGM_TABLENAME] the table name to use to store migration records (default 'public.pgmigrate_migrations')
  -v, --version             version for pgmigrate

Use "pgmigrate [command] --help" for more information about a command.
```
```shell
# library
go get github.com/peterldowns/pgmigrate@latest
```
All of the methods available in the CLI are equivalently named and available in the library. Please read the CLI help with `pgmigrate help <command>` or read the go.dev docs at pkg.go.dev/github.com/peterldowns/pgmigrate.
pgmigrate has the following invariants, rules, and behavior:

- A migration is a file whose name ends in `.sql`. The part before the extension is its unique ID.
- Applied migrations are tracked in a table with the following columns:
  - `id (text not null)`: the ID of the migration
  - `checksum (text not null)`: the MD5() hash of the contents of the migration when it was applied.
  - `execution_time_in_millis (integer not null)`: how long it took to apply the migration, in milliseconds.
  - `applied_at (timestamp with time zone not null)`: the time at which the migration was finished applying and this row was inserted.
- Migrations are applied in ascending lexicographical order by ID, the same default ordering used by `ls` or `sort`.
- Each migration is applied inside a transaction, so do not use `BEGIN`/`COMMIT`/`ROLLBACK` within your migration files.
- A migration is considered applied if there is a row in the migrations table containing its ID.
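To make the record-keeping concrete, here is a small Go sketch deriving a migration's ID and checksum as the columns above describe them. The helper names are illustrative, not pgmigrate's internal API, and pgmigrate's exact content normalization may differ:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"strings"
)

// migrationID returns the unique ID of a migration file: the filename
// without its ".sql" extension.
func migrationID(filename string) string {
	return strings.TrimSuffix(filepath.Base(filename), ".sql")
}

// checksum returns the hex-encoded MD5 hash of a migration's contents,
// mirroring the checksum column described above. (Sketch only.)
func checksum(contents string) string {
	sum := md5.Sum([]byte(contents))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(migrationID("migrations/0001_cats.sql"))         // 0001_cats
	fmt.Println(len(checksum("CREATE TABLE cats (id BIGINT);"))) // 32
}
```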
Your team repository should include a `migrations/` directory containing all known migrations.

```
migrations
├── 0001_cats.sql
├── 0003_dogs.sql
├── 0003_empty.sql
├── 0004_rm_me.sql
```
Because your migrations are applied in ascending lexicographical order, you should use a consistent-length numerical prefix for your migration files. This means that when you `ls` the directory, you see the migrations in the same order that they will be applied. Some teams use unix timestamps, others use integers; it doesn't matter as long as you're consistent.
Add a new migration by creating a new file ending in `.sql` in your `migrations/` directory.
It is OK for you and another coworker to use the same sequence number. If you both choose the exact same filename, git will prevent you from merging both PRs.
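The `new` command (listed in the CLI help) generates the name of the next migration file for you. Here is a minimal sketch of that kind of logic, assuming integer prefixes; this is an illustration with a hypothetical helper, not pgmigrate's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// NextMigrationName suggests a filename for the next migration by finding
// the highest existing integer prefix, incrementing it, and preserving the
// zero-padded prefix width. Hypothetical helper, illustrative only.
func NextMigrationName(existing []string, slug string) string {
	width, max := 4, 0
	for _, name := range existing {
		prefix, _, ok := strings.Cut(name, "_")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(prefix)
		if err != nil {
			continue
		}
		if n > max {
			max = n
		}
		width = len(prefix)
	}
	return fmt.Sprintf("%0*d_%s.sql", width, max+1, slug)
}

func main() {
	files := []string{"0001_cats.sql", "0003_dogs.sql", "0003_empty.sql"}
	fmt.Println(NextMigrationName(files, "add_owners")) // 0004_add_owners.sql
}
```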
You can do anything you'd like in a migration, with the following limitations:

- Do not use transaction control statements (`BEGIN`/`COMMIT`/`ROLLBACK`), as pgmigrate will run each migration inside of a transaction.
- Do not use `CREATE INDEX CONCURRENTLY`, as this is guaranteed to fail inside of a transaction.

You may be wondering: how is running "any previously unapplied migration" safe? What if there are two PRs that contain conflicting migrations?
For instance, let's say two new migrations get created:

- `0006_aaa_delete_users.sql`, which deletes the `users` table
- `0006_bbb_create_houses.sql`, which creates a new `houses` table with a foreign key to `users`

```
├── ...
├── 0006_aaa_delete_users.sql
├── 0006_bbb_create_houses.sql
```

There's no way both of these migrations could be safely applied, and the resulting database state could be different depending on the order!

- If `0006_aaa_delete_users.sql` is merged and applied first, then `0006_bbb_create_houses.sql` is guaranteed to fail because there is no longer a `users` table to reference in the foreign key.
- If `0006_bbb_create_houses.sql` is merged and applied first, then `0006_aaa_delete_users.sql` will either fail (because it cannot delete the `users` table) or result in the deletion of the `houses` table as well (in the case of `ON DELETE CASCADE` on the foreign key).

You can prevent this conflict at CI-time by using pgmigrate to maintain an up-to-date dump of your database schema. This schema dump will cause a git merge conflict so that only one of the migrations can be merged, and the second will force the developer to update the PR and the migration:
```shell
# schema.sql should be checked in to your repository, and CI should enforce that
# it is up to date. The easiest way to do this is to spin up a database, apply
# the migrations, and run the dump command. Then, error if there are any
# changes detected:
pgmigrate dump -o schema.sql
```
You should also make sure to run a CI check on your main/dev branch that creates a new database and applies all known migrations. This check should block deploying until it succeeds.
Returning to the example of two conflicting migrations being merged, we can see how these guards provide a good developer experience and prevent a broken migration from being deployed:
Lastly, you should expect this situation to happen only rarely. Most teams, even with large numbers of developers working in parallel, coordinate changes to shared tables such that conflicting schema changes are a rare event.
You should run pgmigrate with the latest migrations directory each time you deploy. You can do this by running the CLI (for instance, as an init container) or by calling `pgmigrate.Migrate(...)` from the library when your application starts. Your application should fail to start if migrations fail for any reason.
Your application should start successfully if there are verification errors or warnings, but you should treat those errors as a sign there is a difference between the expected database state and the schema as defined by your migration files.
Because pgmigrate uses advisory locks, you can roll out as many new instances of your application as you'd like. Even if multiple instances attempt to run the migrations at once, only one will acquire the lock and apply the migrations. The other instances will wait for it to succeed and then no-op.
Assuming you're running in a modern cloud environment, you're most likely doing rolling deployments where new instances of your application are brought up before old ones are terminated. Therefore, make sure any new migrations will result in a database state that the previous version of your application (which will still be running as migrations are applied) can handle.
At some point, if you have hundreds or thousands of migration files, you may want to replace them with a single migration file that achieves the same thing.
This process will involve manually updating the migrations table of your staging/production databases. Your coworkers will need to recreate their development databases or manually update their migration state with the same commands used in staging/production. Make sure to coordinate carefully with your team and give plenty of heads up beforehand. This should be an infrequent procedure.
Start by replacing your migrations with the output of `pgmigrate dump`. This can be done in a pull request just like any other change.

```shell
export PGM_MIGRATIONS="./migrations"
pgmigrate migrate
pgmigrate verify
rm migrations/*.sql
pgmigrate dump -o migrations/00001_squash_on_2023_07_02.sql
```
This "squash" migration does the exact same thing as all the migration files
that it replaced, which is the goal! But before you can deploy and run
migrations, you will need to manually mark this migration as having already been
applied. Otherwise, pgmigrate would attempt to apply it, and that almost
certainly wouldn't work. The commands below use $PROD
to reference the
connection string for the database you are manually modifying, but you will need
to do this on every database for which you manage migrations.
```shell
mkdir -p tmp
pgmigrate --database $PROD dump -o tmp/prod-schema.sql
# This should result in no differences being printed. If you see any
# differences, please abort this process.
diff migrations/00001_squash_on_2023_07_02.sql tmp/prod-schema.sql
rm tmp/prod-schema.sql
# DANGER: removes all migration records from the database.
pgmigrate --database $PROD ops mark-unapplied --all
# DANGER: marks all migrations in the directory (only our squash migration in
# this case) as having been applied, without actually running the migrations.
pgmigrate --database $PROD ops mark-applied --all
pgmigrate --database $PROD plan
pgmigrate --database $PROD verify
```
If you're using the `pgmigrate` CLI and you see an error like this:

```
error: hasMigrationsTable: ERROR: prepared statement "stmtcache_19cfd54753d282685a62119ed71c7d6c9a2acfa4aa0d34ad" already exists (SQLSTATE 42P05)
```
you can fix the issue by adding a parameter to your database connection string to change how `pgmigrate` caches statements:

```yaml
# before
database: "postgresql://user:password@host.provider.com:6543/postgres"
# after
database: "postgresql://user:password@host.provider.com:6543/postgres?default_query_exec_mode=describe_exec"
```
`pgmigrate` uses the jackc/pgx library to connect to Postgres databases. This library defaults to fairly aggressive statement caching, which is unfortunately not compatible with PgBouncer or other poolers. If you've seen the error above, you're most likely connecting through a pooler like PgBouncer.
The solution is to pass a `default_query_exec_mode` connection string parameter (such as the `describe_exec` shown above, or `exec`), which jackc/pgx will use to configure its statement caching behavior. The documentation and the connection parsing code describe the available options.
As of v0.1.0, the CLI will automatically add this query parameter for you if you haven't already specified a statement caching mode.
I'd like to thank and acknowledge the projects and people that inspired pgmigrate, in particular the design of `pgmigrate dump`.