peterldowns / pgmigrate

a modern Postgres migrations CLI and library
MIT License
64 stars 3 forks source link

🐽 pgmigrate

Latest Version Golang

pgmigrate is a modern Postgres migrations CLI and golang library. It is designed for use by high-velocity teams who practice continuous deployment. The goal is to make migrations as simple and reliable as possible.

Major features

Documentation

Quickstart Example

Please visit the ./example directory for a working example of how to use pgmigrate. This example demonstrates:

CLI

Install

Homebrew:

# install it
brew install peterldowns/tap/pgmigrate

Download a binary:

Visit the latest Github release and pick the appropriate binary. Or, click one of the shortcuts here:

Nix (flakes):

# run it
nix run github:peterldowns/pgmigrate -- --help
# install it
nix profile install --refresh github:peterldowns/pgmigrate

Docker:

The prebuilt docker container is ghcr.io/peterldowns/pgmigrate and each version is properly tagged. You may reference this in a kubernetes config as an init container.

To run the pgmigrate cli:

# The default CMD is "pgmigrate" which just shows the help screen.
docker run -it --rm ghcr.io/peterldowns/pgmigrate:latest
# To actually run migrations, you'll want to make sure the container can access
# your database and migrations directory and specify a command. To access a
# database running on the host, use `host.docker.internal` instead of
# `localhost` in the connection string:
docker run -it --rm \
  --volume $(pwd)//migrations:/migrations \
  --env PGM_MIGRATIONS=/migrations \
  --env PGM_DATABASE='postgresql://postgres:password@host.docker.internal:5433/postgres' \
  ghcr.io/peterldowns/pgmigrate:latest \
  pgmigrate plan

Golang:

I recommend installing a different way, since the installed binary will not contain version information.

# run it
go run github.com/peterldowns/pgmigrate/cmd/pgmigrate@latest --help
# install it
go install github.com/peterldowns/pgmigrate/cmd/pgmigrate@latest

Configuration

pgmigrate reads its configuration from cli flags, environment variables, and a configuration file, in that order.

pgmigrate will look in the following locations for a configuration file:

Here's an example configuration file. All keys are optional, an empty file is also a valid configuration.

# connection string to a database to manage
database: "postgres://postgres:password@localhost:5433/postgres"
# path to the folder of migration files. if this is relative,
# it is treated as relative to wherever the "pgmigrate" command
# is invoked, NOT as relative to this config file.
migrations: "./tmp/migrations"
# the name of the table to use for storing migration records.  you can give
# this in the form "table" to use your database's default schema, or you can
# give this in the form "schema.table" to explicitly set the schema.
table_name: "custom_schema.custom_table"
# this key configures the "dump" command.
schema:
  # the name of the schema to dump, defaults to "public"
  name: "public"
  # the file to which to write the dump, defaults to "-" (stdout)
  # if this is relative, it is treated as relative to wherever the
  # "pgmigrate" command is invoked, NOT as relative to this config file.
  file: "./schema.sql"
  # any explicit dependencies between database objects that are
  # necessary for the dumped schema to apply successfully.
  dependencies:
    some_view: # depends on
      - some_function
      - some_table
    some_table: # depends on
      - another_table
  # any tables for which the dump should contain INSERT statements to create
  # actual data/rows. this is useful for enums or other tables full of
  # ~constants.
  data:
    - name: "%_enum" # accepts wildcards using SQL query syntax
    - name: "my_example_table" # can also be a literal
      # if not specified, defaults to "*"
      columns:
        - "value"
        - "comment"
      # a valid SQL order clause to use to order the rows in the INSERT
      # statement.
      order_by: "value asc"

Usage

The CLI ships with documentation and examples built in, please see pgmigrate help and pgmigrate help <command> for more details.

# pgmigrate --help
Docs: https://github.com/peterldowns/pgmigrate

Usage:
  pgmigrate [flags]
  pgmigrate [command]

Examples:
  # Preview and then apply migrations
  pgmigrate plan     # Preview which migrations would be applied
  pgmigrate migrate  # Apply any previously-unapplied migrations
  pgmigrate verify   # Verify that migrations have been applied correctly
  pgmigrate applied  # Show all previously-applied migrations

  # Dump the current schema to a file
  pgmigrate dump --out schema.sql

Migrating:
  applied     Show all previously-applied migrations
  migrate     Apply any previously-unapplied migrations
  plan        Preview which migrations would be applied
  verify      Verify that migrations have been applied correctly

Operations:
  ops         Perform manual operations on migration records
  version     Print the version of this binary

Development:
  config      Print the current configuration / settings
  dump        Dump the database schema as a single migration file
  help        Help about any command
  new         generate the name of the next migration file based on the current sequence prefix

Flags:
      --configfile string   [PGM_CONFIGFILE] a path to a configuration file
  -d, --database string     [PGM_DATABASE] a 'postgres://...' connection string
  -h, --help                help for pgmigrate
      --log-format string   [PGM_LOGFORMAT] 'text' or 'json', the log line format (default 'text')
  -m, --migrations string   [PGM_MIGRATIONS] a path to a directory containing *.sql migrations
      --table-name string   [PGM_TABLENAME] the table name to use to store migration records (default 'public.pgmigrate_migrations')
  -v, --version             version for pgmigrate

Use "pgmigrate [command] --help" for more information about a command.

Library

Install

# library
go get github.com/peterldowns/pgmigrate@latest

Usage

All of the methods available in the CLI are equivalently named and available in the library. Please read the cli help with pgmigrate help <command> or read the the go.dev docs at pkg.go.dev/github.com/peterldowns/pgmigrate.

FAQ

How does it work?

pgmigrate has the following invariants, rules, and behavior:

Why use pgmigrate instead of the alternatives?

pgmigrate has the following features and benefits:

How should my team work with it?

the migrations directory

Your team repository should include a migrations/ directory containing all known migrations.

migrations
├── 0001_cats.sql
├── 0003_dogs.sql
├── 0003_empty.sql
├── 0004_rm_me.sql

Because your migrations are applied in ascending lexicographical order, you should use a consistent-length numerical prefix for your migration files. This will mean that when you ls the directory, you see the migrations in the same order that they will be applied. Some teams use unix timestamps, others use integers, it doesn't matter as long as you're consistent.

creating a new migration

Add a new migration by creating a new file in your migrations/ directory ending in .sql. The usual work flow is:

It is OK for you and another coworker to use the same sequence number. If you both choose the exact same filename, git will prevent you from merging both PRs.

what's allowed in a migration

You can do anything you'd like in a migration except for the following limitations:

preventing conflicts

You may be wondering, how is running "any previously unapplied migration" safe? What if there are two PRs that contain conflicting migrations?

For instance let's say two new migrations get created,

├── ...
├── 0006_aaa_delete_users.sql
├── 0006_bbb_create_houses.sql

There's no way both of these migrations could be safely applied, and the resulting database state could be different depending on the order!

You can prevent this conflict at CI-time by using pgmigrate to maintain an up-to-date dump of your database schema. This schema dump will cause a git merge conflict so that only one of the migrations can be merged, and the second will force the developer to update the PR and the migration:

# schema.sql should be checked in to your repository, and CI should enforce that
# it is up to date. The easiest way to do this is to spin up a database, apply
# the migrations, and run the dump command.  Then, error if there are any
# changes detected:
pgmigrate dump -o schema.sql

You should also make sure to run a CI check on your main/dev branch that creates a new database and applies all known migrations. This check should block deploying until it succeeds.

Returning to the example of two conflicting migrations being merged, we can see how these guards provide a good developer experience and prevent a broken migration from being deployed:

  1. One of the two migrations is merged. The second branch should not be able to be merged because the dumped schema.sql will contain a merge conflict.
  2. If for some reason both of the migrations are able to be merged, the check on the main/dev branch will fail to apply migrations and block the deploy. because the migrations cannot be applied. Breaking main is annoying, but...

Lastly, you should expect this situation to happen only rarely. Most teams, even with large numbers of developers working in parallel, coordinate changes to shared tables such that conflicting schema changes are a rare event.

deploying and applying migrations

You should run pgmigrate with the latest migrations directory each time you deploy. You can do this by:

Your application should fail to start if migrations fail for any reason.

Your application should start successfully if there are verification errors or warnings, but you should treat those errors as a sign there is a difference between the expected database state and the schema as defined by your migration files.

Because pgmigrate uses advisory locks, you can roll out as many new instances of your application as you'd like. Even if multiple instance attempt to run the migrations at once, only one will acquire the lock and apply the migrations. The other instances will wait for it to succeed and then no-op.

backwards compatibility

Assuming you're running in a modern cloud environment, you're most likely doing rolling deployments where new instances of your application are brought up before old ones are terminated. Therefore, make sure any new migrations will result in a database state that the previous version of your application (which will still be running as migrations are applied) can handle.

squashing migrations

At some point, if you have hundreds or thousands of migration files, you may want to replace them with a single migration file that achieves the same thing. You may want this because:

This process will involve manually updating the migrations table of your staging/production databases. Your coworkers will need to recreate their development databases or manually update their migration state with the same commands used in staging/production. Make sure to coordinate carefully with your team and give plenty of heads up beforehand. This should be an infrequent procedure.

Start by replacing your migrations with the output of pgmigrate dump. This can be done in a pull request just like any other change.

This "squash" migration does the exact same thing as all the migration files that it replaced, which is the goal! But before you can deploy and run migrations, you will need to manually mark this migration as having already been applied. Otherwise, pgmigrate would attempt to apply it, and that almost certainly wouldn't work. The commands below use $PROD to reference the connection string for the database you are manually modifying, but you will need to do this on every database for which you manage migrations.

ERROR: prepared statement "stmtcache_..." already exists (SQLSTATE 42P05)

If you're using the pgmigrate CLI and you see an error like this:

error: hasMigrationsTable: ERROR: prepared statement "stmtcache_19cfd54753d282685a62119ed71c7d6c9a2acfa4aa0d34ad" already exists (SQLSTATE 42P05)

you can fix the issue by adding a parameter to your database connection string to change how pgmigrate caches statements:

# before
database: "postgresql://user:password@host.provider.com:6543/postgres"
# after
database: "postgresql://user:password@host.provider.com:6543/postgres?default_query_exec_mode=describe_exec"

pgmigrate uses the on jackc/pgx library to connect to Postgres databases. This library defaults to fairly aggressives statement caching which is unfortunately not compatible with Pgbouncer or other poolers. If you've seen the error above, you're most likely connecting through a pooler like Pgbouncer.

The solution is to pass a default_query_exec_mode=exec connection string parameter, which jackc/pgx will use to configure its statement caching behavior. The documentation and the connection parsing code describe the available options, but exec should work by default.

As of v0.1.0, the CLI will automatically add this query parameter for you if you haven't already specified a statement caching mode.

Acknowledgements

I'd like to thank and acknowledge: