Thanks for all of the great tooling. Copying this discord message over here for visibility.
Overview
Here is a workflow for creating and managing a seed script for test databases and local development. We wanted a script to dump data from production into a file (which we called seed.sql) that we could use both for local development and for testing in CI.
What matters most to us is keeping the local, test, and production setups as close as possible, to reduce deployment risk. At the same time we wanted it to be easy to update the seed whenever we add reference data. We have a lot of different health/wellness datasets, so growing the seed is just a matter of adding more --table flags to the dump script.
Testing
Our seed.sql file currently sits at ~72 MB. We push it with Git LFS so that the CI pipeline can pull it and use it to seed a vanilla test database on each pipeline run.
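For anyone replicating this, Git LFS tracking is a one-time setup via `git lfs track "migrations/seed.sql"` (the path here is our layout; adjust to yours). That command just records a pattern in .gitattributes, which ends up containing a line like:

```
migrations/seed.sql filter=lfs diff=lfs merge=lfs -text
```

Commit the .gitattributes change alongside the seed file so CI checkouts fetch the real content instead of the LFS pointer.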
Local development
Locally we can bootstrap the database with `psql -d $DATABASE_NAME -f seed.sql`.
For our seed script we ran into issues applying the SQL file directly with graphile-migrate: it seemed to fail on COPY statements, and some of our data also has semicolons in text fields that may not have been properly escaped by pg_dump. So we wrapped psql instead, which works well; it's just a little noisy with setval output.
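As an aside, the setval noise can be silenced by redirecting query output. This is a sketch using psql's standard flags, assuming our file layout and the GM_DBURL connection string graphile-migrate provides:

```shell
# -X skips psqlrc, -q quiets banners, -o sends query results
# (including the setval rows) to /dev/null; errors still reach stderr.
# ON_ERROR_STOP=1 makes the first SQL error fail the whole run.
psql -X -q -v ON_ERROR_STOP=1 -o /dev/null -f migrations/seed.sql "$GM_DBURL"
```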
An added benefit is flexibility in the workflow: for instance, we could add a step to remove old data and ensure it runs consistently during tests only. We didn't want it slowing down development in watch mode.
dump-db-data.sh:
```bash
#!/bin/bash
#
# Dump database data. This is used to create seed files from a database.
#
# Requires variables:
# $PGHOST
# $PGPORT
# $PGDATABASE
# $PGUSER
# $PGPASSWORD
#
# Add more --table flags here to include more reference data in the seed.
pg_dump \
  --no-owner \
  --no-privileges \
  --data-only \
  --file=migrations/seed.sql \
  --table="app_public.some_table"
```
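As the list of tables grows, the flags can be built from an array instead of hand-editing the command. A sketch of that shape (the second table name is made up for illustration, and the command is echoed rather than executed so it can be inspected first):

```shell
# Tables included in the seed; append entries here to grow the seed file.
TABLES=(
  "app_public.some_table"
  "app_public.another_table"   # hypothetical second table, for illustration
)

args=(--no-owner --no-privileges --data-only --file=migrations/seed.sql)
for t in "${TABLES[@]}"; do
  args+=(--table="$t")
done

# Print the command rather than running it, so it can be reviewed.
echo pg_dump "${args[@]}"
```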
seed-db.js:
```javascript
#!/usr/bin/env node
const { runMain, runSync } = require('../../../scripts/_setup_utils');
if (process.env.IN_TESTS !== '1') {
  process.exit(0);
}

const connectionString = process.env.GM_DBURL;
if (!connectionString) {
  console.error(
    'This script should only be called from a graphile-migrate action.',
  );
  process.exit(1);
}

runMain(async () => {
  console.info('Clear the existing database tables');
  // Clear the database, should the data exist already
  runSync('psql', [
    '-q',
    '-f',
    `${__dirname}/../migrations/clear-seed.sql`,
    connectionString,
  ]);

  console.info('Start seeding');
  // Seed the database
  runSync('psql', [
    '-q',
    '-f',
    `${__dirname}/../migrations/seed.sql`,
    connectionString,
  ]);

  console.info('Seeding complete');
});
```
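The clear-seed.sql file referenced above isn't shown; a minimal shape it could take, assuming TRUNCATE is acceptable for the seeded tables (table name taken from the dump script, with additional TRUNCATE lines added per seeded table):

```sql
-- Remove previously-seeded rows so the seed can be re-applied cleanly.
-- RESTART IDENTITY resets owned sequences; CASCADE follows FK dependencies.
TRUNCATE app_public.some_table RESTART IDENTITY CASCADE;
```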
.gmrc:
```jsonc
"afterCurrent": [
  ...
  {
    "_": "command",
    "shadow": true,
    // NOTE: this script does nothing unless envvar `IN_TESTS` is `1`
    "command": "node scripts/seed-db.js"
  }
],
```