Thanks for all of the great tooling. Copying this discord message over here for visibility.
Overview
Here is a workflow for creating and managing a seed script for test databases and local development. We wanted a script to dump data from production into a file (which we called seed.sql) that we could use both for local development and for testing in CI.
What matters most to us is keeping the local, test, and production setups as close as possible, to reduce deployment risk. At the same time we wanted it to be easy to update the seed whenever we add reference data. We have a lot of different health/wellness datasets, so growing the seed is just a matter of adding more --table flags to the dump script.
Testing
Our seed.sql file currently sits at ~72 MB. We push it with Git LFS so that the CI pipeline can pull it and use it to seed a vanilla test database on each pipeline run.
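For anyone replicating this, Git LFS tracking is a one-time setup via `git lfs track "migrations/seed.sql"` (the path here is our layout; adjust to yours). That command just records a pattern in .gitattributes, which ends up containing a line like:

```
migrations/seed.sql filter=lfs diff=lfs merge=lfs -text
```

Commit the .gitattributes change alongside the seed file so CI checkouts fetch the real content instead of the LFS pointer.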
Local development
Locally we can bootstrap the database with `psql -d $DATABASE_NAME -f seed.sql`.
For our seed script we ran into issues applying the SQL file directly with graphile-migrate: it seemed to fail on COPY statements, and some of our data also has semicolons in text fields that may not have been properly escaped by pg_dump. So we wrapped psql instead, which works well; it's just a little noisy with setval output.
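As an aside, the setval noise can be silenced by redirecting query output. This is a sketch using psql's standard flags, assuming our file layout and the GM_DBURL connection string graphile-migrate provides:

```shell
# -X skips psqlrc, -q quiets banners, -o sends query results
# (including the setval rows) to /dev/null; errors still reach stderr.
# ON_ERROR_STOP=1 makes the first SQL error fail the whole run.
psql -X -q -v ON_ERROR_STOP=1 -o /dev/null -f migrations/seed.sql "$GM_DBURL"
```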
An added benefit is flexibility in the workflow: for instance, we could add a step to remove old data and ensure it runs consistently during tests only. We didn't want it slowing down development in watch mode.
dump-db-data.sh:
```bash
#!/bin/bash
#
# Dump database data. This is used to create seed files from a database.
#
# Requires variables:
# $PGHOST
# $PGPORT
# $PGDATABASE
# $PGUSER
# $PGPASSWORD
#
# Add more --table flags here to include more reference data in the seed.
pg_dump \
  --no-owner \
  --no-privileges \
  --data-only \
  --file=migrations/seed.sql \
  --table="app_public.some_table"
```
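As the list of tables grows, the flags can be built from an array instead of hand-editing the command. A sketch of that shape (the second table name is made up for illustration, and the command is echoed rather than executed so it can be inspected first):

```shell
# Tables included in the seed; append entries here to grow the seed file.
TABLES=(
  "app_public.some_table"
  "app_public.another_table"   # hypothetical second table, for illustration
)

args=(--no-owner --no-privileges --data-only --file=migrations/seed.sql)
for t in "${TABLES[@]}"; do
  args+=(--table="$t")
done

# Print the command rather than running it, so it can be reviewed.
echo pg_dump "${args[@]}"
```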
seed-db.js:
```javascript
#!/usr/bin/env node
const { runMain, runSync } = require('../../../scripts/_setup_utils');
if (process.env.IN_TESTS !== '1') {
  process.exit(0);
}

const connectionString = process.env.GM_DBURL;
if (!connectionString) {
  console.error(
    'This script should only be called from a graphile-migrate action.',
  );
  process.exit(1);
}

runMain(async () => {
  console.info('Clear the existing database tables');
  // Clear the database, should the data exist already
  runSync('psql', [
    '-q',
    '-f',
    `${__dirname}/../migrations/clear-seed.sql`,
    connectionString,
  ]);

  console.info('Start seeding');
  // Seed the database
  runSync('psql', [
    '-q',
    '-f',
    `${__dirname}/../migrations/seed.sql`,
    connectionString,
  ]);

  console.info('Seeding complete');
});
```
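The clear-seed.sql file referenced above isn't shown; a minimal shape it could take, assuming TRUNCATE is acceptable for the seeded tables (table name taken from the dump script, with additional TRUNCATE lines added per seeded table):

```sql
-- Remove previously-seeded rows so the seed can be re-applied cleanly.
-- RESTART IDENTITY resets owned sequences; CASCADE follows FK dependencies.
TRUNCATE app_public.some_table RESTART IDENTITY CASCADE;
```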
.gmrc:
```jsonc
"afterCurrent": [
  ...
  {
    "_": "command",
    "shadow": true,
    // NOTE: this script does nothing unless envvar `IN_TESTS` is `1`
    "command": "node scripts/seed-db.js"
  }
],
```