Open acaloiaro opened 5 months ago
Hi @acaloiaro 👋 , thanks for opening the issue.
> - Given a directory full of migrations, run all migrations within. I believe what's required here is to simply run `pgroll start -c <dir>/*`, sorted. The command could perhaps be `pgroll bootstrap <dir>`.
This is absolutely a missing feature of `pgroll` and something that has to be addressed before a v1 release. Given the two-phase (start, complete) nature of `pgroll` migrations, the command would have to start and complete all migrations in `<dir>`. The final migration in `<dir>` could in theory be started and left incomplete, but this probably wouldn't be useful in a database bootstrap scenario.

Using `pgroll start` for this could be confusing, given that it has a `--complete` flag to control whether the migration should be immediately completed or not. As we'd have to complete migrations when running a `<dir>` full of migrations, the `--complete` flag would become redundant/confusing. Perhaps adding `pgroll bootstrap` as you suggest would be the way to go.
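A hypothetical `pgroll bootstrap <dir>` could amount to little more than visiting the directory in lexical order and starting-and-completing each file. A dry-run sketch (the `pgroll` invocation is printed rather than executed; the `migrations/` directory and sample filenames are made up for illustration):

```shell
# Dry-run sketch of a hypothetical `pgroll bootstrap <dir>`:
# print the pgroll invocation for each migration, in order.
# Shell globs expand in lexical order, so zero-padded numeric
# (or timestamped) prefixes are visited oldest-first.
dir="migrations"
mkdir -p "$dir"
touch "$dir/01_create_table.json" "$dir/02_add_field.json"  # sample files

for f in "$dir"/*.json; do
  # swap `echo` for the real command to actually apply the migrations
  echo pgroll start --complete "$f"
done
```

Because each migration is completed before the next starts, the loop matches the "start and complete all migrations" behaviour described above rather than leaving the last one pending.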
> - That it govern migration creation. This could be as simple as `pgroll new <name> <dir>` to add new migration files with the correct timestamp format to `<dir>`. It may be nice to open up `$EDITOR` after file creation.
I'm not entirely convinced of the need for a `pgroll new` command. Having `pgroll new` create timestamped migration files would be nice, but it's also possible for users to create numbered migration files by hand (`01_create_table.json`, `02_add_field.json`, etc). This seems to be a common approach for other migration tools.

A nice feature of `pgroll new` that might make the command worthwhile would be to stub out the migration JSON using the JSON schema for the operation, so that `pgroll new create_table` opens `$EDITOR` with a `create_table` migration pre-populated with all required and optional fields.
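As a sketch of what such a `pgroll new` might do — the command doesn't exist yet, and the stub below is hand-written to resemble pgroll's migration format rather than generated from the real JSON schema:

```shell
# Sketch of a hypothetical `pgroll new <name> <dir>`:
# create a timestamped migration file containing a minimal stub,
# then hand it to $EDITOR.
name="create_table"
dir="migrations"
mkdir -p "$dir"
file="$dir/$(date -u +%Y%m%d%H%M%S)_${name}.json"

# Minimal illustrative stub; a real implementation would fill in
# required/optional fields from the operation's JSON schema.
cat > "$file" <<'EOF'
{
  "name": "",
  "operations": [
    { "create_table": { "name": "", "columns": [] } }
  ]
}
EOF

echo "created $file"
# ${EDITOR:-vi} "$file"   # uncomment to edit interactively
```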
We'd gladly accept and review PRs that implement either of these, especially for point 1. `pgroll` has extensive tests for the migrations engine itself in `pkg/migrations` and `pkg/state` but none around the CLI itself. We're generally happy with this approach for now, so the requirement for extra tests would be minimal.
> Hi @acaloiaro 👋 , thanks for opening the issue.
>
> - Given a directory full of migrations, run all migrations within. I believe what's required here is to simply run `pgroll start -c <dir>/*`, sorted. The command could perhaps be `pgroll bootstrap <dir>`.
>
> This is absolutely a missing feature of `pgroll` and something that has to be addressed before a v1 release. Given the two-phase (start, complete) nature of `pgroll` migrations, the command would have to start and complete all migrations in `<dir>`. The final migration in `<dir>` could in theory be started and left incomplete, but this probably wouldn't be useful in a database bootstrap scenario.
I agree. In the DB bootstrap scenario I'm generally standing up a new system with nothing to roll back to, so marking a bootstrapped db as "complete" seems sensible by default.
> Using `pgroll start` for this could be confusing, given that it has a `--complete` flag to control whether the migration should be immediately completed or not. As we'd have to complete migrations when running a `<dir>` full of migrations, the `--complete` flag would become redundant/confusing. Perhaps adding `pgroll bootstrap` as you suggest would be the way to go.
Yeah I agree, `--complete` is probably silly for this use case.
> - That it govern migration creation. This could be as simple as `pgroll new <name> <dir>` to add new migration files with the correct timestamp format to `<dir>`. It may be nice to open up `$EDITOR` after file creation.
>
> I'm not entirely convinced of the need for a `pgroll new` command. Having `pgroll new` create timestamped migration files would be nice, but it's also possible for users to create numbered migration files by hand (`01_create_table.json`, `02_add_field.json`, etc). This seems to be a common approach for other migration tools.
Instead of timestamping, perhaps auto-incrementing from the most recent migration number would be sensible for `new`, e.g. `pgroll new <name> <dir?".">` -> `<dir?>/<new_migration_number>_<migration_name>.json`
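The auto-increment variant could be sketched like this — assuming zero-padded numeric prefixes such as `01_…`, `02_…`; the directory and file names are made up for illustration, and none of this is existing pgroll behaviour:

```shell
# Sketch of the auto-increment idea: find the highest numeric
# prefix among existing migration files and zero-pad the next one.
dir="mig_demo"
mkdir -p "$dir"
touch "$dir/01_create_table.json" "$dir/02_add_field.json"  # samples

# extract numeric prefixes, take the largest, strip leading zeros
last=$(ls "$dir" | sed -n 's/^\([0-9][0-9]*\)_.*/\1/p' | sort -n | tail -n1 | sed 's/^0*//')
next=$(printf '%02d' $(( ${last:-0} + 1 )))
echo "next prefix: $next"   # 03 for the sample files above
```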
> A nice feature of `pgroll new` that might make the command worthwhile would be to stub out the migration JSON using the JSON schema for the operation, so that `pgroll new create_table` opens `$EDITOR` with a `create_table` migration pre-populated with all required and optional fields.
That sounds quite handy.
> We'd gladly accept and review PRs that implement either of these, especially for point 1. `pgroll` has extensive tests for the migrations engine itself in `pkg/migrations` and `pkg/state` but none around the CLI itself. We're generally happy with this approach for now, so the requirement for extra tests would be minimal.
Sounds good. I'll get something in the works for 1. I'll leave how 2. gets done up to your discretion. Either way, I'll be implementing a solution for it on my end to solve my immediate needs. I'm happy to share what I come up with as a proof of concept if you want. It's unlikely to be super robust and ready for the masses, but it will be a starting point.
After more hands-on time with pgroll, I think this issue has the potential to be a whole can of worms. The "bootstrap" problem scope is a subset of a more general one, which is running multiple, unapplied migrations in order.
It's possible that bootstrapping is a problem worth solving on its own, and the more general problem of running unapplied migrations can be solved separately. But, I do think it's worth mentioning here that as a user, I pretty quickly ran into the issue of needing to run multiple unapplied migrations sequentially. I'll refer to this as the "MAM" problem below.
The most obvious way that the MAM problem surfaces is when developers work on feature branches in which multiple migrations are required. While pgroll migrations do indeed support multiple operations, not all operations may be applied within the same migration. This exacerbates the MAM problem.

At least two types of operations cannot be applied in the same migration with other operations: `sql` and `add_index`. I imagine there are others.

This results in the MAM problem surfacing more often than one would expect, e.g. when developers add new tables that necessitate one or more indices, or when Postgres extensions need to be enabled, e.g. using `sql` operations to `CREATE EXTENSION IF NOT EXISTS "uuid-ossp"` to enable `UUID` types.
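For concreteness, such a standalone migration would be a lone `sql` operation, along these lines (the `up` field follows pgroll's `sql` operation; the file name and exact shape are illustrative):

```json
{
  "name": "0003_enable_uuid_ossp",
  "operations": [
    {
      "sql": {
        "up": "CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";"
      }
    }
  ]
}
```

Because `sql` can't share a migration with other operations, this one-liner costs an entire migration file of its own.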
This means that many features involving migrations result in multiple migrations, rather than a single migration. With multiple developers building multiple features that easily result in the MAM problem, managing migration releases becomes difficult.
So is the MAM problem can of worms worth solving now, or should this issue focus solely on bootstrapping? Having fewer or no exclusive operations like `sql` and `add_index` would alleviate some of the MAM problem pain, but the bootstrap scenario remains.
@andrew-farries could you possibly share the xata team's experience managing this in production?
I think this issue should focus solely on the issue of running `pgroll` on a directory full of migrations.

The restrictions around multiple operations in a migration operating on the same resources are a separate issue IMO, already tracked in https://github.com/xataio/pgroll/issues/239. Any solution for lifting the multiple-operations restrictions is likely going to be orthogonal to this 'bootstrap' issue.
> So is the MAM problem can of worms worth solving now, or should this issue focus solely on bootstrapping? Having fewer or no exclusive operations like `sql` and `add_index` would alleviate some of the MAM problem pain, but the bootstrap scenario remains.
FWIW I'm working on relaxing the restrictions in the `sql` migration so it can run together with other ops: https://github.com/xataio/pgroll/pull/280

I wonder if having indices as part of the `create_table` op would make sense too.
> I wonder if having indices as part of the create_table op would make sense too
I think this would make sense. It's been raised before as https://github.com/xataio/pgroll/issues/203
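Until something like that lands, the table and its index have to be two separate migration files, e.g. (shapes modeled loosely on pgroll's `create_table` and `add_index` operations — treat the field names as a sketch, not gospel):

`01_create_users.json`:

```json
{
  "name": "01_create_users",
  "operations": [
    {
      "create_table": {
        "name": "users",
        "columns": [
          { "name": "id", "type": "serial", "pk": true },
          { "name": "email", "type": "text" }
        ]
      }
    }
  ]
}
```

`02_index_users_email.json`:

```json
{
  "name": "02_index_users_email",
  "operations": [
    {
      "add_index": {
        "table": "users",
        "name": "idx_users_email",
        "columns": ["email"]
      }
    }
  ]
}
```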
> Sounds good. I'll get something in the works for 1.
I just wanted to respond because I actually decided not to put any work into this for the time being, and I don't want anyone under the impression that it's a work in progress.
This is not me pooh-poohing pgroll. I still think pgroll's way of doing migrations is superior. However, between the current shortcomings of no easy bootstrap, no way to define complex migrations (#203, #239), and the JSON migration file format that prevents me from using `sqlc` without some sort of automated `pg_dump` step to get my schema as plain SQL, migrations were not feeling very ergonomic.

In my personal time I'm considering forking pgroll, removing the data migration `up` portion of migrations, and replacing the JSON schema definition format with plain SQL. I believe these changes will make it feel more ergonomic for me.

I won't be shocked if I return to mainline pgroll, however. But with the number of migrations I'm currently authoring, the ergonomics just aren't working for me and the toil factor is too high. I think pgroll will feel a lot more comfortable for established applications with relatively stable data schemas.
FYI: @andrew-farries This is where I ended up on this issue.
Thanks for the feedback on what's pushing you away from using `pgroll`. The points you raise are valid and we're in agreement that they need to be addressed before a v1 release.

I hope you can adapt `pgroll` to your needs in the meantime and maybe return once these issues are addressed.
I'm confident I'll be back; pgroll has outstanding promise :smile:
Hello! I had both the "bootstrap" and the `sqlc` integration issues and, after a bit of tinkering, I came up with the following solution:
```shell
echo "Generating DB schema for ${service}"
psql -U postgres -c "DROP DATABASE IF EXISTS ${service}_schemas;"
psql -U postgres -c "CREATE DATABASE ${service}_schemas;"
pgroll --postgres-url "postgres://postgres:postgres@0.0.0.0:5432/${service}_schemas?sslmode=disable" init
find . -path "./${service}/model/migrations/*.json" | xargs -I% pgroll --postgres-url "postgres://postgres:postgres@0.0.0.0:5432/${service}_schemas?sslmode=disable" start --complete %
pg_dump -s -x -n public -U postgres ${service}_schemas > ${service}/model/schema.sql
sqlc generate
```
This is far from ideal, but it works as one would expect. Leaving it here so maybe I can save some time to the next one! :)
Not bad at all really. Especially if you watch `migrations/*.json` with `air` or a similar tool, triggering model generation on migration changes.
Coming from the traditional style of migration tools that you're probably familiar with, I've grown accustomed to tooling that provides a clear path from an empty database to one that fully reflects the current schema.

I believe this story starts with timestamped migration names, as schemas must be realized serially. I.e. fields can't be added to `users` until `users` exists. This way, timestamped migrations in a `migrations` directory declare the serialized path to the most current schema. Of course, some migrations can be parallelized, since a `create table` cannot interfere with another `create table` so long as its migration doesn't involve foreign keys, but I don't think it's useful to get mired in implementation detail right now.

As a user, what I'd like from pgroll is:

1. `pgroll new <name> <dir>` to add new migration files with the correct timestamp format to `<dir>`. It may be nice to open up `$EDITOR` after file creation.
2. `pgroll start -c <dir>/*`, sorted. The command could perhaps be `pgroll bootstrap <dir>`.

I don't believe either of these things presents a significant amount of work, but they could greatly improve adoption. I'll likely implement (1.) and (2.) in my own fork for personal use. I'd be happy to get feedback on this proposal and open a PR if it's deemed acceptable. I likely won't have time to write tests of acceptable quality, however.