xataio / pgroll

PostgreSQL zero-downtime migrations made easy
https://www.xata.io
Apache License 2.0
3.94k stars 70 forks source link

Validate a migration file #370

Open divyenduz opened 4 months ago

divyenduz commented 4 months ago

Frontend uses the JSON schema to validate pgroll JSON migrations, however, columns like default should be an SQL expression can only be represented as String in the JSON schema. This makes validating them hard. Maybe pgroll can provide a validate command/API that can be used to see if a JSON migration is correct or not? Frontend can use this to validate before starting the migration and pgroll CLI users too can benefit from it.

As a CLI command, this can be something like: pgroll validate <migrationfile>

divyenduz commented 4 months ago

Related https://github.com/xataio/pgroll/issues/169

ryanslade commented 1 month ago

This feels almost like a dry run where we check things, but don't actually make any changes.

I'm also thinking that in order to properly validate the migration we would need to actually connect to the db. The syntax may be correct for example but fail because the tables don't exist. Another reason to connect to the DB is that I think we probably want to avoid doing any kind of syntax parsing in our code but leave that to Postgres.

@andrew-farries WDYT?

andrew-farries commented 1 month ago

I think #169 is a better description of what we want here; the ability to dry-run a migration to ensure that the up and down SQL works against all values in the target database before starting the migration.

I'm not sure there is a useful middle ground between:

I'm not sure how a dry-run would be implemented though; how would it be possible to ensure that the data migration resulted in values that satisfy any new constraints without actually performing the data migration.

Did you have any thoughts on how dry-run could be implemented @exekias ?

exekias commented 1 month ago

I would scope this issue to strictly validating the JSON against the spec. In the future we could improve this by:

About dry run (again, out of the scope here): In my mind, we can know what are the constraints on a column, so we can apply the up function against current values (without touching the table) and check those values against the constraints (we can leverage postgres to evaluate this).

andrew-farries commented 1 month ago

Re-reading the internal discussion that prompted this issue it seems that what we wanted here was a way to run migration validation as a separate step prior to migration execution.

Currently migrations are validated prior to migration start, but that validation can't be run as a separate step without invoking migration start.

Providing a way via CLI and API to validate a migration independently of starting a migration would allow clients to catch a wider range of errors before running a migration than simple JSON schema validation.