rubenv / sql-migrate

SQL schema migration tool for Go.
MIT License
3.16k stars 269 forks source link

Doing a migrate down when there is a migration “hole” is corrupting the database #252

Open mlaflamm opened 11 months ago

mlaflamm commented 11 months ago

We had this issue a few times in development after merging or switching branches. We had a “hole” in the applied migrations like this

+-------------------------------+---------+------------------------------+
|          MIGRATIONS           | STATUS  |          APPLIED AT          |
+-------------------------------+---------+------------------------------+
| 000001_tenant_table           |         | Aug 29 2023 18:22:03.597 UTC |
| 000002_mission_table          |         | Aug 29 2023 18:22:03.613 UTC |
| 000003_tag_table              |         | Aug 29 2023 18:22:03.650 UTC |
| 000004_audit_table            |         | Aug 29 2023 18:22:03.702 UTC |
| 000005_drop_audit_join_tables |         | Aug 29 2023 18:54:23.635 UTC |
| 000006_missing                | pending |                              |
| 000007_extra                  |         | Aug 30 2023 17:54:49.716 UTC |
+-------------------------------+---------+------------------------------+

We intuitively performed a migrate down 1. Unfortunately the database went to an non-deterministic state. It turned out that ToCatchup has been called and the missing migration applied but not recorded in the migration table.

The migration status after is the following but 000006_missing is actually applied.

+-------------------------------+---------+------------------------------+
|          MIGRATIONS           | STATUS  |          APPLIED AT          |
+-------------------------------+---------+------------------------------+
| 000001_tenant_table           |         | Aug 29 2023 18:22:03.597 UTC |
| 000002_mission_table          |         | Aug 29 2023 18:22:03.613 UTC |
| 000003_tag_table              |         | Aug 29 2023 18:22:03.650 UTC |
| 000004_audit_table            |         | Aug 29 2023 18:22:03.702 UTC |
| 000005_drop_audit_join_tables |         | Aug 29 2023 18:54:23.635 UTC |
| 000006_missing                | pending |                              |
| 000007_extra                  | pending |                              |
+-------------------------------+---------+------------------------------+

This behavior is very bad and hard to diagnose when first encountered. I am not creating this issue to open a debate about the safety of ToCatchup (see #196 for that). I am creating this issue to have the migration status reflect what has happened.

The PR #235 submitted in January is fixing the migration status.