Closed: mac-chaffee closed this 3 months ago
But if you have two instances of a phoenix app (for high availability), you would run the migrations separate from the app startup process (like in a Kubernetes Job) so you don't encounter race conditions from two different processes trying to apply the same migrations. But now you need to somehow tell the apps to wait for the migrations to complete, or else the new app may start up and try to access a column that doesn't exist yet.
Is this really an issue? I'm running multiple Phoenix apps with 2+ nodes each, all trying to apply migrations. Because of the migration lock, only one node actually applies the migrations, while the others wait for the lock and then realise the migrations are already up. So this approach might be overcomplicating things?
Oh, sure enough you're right: https://github.com/elixir-ecto/ecto_sql/blob/48fc2ad6e8afb022f8454350e23122c3304451d1/lib/ecto/migrator.ex#L399
In order to run migrations, at least two database connections are necessary. One is used to lock the "schema_migrations" table and the other one to effectively run the migrations. This allows multiple nodes to run migrations at the same time, but guarantee that only one of them will effectively migrate the database.
I was trusting this line from the linked article at face value:
if we have 3 replicas, and we try and perform a rolling update then we may end up with multiple applications trying to migrate the database at the same time. This is unsupported in every migration tool I know of, and carries the risk of data corruption.
So using an initContainer for migrations would indeed work in Kubernetes. Thanks!
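For reference, the locking behavior quoted above is what the standard release migration entry point from the Phoenix releases guide relies on. A minimal sketch (module, app, and repo names here are placeholders, not this project's actual names):

```elixir
defmodule MyApp.Release do
  @app :my_app

  # Runs all pending migrations for every configured repo.
  # Ecto.Migrator takes a lock on the "schema_migrations" table, so if
  # several nodes call this concurrently, only one performs the migration
  # and the others wait for the lock and then see everything is up.
  def migrate do
    Application.load(@app)

    for repo <- Application.fetch_env!(@app, :ecto_repos) do
      {:ok, _fun_return, _apps_started} =
        Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end
end
```

An initContainer could then invoke this with something like `bin/my_app eval "MyApp.Release.migrate()"` (binary name assumed).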
Problem
When deploying to environments with multiple replicas of a phoenix app (like in Kubernetes), there's a common problem where deploying a new instance of the app requires waiting for migrations to complete. The problem is described in-depth here: https://andrewlock.net/deploying-asp-net-core-applications-to-kubernetes-part-7-running-database-migrations/
To summarize: When you have one instance of a phoenix app, it would be simple to just run migrate before the app starts up (like in an initContainer). But if you have two instances of a phoenix app (for high availability), you would run the migrations separate from the app startup process (like in a Kubernetes Job) so you don't encounter race conditions from two different processes trying to apply the same migrations. But now you need to somehow tell the apps to wait for the migrations to complete, or else the new app may start up and try to access a column that doesn't exist yet.
Solution
This PR adds a wait_for_migrations release script that essentially checks mix ecto.migrations every 5 seconds until it shows all have been applied.
You can see an example of this solution in practice here: https://gitlab.com/mac-chaffee/crowdsort/-/tree/master/chart/crowdsort/templates
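Purely as a rough sketch of the idea (not the actual code in this PR, and assuming the usual MyApp.Release module and :my_app naming), the waiting loop could look like this:

```elixir
defmodule MyApp.Release do
  @app :my_app
  @poll_interval :timer.seconds(5)

  # Blocks until every migration reported by Ecto.Migrator.migrations/1
  # is :up, polling every 5 seconds. Intended to run before app startup
  # while a separate process (e.g. a Kubernetes Job) applies migrations.
  def wait_for_migrations do
    Application.load(@app)

    for repo <- Application.fetch_env!(@app, :ecto_repos) do
      {:ok, _fun_return, _apps_started} =
        Ecto.Migrator.with_repo(repo, &wait_until_migrated/1)
    end

    :ok
  end

  defp wait_until_migrated(repo) do
    pending? =
      repo
      |> Ecto.Migrator.migrations()
      |> Enum.any?(fn {status, _version, _name} -> status != :up end)

    if pending? do
      Process.sleep(@poll_interval)
      wait_until_migrated(repo)
    else
      :ok
    end
  end
end
```

A sketch like this goes through Ecto.Migrator directly rather than shelling out, since mix isn't available inside a release.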
Questions
Maybe wait_for_migrations should be a part of Ecto and Phoenix just calls their function? Or maybe we define the function deeper inside Phoenix so we keep release.ex minimal?
Open to all feedback!