lastland / scala-forklift

Type-safe data migration tool for Slick, Git and beyond.

[Help] Running multiple migratedb tools in parallel #49

Open dmourao opened 4 years ago

dmourao commented 4 years ago

Hello,

Recently, we've introduced scala-forklift in our cloud applications, which run on Kubernetes.

Some applications have replicas, which means that the migratedb tool (using forklift) runs in parallel across replicas (whenever the application is restarted), all working on the same database.

We have some issues trying to run the migratedb tool in parallel; it seems that it lacks support for parallel runs on the same database.

Is there a way to lock the database during migration?

lastland commented 4 years ago

Hi @dmourao,

Thanks for reporting this issue! It is true that scala-forklift lacks support for parallel runs on the same database. I imagined that the migration tool would only be executed during deployment, not in production, and that only one application would be responsible for running the migrations, but that certainly excludes some special scenarios. I wonder why you need all replicas to update the same database? Are they running the same script, or are they updating disjoint parts/tables of the same database? Is the motivation to run migrations in parallel to speed up the migration process, or are there other reasons?

dmourao commented 4 years ago

Hi @lastland, and thanks for your response.

In fact, when N replicas start (at the same time), I only want one of the replicas to perform the migration process, while the other replicas just wait. I need some kind of locking mechanism.

We are using Kubernetes for deployment and management of our cloud solution. Kubernetes has the concept of a Pod; in simple terms, each individual Pod represents an instance of our application. To achieve horizontal scaling, we can run several replicas (let's say N replicas) against the same database, and a load balancer at a higher level is responsible for forwarding requests to any Pod/replica. This means that my application, in practice, is composed of a set of N replicas/pods of the same codebase.

When Kubernetes deploys my application (N replicas), each replica at boot/startup tries to ensure that the database is created or updated by running the migrate-db process. Each replica executes the same SQL script. The problem is that each replica does this in parallel, causing a race condition at the database.

At first I was thinking of locking the __migrations__ table when the first replica tries to migrate the database, which would ensure that the other replicas wait for the unlock. But I cannot find an easy way to do this. Are there other possibilities?

lastland commented 4 years ago

Locking the __migrations__ table could be a solution, but I am not sure how to do that easily, either. You will need an atomic operation to acquire and use the lock.
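For what it's worth, Postgres advisory locks provide exactly such an atomic operation: `pg_advisory_lock(key)` blocks until the lock is free and acquires it in a single call. A minimal sketch over plain JDBC, assuming Postgres; the key `42` is an arbitrary constant your application would pick (scala-forklift itself has no built-in hook for this):

```scala
import java.sql.DriverManager

// Sketch: serialize migrations across replicas with a Postgres
// session-level advisory lock. The key (42) is an arbitrary constant
// the application picks; any agreed-upon value works.
def withMigrationLock[A](jdbcUrl: String)(body: => A): A = {
  val conn = DriverManager.getConnection(jdbcUrl)
  try {
    val st = conn.createStatement()
    st.execute("SELECT pg_advisory_lock(42)") // blocks until the lock is free
    try body
    finally st.execute("SELECT pg_advisory_unlock(42)")
  } finally conn.close()
}
```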

Another possibility is to have a designated replica do the migrations. If your application uses a master+replicas architecture, then the master node sounds like a perfect place for this job. Otherwise, maybe you can distinguish one replica from the others? IMHO, this may be a better design, as this mechanism does not rely on features of your migration tools or databases. This is good because if you have some other shared resource you would like to set up, you can use the same method without relying on a lock mechanism on that shared resource. However, I may not be making sense as I am just speculating without looking at your application. 😛
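For example, if the replicas are deployed as a Kubernetes StatefulSet, each pod gets a stable ordinal suffix in its hostname, so one replica can be singled out without any extra coordination. Just a sketch of the idea; the hostname parsing assumes the StatefulSet naming convention:

```scala
// Sketch: StatefulSet pods are named <app>-0, <app>-1, ... so the
// ordinal in the hostname can designate a single migrator replica.
val hostname = java.net.InetAddress.getLocalHost.getHostName
val isMigrator = hostname.split("-").lastOption.contains("0")

if (isMigrator) {
  // run the migratedb process here
} else {
  // skip migrations, or poll until the database is up to date
}
```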

dmourao commented 4 years ago

> Locking the __migrations__ table could be a solution, but I am not sure how to do that easily, either. You will need an atomic operation to acquire and use the lock.

Locking the table could be done with the Postgres LOCK command (a table-level lock), or with a row-level lock (using SELECT FOR UPDATE): https://www.postgresql.org/docs/11/explicit-locking.html

However, to do that, the locking primitives must be invoked inside the transaction that performs the migration (executed by the mg process). Before each migration M, I suppose I can invoke the LOCK command, and at the end of migration M invoke COMMIT/ROLLBACK (unlock), but this is not an elegant solution. The developer shouldn't have to care about locking the table explicitly.
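Concretely, the pattern I have in mind looks something like this (a sketch over plain JDBC; the connection URL is illustrative):

```scala
import java.sql.DriverManager

// Sketch of the pattern above: LOCK TABLE is transaction-scoped, so
// other replicas block on the LOCK statement until this transaction
// commits or rolls back.
val jdbcUrl = "jdbc:postgresql://localhost/mydb" // illustrative
val conn = DriverManager.getConnection(jdbcUrl)
conn.setAutoCommit(false)
try {
  val st = conn.createStatement()
  st.execute("LOCK TABLE __migrations__ IN ACCESS EXCLUSIVE MODE")
  // ... run migration M on the same connection ...
  conn.commit() // releases the lock
} catch {
  case e: Exception => conn.rollback(); throw e
} finally conn.close()
```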

EDIT2: locking the table before each migration (1.scala, 2.scala, etc.), as I said, will not work either. The other replicas will wait for the unlock, but will then execute the same migration code and fail.
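One way around this would be to re-check the __migrations__ table after acquiring the lock: a replica that was blocked would then find the migrations already recorded and skip them instead of failing. A sketch of that check-after-acquire pattern (the `id` column on __migrations__ is my assumption, not forklift's actual schema):

```scala
import java.sql.Connection

// Sketch: acquire the advisory lock first, *then* read __migrations__
// to decide what still needs to run. A replica that was blocked will
// find the migrations already recorded and simply skip them.
def runPending(conn: Connection, migrations: Seq[(Int, () => Unit)]): Unit = {
  val st = conn.createStatement()
  st.execute("SELECT pg_advisory_lock(42)") // same arbitrary key as above
  try {
    val rs = st.executeQuery("SELECT id FROM __migrations__") // 'id' is assumed
    var applied = Set.empty[Int]
    while (rs.next()) applied += rs.getInt(1)
    for ((id, run) <- migrations.sortBy(_._1) if !applied(id)) {
      run()
      st.executeUpdate(s"INSERT INTO __migrations__ (id) VALUES ($id)")
    }
  } finally st.execute("SELECT pg_advisory_unlock(42)")
}
```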