levibostian / ExpressjsBlanky

Blank Express.js project to get up and running FAST.
MIT License

Zero downtime upgrades #52

Open levibostian opened 3 years ago

levibostian commented 3 years ago

My applications currently have no downtime, but that's because I am lucky: I have a small userbase. If database tables are large, these migrations are difficult to do without downtime.

I am also working to automate deployments as much as possible. Deploy as soon as a change lands on the master branch.

Requirements:

ideas

To do the above, I have considered a few ideas.

levibostian commented 3 years ago

Here is an idea for how to check that 2 tables are identical. You should be able to write a script that does this check.

  1. Start a loop at the first chunk of the table.
  2. Start a transaction.
  3. Run the comparison query described above on 100 rows (or whatever chunk size you decide).
  4. If rows are different, print all of the differences to a file or something, close the transaction, and exit.
  5. If rows are the same, close the transaction and repeat. Keep going until you hit the end of the table.
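The steps above can be sketched roughly like this. This is a minimal sketch, not the exact script: it uses an in-memory SQLite database as a stand-in for Postgres, and the table names `users`/`users_shadow`, the `id` ordering column, and the `EXCEPT`-based comparison query are all assumptions on my part:

```python
import sqlite3

CHUNK = 100  # step 3's chunk size

def chunk_diff(conn, table_a, table_b, offset, limit):
    """Rows in table_a's chunk that are missing from table_b, and vice versa."""
    def one_way(src, dst):
        q = (f"SELECT * FROM (SELECT * FROM {src} ORDER BY id LIMIT ? OFFSET ?) "
             f"EXCEPT SELECT * FROM {dst}")
        return conn.execute(q, (limit, offset)).fetchall()
    return one_way(table_a, table_b), one_way(table_b, table_a)

def compare_tables(conn, table_a, table_b, chunk=CHUNK):
    """Walk both tables in chunks; stop at the first chunk that differs.

    Against Postgres, each iteration would run inside its own transaction
    (steps 2 and 5); SQLite autocommit is just a stand-in here.
    """
    offset = 0
    while True:
        a_only, b_only = chunk_diff(conn, table_a, table_b, offset, chunk)
        if a_only or b_only:
            return offset, a_only, b_only  # step 4: report differences, exit
        seen = conn.execute(
            f"SELECT COUNT(*) FROM (SELECT 1 FROM {table_a} LIMIT ? OFFSET ?)",
            (chunk, offset)).fetchone()[0]
        if seen < chunk:
            return None  # step 5: hit the end of the table, no differences
        offset += chunk

# Demo: two identical tables, then one divergent row.
conn = sqlite3.connect(":memory:")
for t in ("users", "users_shadow"):
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(f"INSERT INTO {t} VALUES (?, ?)",
                     [(i, f"u{i}@example.com") for i in range(1, 6)])
print(compare_tables(conn, "users", "users_shadow", chunk=2))  # None

conn.execute("UPDATE users_shadow SET email = 'changed' WHERE id = 3")
print(compare_tables(conn, "users", "users_shadow", chunk=2))
```

Comparing each chunk against the whole other table (rather than chunk-to-chunk) keeps the check correct even if you only assume the `id` ordering matches.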
levibostian commented 3 years ago

I have done more thinking and reading. Some database operations do not require a complex migration. However, some operations (like adding an index with CONCURRENTLY) are safe but still put I/O and CPU load on the Postgres server, which might result in downtime anyway. I consider the more complex database migration better in the long run than an operation like that.

Also, I learned some big tips when it comes to shadow-table migrations and backfilling:

  1. Do not add indexes to the shadow table until the end to make backfilling faster.
  2. After you create the shadow table, you can use Postgres triggers instead of changing application code to make sure that writes are applied to both the original table and the shadow table.
  3. There are many different ways to perform backfilling. All of them share the same basic idea: perform the operation in chunks, running a script from one of your developer machines. There are many techniques out there for the SQL statements to run 1, 2, 3, 4
     - Note: if you are running a Postgres cluster, make sure that your backfill script accounts for replication lag to prevent issues.
     - Note: this article also suggests that after you swap your code to read from the new shadow table, you reverse your duplication code so that all queries still update both tables. That way, if there is a problem after you switch the application to the shadow table, you can quickly swap back to reading from the old table, fix the problem, and then try again.
     - Note: this article gives some small suggestions in its bottom section, Avoiding downtime.
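Tips 2 and 3 can be sketched together. This is a hedged sketch, again using in-memory SQLite instead of Postgres: the `users`/`users_shadow` names and schema are hypothetical, `INSERT OR IGNORE` stands in for Postgres's `ON CONFLICT DO NOTHING`, and the replication-lag pause is only noted in a comment:

```python
import sqlite3

def backfill_in_chunks(conn, src, dst, chunk=100):
    """Copy pre-existing rows from src to dst, chunk by chunk (tip 3).

    Keyset pagination (WHERE id > last_id) keeps each chunk cheap on large
    tables. INSERT OR IGNORE skips rows the trigger already duplicated.
    Against a Postgres cluster you would commit between chunks and pause
    here until replicas catch up, to account for replication lag.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            f"SELECT id, email FROM {src} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk)).fetchall()
        if not rows:
            return
        conn.executemany(f"INSERT OR IGNORE INTO {dst} VALUES (?, ?)", rows)
        last_id = rows[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"u{i}@example.com") for i in range(1, 251)])

# Shadow table plus a trigger (tip 2): every new write to the original
# table is mirrored to the shadow table without touching application code.
conn.execute("CREATE TABLE users_shadow (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("""CREATE TRIGGER users_dual_write AFTER INSERT ON users
                BEGIN
                  INSERT INTO users_shadow VALUES (NEW.id, NEW.email);
                END""")

conn.execute("INSERT INTO users VALUES (251, 'new@example.com')")  # trigger fires
backfill_in_chunks(conn, "users", "users_shadow", chunk=50)
print(conn.execute("SELECT COUNT(*) FROM users_shadow").fetchone()[0])  # 251
```

Note this sketch keeps the shadow table's primary key from the start so the trigger and the conflict-skipping insert work; per tip 1, any secondary indexes would still wait until the backfill is done.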

Overall, I like this multi-step shadow-and-backfill approach. The big advantages it offers are: it's safe, I can clock out at any step and come back to it tomorrow, I can roll back on errors, and I am in full control and can stop at any time.

levibostian commented 3 years ago

Great resource for strategies for each type of migration.