holman / ama

Ask @holman anything!
732 stars 278 forks source link

Branch deployments and database migrations. #765

Closed joharohl closed 8 years ago

joharohl commented 8 years ago

Hi,

I just read your "How to deploy software" piece and I really liked it! I am in the process of setting up some deployment automation for a couple of smallish projects and I am trying to incorporate a lot of what you mentioned. There is one scenario concerning branch deployments to production that always bothers me though:

If a branch that is deployed brakes and it involves changes to the database schema (e.g. removing a column), you would deploy master to revert the changes. However, most database migration tools seems to have little support of undoing migrations if the migration specification is not present (which it won't be on master). So if you are unlucky, the database migration brakes master as well.

I can see how you can try to be clever by reverting the migrations of the last deploy before deploying master again, but it can be impossible depending on the migrations and it also adds complexity. Another possibility I guess is to do database migrations in small steps, running code to do handle both versions of the database schema. If each step is reversible without having to undo migrations, it can be verified in production that everything works before doing the next step.

Just wondered what your thought are?

holman commented 8 years ago

I think there's a couple general ideas here:

If you're a small organization with a small site, you can probably just get away with running the down portion of a migration while the broken code is still on production. This is dependent on you not having much data and the extra couple seconds to revert the migration won't matter too much in the long run. This is definitely not ideal, but if you're running around with your hair on fire trying to build everything at once (as you do in a small company), it might be an okay tradeoff.

I think at a certain point, though, it becomes important to have a real process around all this. If your database is large enough where rollbacks are really gnarly to do, you should start thinking of pulling the migration out of the branch you're working on.

Basically, deploy the migration on a single branch alone. Probably for most migrations you're going to be doing fairly straightforward migrations: add a column, remove a column, etc. (For situations when you have to transition the data from one form to another, well, that's another whole question.) For these simple migrations, deploy them to production first, and then outside of the normal deploy process (likely when you have the least amount of traffic) run the migration. Hopefully it won't impact production- if it does, then you're going to have to have another branch beforehand that can deal with both database schemas at once.

Once you have things migrated over, then you can start deploying the branch you really want to deploy. But it's all predicated on the idea that you're going to need to support both codepaths simultaneously. This can range from "fine" to "real shitty". But at a certain point of scale, I think the extra work is necessary because downtime is binary: you're either up or you're hosed. And taking the extra bit of time to allow for graceful rollbacks is prudent.

(To reduce the need for rollbacks on your new experimental code, you might want to do a process like Scientist or any of its language ports do: basically, verify that the new code is a-okay, and then switch over. Makes the whole deploy process a bit more cautious.)

vemv commented 8 years ago

Thanks for both this reply and the original article.

One thing I think is not mentioned though, is that migrations (and maybe deployments in general) can (and IMO should) be essayed multiple times over a production replica, before actually performing the migration/deployment on production.

In other words: the production DB should be easily/safely dumpable, so migrations can be tried out over real data. This can reveal issues in the migration itself (e.g. non-appliable constraints) or in the code that depends on the new migrations.

Often staging DBs aren't actual replicas of their production counterparts. I advocate some sort of "DB dump" automated infrastructure to make this facility a no-brainer.

When you've essayed migrations a few times over production replicas, the actual migration/deployment will have significantly less risk and will be much more boring! :)

joharohl commented 8 years ago

Thanks for both your comments, they help my scrambled brain at the moment. What we currently use is pretty close to what @vemv is suggesting. We have a staging server that for each deploy gets a copy of the production database which migrations are applied to. This practice has been working ok so far to safeguard against breaking stuff. It's just been quite manual to deploy to production and I have been wanting to automate things more. I have been trying to wrap my head around this for a while now. So both the article and your comments had most perfect timing!