Zero Downtime Deployment

iloveitaly commented 12 years ago

From the customer's perspective, this is very important. I want to be able to deploy multiple times a day without worrying about downtime. Depending on hardware, available RAM, number of extensions, etc., Unicorn can take a couple of minutes to restart after `restart spree` is run.

While Unicorn is down, users get a 500 error, which can mean lost sales depending on where the customer is in the purchasing process.

There are many resources out there on zero-downtime / rolling deployments:

- GitHub's script: https://gist.github.com/206253 (they also have a post about their deployment process)
- A great tutorial: http://ariejan.net/2011/09/14/lighting-fast-zero-downtime-deployments-with-git-capistrano-nginx-and-unicorn/

The biggest obstacle to integrating this into the current Spree deployment scripts is Foreman. Both `restart spree` and `stop spree` send SIGTERM to the Unicorn process, but rolling restarts require sending USR2 to the Unicorn master instead.
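To make the signal distinction concrete, here is a minimal Ruby sketch that maps deploy actions to the signals Unicorn's master documents: QUIT for a graceful stop, TERM for a quick stop, USR2 to re-exec a new master next to the old one, and HUP to reload. The action names, `UNICORN_SIGNALS`, and the helper methods are hypothetical, and the pid file is assumed to contain only the master's PID.

```ruby
# Hypothetical mapping of deploy actions to Unicorn master signals.
UNICORN_SIGNALS = {
  stop:      "QUIT", # graceful shutdown: workers finish in-flight requests
  fast_stop: "TERM", # quick shutdown: the SIGTERM that stop/restart send today
  restart:   "USR2", # re-exec a new master alongside the old one (zero downtime)
  reload:    "HUP"   # reload config and gracefully restart workers
}.freeze

# Read the master PID from a pid file (assumed to hold a single integer).
def read_pid(pid_file)
  Integer(File.read(pid_file).strip)
end

# Send the signal for `action` to the master; raises KeyError for unknown actions.
def signal_unicorn(pid_file, action)
  signal = UNICORN_SIGNALS.fetch(action)
  Process.kill(signal, read_pid(pid_file))
  signal
end
```

The point of the table is that a single "restart" hook which always sends TERM can never produce the USR2 path.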

You could create a wrapper script, but that presents two problems:

1. As far as I can tell, you'd have to recreate the unicorn_rails call manually, although there may be some shell-script trick I'm not aware of that passes the parent's arguments and file redirection through to the child.
2. `stop spree` won't work. Stop and restart send the same signal, so if the wrapper turns it into a USR2 restart, stop gets turned into a restart too, and you'd have to kill the process manually.
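On the first problem: the usual trick in shell is `exec unicorn_rails "$@"`. A rough Ruby equivalent is sketched below; `unicorn_command` and `exec_unicorn` are hypothetical names. `Kernel#exec` replaces the wrapper process (same PID), and the new program inherits stdout/stderr, so a `>> log 2>&1` redirection applied to the wrapper carries over without rebuilding the command line.

```ruby
# Hypothetical wrapper: forward every argument to unicorn_rails unchanged.
def unicorn_command(argv)
  ["bundle", "exec", "unicorn_rails", *argv]
end

# Replace the current process with Unicorn. Same PID, inherited stdout/stderr.
# Not invoked here; a wrapper script would call exec_unicorn(ARGV).
def exec_unicorn(argv = ARGV)
  exec(*unicorn_command(argv))
end
```

This only addresses argument passing. It does nothing for the second problem, because the wrapper still receives the same SIGTERM for both stop and restart.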

When I discovered this, I concluded that the best way to do zero-downtime restarts was to ditch Upstart + Foreman. This meant rewriting the deploy.rb process control tasks:

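```ruby
namespace :deploy do
  # Start Unicorn daemonized (-D). Assumes unicorn.rb sets `pid` to
  # #{shared_path}/pids/unicorn.pid, which :stop and :restart read.
  task :start, :roles => :app do
    run "cd #{current_path} && bundle exec unicorn_rails -c /data/spree/shared/config/unicorn.rb -D -p 5000 -o 127.0.0.1 -E production >> /var/log/spree/web-1.log 2>&1"
  end

  # QUIT = graceful shutdown: workers finish in-flight requests before exiting.
  task :stop, :roles => :app do
    run "kill -s QUIT `cat #{shared_path}/pids/unicorn.pid`"
  end

  # USR2 = start a new master alongside the old one. The old master still
  # has to be sent QUIT afterwards (typically from before_fork in unicorn.rb).
  task :restart, :roles => :app do
    run "kill -s USR2 `cat #{shared_path}/pids/unicorn.pid`"
  end
end
```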
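USR2 alone leaves the old master running. Per Unicorn's signal documentation, the old master renames its pid file to `<pid>.oldbin` and a separate QUIT must be sent to it once the new master is up. A minimal sketch of that cleanup step follows; the helper name `quit_old_master` and its return symbols are hypothetical.

```ruby
# Hypothetical helper: gracefully stop the old master after a USR2 re-exec.
# Assumes Unicorn's behaviour of renaming the old pid file to "<pid>.oldbin".
def quit_old_master(pid_path)
  old_pid_path = "#{pid_path}.oldbin"
  return :none unless File.exist?(old_pid_path)

  old_pid = Integer(File.read(old_pid_path).strip)
  return :self if old_pid == Process.pid # never signal ourselves

  Process.kill("QUIT", old_pid) # graceful: old workers finish in-flight requests
  :signalled
rescue Errno::ENOENT, Errno::ESRCH
  # Pid file vanished or the old master already exited: nothing left to do.
  :gone
end
```

In unicorn.rb this would typically be called as `quit_old_master(server.config[:pid])` inside a `before_fork` block, so the old master is only told to quit once the new master has loaded and is about to fork workers.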

This is working well for me, and I would love to get it integrated into the Spree deployment scripts. However, it would require a major departure from the logic currently used for process management. Is there a way to use Foreman + Upstart + Unicorn and still have rolling restarts?

BDQ commented 12 years ago

@iloveitaly, thanks for all the awesome contributions. You win the prize for being the first person other than me to contribute to this project. I owe you a beer.

I'm 100% in favour of zero-downtime deploys, but Foreman provides a lot of benefit to the Deployment Service, so I'm keen to "make it work".

I've recently added support for using the .foreman file to control concurrency and log file placement, which I think is really beneficial. See: https://github.com/spree/deployment_service_puppet/blob/master/modules/spree/templates/dot-foreman.erb

Watch spree-user for a post from me with my ideas for the future of the Deployment Service; I'm keen to get your input.

BDQ commented 12 years ago

I've been looking at the options. I think that by swapping to Bluepill and using a custom Foreman exporter, we can send USR2 to Unicorn for a restart (and still manage any other custom services we need).
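As a rough illustration of what such an exporter might emit, the sketch below renders a Bluepill config from Procfile-style entries, giving Unicorn processes a QUIT stop command and a USR2 restart command through Bluepill's `{{PID}}` placeholder. The method `render_bluepill`, its parameters, and the pid-file layout are hypothetical and not the exporter the Deployment Service actually ships; the generated `pid_file` must match the `pid` setting in unicorn.rb.

```ruby
# Hypothetical Foreman-style exporter: Procfile entries -> Bluepill config text.
def render_bluepill(app, processes, pid_dir:, unicorn: ["web"])
  lines = ["Bluepill.application(#{app.inspect}) do |app|"]
  processes.each do |name, command|
    pid_file = File.join(pid_dir, "#{name}.pid")
    lines << "  app.process(#{name.inspect}) do |process|"
    lines << "    process.start_command = #{command.inspect}"
    lines << "    process.pid_file = #{pid_file.inspect}"
    if unicorn.include?(name)
      # Unicorn master: QUIT = graceful stop, USR2 = zero-downtime re-exec
      lines << "    process.stop_command = \"kill -QUIT {{PID}}\""
      lines << "    process.restart_command = \"kill -USR2 {{PID}}\""
    end
    lines << "  end"
  end
  lines << "end"
  lines.join("\n") + "\n"
end
```

Non-Unicorn processes fall back to Bluepill's default stop and restart handling, which is how other custom services could still be managed alongside the web process.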

I had planned on switching to Bluepill for memory/CPU monitoring anyway.

BDQ commented 12 years ago

Zero-downtime deploys now work using Bluepill; documentation to follow. The generated Capistrano recipe now includes Bluepill commands.
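For illustration only, the commands such a recipe might `run` can be built as below, assuming Bluepill's CLI form `bluepill <app> <action> [process]` and `bluepill load <config>`. The helper names, the app name, and the default use of `sudo` are assumptions, not the generated recipe itself.

```ruby
# Hypothetical helpers building Bluepill CLI commands for a deploy recipe.
BLUEPILL_ACTIONS = %w[start stop restart status].freeze

def bluepill_command(app, action, process = nil, sudo: true)
  raise ArgumentError, "unknown action: #{action}" unless BLUEPILL_ACTIONS.include?(action)
  cmd = ["bluepill", app, action, process].compact.join(" ")
  sudo ? "sudo #{cmd}" : cmd
end

# Load (or reload) a Bluepill config file.
def bluepill_load(config_path, sudo: true)
  cmd = "bluepill load #{config_path}"
  sudo ? "sudo #{cmd}" : cmd
end
```

A `deploy:restart` task would then run something like `bluepill_command("spree", "restart", "web")`, and Bluepill delivers the configured USR2 instead of the SIGTERM Upstart sends.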

johanb commented 11 years ago

@BDQ, is this live? I ran the update on my current deployment and used the new Capistrano recipe, but zero-downtime restarts don't seem to be working yet.

BDQ commented 11 years ago

@johanb, it's live and working on lots of servers for me. I'd suggest using it on new servers rather than trying to update existing ones, though: the script doesn't disable the old Upstart setup.

spree / deployment_service_puppet

Zero Downtime Deployment #4