Rolling restarts seem extremely inconsistent

zombocom / puma_worker_killer

Automatically restart Puma cluster workers based on max RAM available


Rolling restarts seem extremely inconsistent #29

swrobel closed this issue 8 years ago

swrobel commented 8 years ago

Been noticing this going back to PWK 0.0.4 & Puma 2.x, but currently running PWK 0.0.5 & Puma 3.0.2

config/puma.rb

preload_app!

threads (ENV['THREADS'] || 2), (ENV['THREADS'] || 2)
workers (ENV['WORKERS'] || 2)

port        ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || ENV['RAILS_ENV'] || 'development'

on_worker_boot do
  ActiveRecord::Base.establish_connection
end

config/initializers/puma_worker_killer.rb

PumaWorkerKiller.enable_rolling_restart if Rails.env.production?

Procfile

web: bundle exec puma -C config/puma.rb

App is on Heroku. I'll get a few restarts in a row, but, as I understand, it should be killing some workers every 60s. Logs from the last 6 hours, filtered for PWK entries:

schneems commented 8 years ago

> as I understand, it should be killing some workers every 60s

Default is to restart workers every 12 hours

swrobel commented 8 years ago

OK, am I misreading this somehow?

schneems commented 8 years ago

Every 12 hours we'll start a rolling restart. The rolling restart kills one worker, then waits 60 seconds so that worker can come back online and service requests, then kills the next. We do this so you don't lose too much capacity at once. The restart also has a little jitter so that all of your dynos don't start restarting at exactly the same time; otherwise you'd lose a lot of throughput all at once.

If you look at your different dynos, you'll see that web.1 kills a worker at 16:15 and then again at 16:16. The same happens with dynos 1, 2, 3, & 4.
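
To make the timing concrete, here is a rough simulation of the schedule described above. It is only a sketch and not the gem's code: the method names, the fixed per-dyno `jitter` value, and the 6-hour interval (the default as corrected later in this thread) are assumptions for illustration.

```ruby
# Hypothetical simulation of a rolling restart schedule; not puma_worker_killer's code.
RESTART_FREQUENCY  = 6 * 3600 # seconds between rolling restarts (assumed default)
WAIT_BETWEEN_KILLS = 60       # seconds to wait after killing each worker

# Returns [offset_in_seconds, worker_index] pairs for one rolling restart
# that begins at offset `start`. Workers are killed one at a time.
def rolling_restart_schedule(worker_count, start: 0, wait: WAIT_BETWEEN_KILLS)
  (0...worker_count).map { |index| [start + index * wait, index] }
end

# Returns the start offsets of the first `cycles` rolling restarts on one dyno.
# `jitter` is an assumed per-dyno offset that keeps dynos from restarting together.
def restart_starts(cycles, frequency: RESTART_FREQUENCY, jitter: 0)
  (1..cycles).map { |cycle| cycle * frequency + jitter }
end

if __FILE__ == $PROGRAM_NAME
  # Two workers per dyno, as in the config above.
  restart_starts(2, jitter: 37).each do |start|
    rolling_restart_schedule(2, start: start).each do |offset, worker|
      puts format("t+%6ds: kill worker %d", offset, worker)
    end
  end
end
```

With two workers, each cycle produces two kills 60 seconds apart, followed by a long quiet period until the next cycle. That matches the pattern of a kill at 16:15 and another at 16:16 on the same dyno.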

schneems commented 8 years ago

The rolling restart reap method is called by AutoReap here https://github.com/schneems/puma_worker_killer/blob/78934476d632813621ab2aef5d253fef0f5609fd/lib/puma_worker_killer.rb#L26-L27

swrobel commented 8 years ago

Great, thanks for clearing that up. I read this as indicating every 6 hours by default. I'd like to update the README to reflect that so you only have to answer this once :)

schneems commented 8 years ago

Whoops, I meant every 6 hours by default: https://github.com/schneems/puma_worker_killer/blob/78934476d632813621ab2aef5d253fef0f5609fd/lib/puma_worker_killer.rb#L10

Could you send me a documentation PR that calls this default out?
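
For anyone who wants a different interval, the frequency can be set explicitly rather than relying on the default. This is a minimal sketch, assuming `enable_rolling_restart` accepts an optional frequency in seconds as the gem's README describes; check it against the version you have installed.

```ruby
# config/initializers/puma_worker_killer.rb
# Assumption: enable_rolling_restart takes an optional frequency in seconds.
if Rails.env.production?
  # Start a rolling restart every 12 hours instead of the 6-hour default.
  PumaWorkerKiller.enable_rolling_restart(12 * 3600)
end
```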