zombocom / puma_worker_killer

Automatically restart Puma cluster workers based on max RAM available

Cannot kill workers #64

Closed. volkanunsal closed this issue 4 years ago.

volkanunsal commented 5 years ago

I noticed that PumaWorkerKiller wasn't able to kill the workers after the memory threshold was reached. This is what I saw in the Heroku logs, and the same lines kept repeating every 5 seconds:

I, [2019-01-01T09:22:26.184981 #303]  INFO -- : Installing Puma worker loop.
2019-01-01T14:22:26.192087+00:00 app[web.3]: [5] - Worker 0 (pid: 303) booted, phase: 0
2019-01-01T14:22:26.555152+00:00 app[web.2]: [5] PumaWorkerKiller: Out of memory. 2 workers consuming total: 555.26171875 mb out of max: 501.76 mb. Sending TERM to pid 12418 consuming 182.80078125 mb.
2019-01-01T14:22:26.555466+00:00 app[web.2]: Worker #<Puma::Cluster::Worker:0x000055ccb935d578 @index=1, @pid=12418, @phase=0, @stage=:booted, @signal="TERM", @options=#<Puma::UserFileDefaultOptions:0x000055ccaf363d10 @user_options={:binds=>["tcp://0.0.0.0:39732"], :environment=>"production", :logger=>#<Puma::Events:0x000055ccaf36c500 @formatter=#<Puma::Events::PidFormatter:0x000055ccaf491c50>, @stdout=#<IO:<STDOUT>>, @stderr=#<IO:<STDERR>>, @debug=false, @hooks={:on_booted=>[]}>, :before_worker_boot=>[#<Proc:0x000055ccaf5d8eb0@./config/puma.rb:43>, #<Proc:0x000055ccb459b3a8@/app/vendor/bundle/ruby/2.3.0/gems/scout_apm-2.4.13/lib/scout_apm/server_integrations/puma.rb:28>]}, @file_options={:config_files=>["./config/puma.rb"], :min_threads=>16, :max_threads=>16, :binds=>["tcp://0.0.0.0:39732"], :environment=>"production", :workers=>2, :preload_app=>true, :before_worker_boot=>[#<Proc:0x000055ccaf5d8eb0@./config/puma.rb:43>], :before_fork=>[#<Proc:0x000055ccaf5d8e60@./config/puma.rb:75>]}, @default_options={:min_threads=>0, :max_threads=>16, :log_requests=>false, :debug=>false, :binds=>["tcp://0.0.0.0:9292"], :workers=>0, :daemon=>false, :mode=>:http, :worker_timeout=>60, :worker_boot_timeout=>60, :worker_shutdown_timeout=>30, :remote_address=>:socket, :tag=>"app", :environment=>"production", :rackup=>"config.ru", :logger=>#<IO:<STDOUT>>, :persistent_timeout=>20, :first_data_timeout=>30}>, @first_term_sent=nil, @last_checkin=2019-01-01 09:22:21 -0500, @last_status="{}", @dead=false> being killed

And this was the memory consumption metric over the same period:

[Screenshot: memory consumption graph, 2019-01-01 10:21 AM]

I'm using PumaWorkerKiller version 0.1.0 and Puma 3.12.0.
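
For context, a PumaWorkerKiller configuration along these lines would produce the numbers in the log above. This is a sketch with values inferred from the log rather than copied from my app: 512 mb times the default percent_usage of 0.98 gives the 501.76 mb cap, and frequency is the 5-second check interval. (The long Worker #<Puma::Cluster::Worker ...> line is printed by PumaWorkerKiller's default pre_term hook, which logs worker.inspect before the signal is sent.)

```ruby
# config/puma.rb -- sketch; values inferred from the log, not from the real app
before_fork do
  require 'puma_worker_killer'

  PumaWorkerKiller.config do |config|
    config.ram           = 512  # MB available to the dyno
    config.frequency     = 5    # seconds between checks (matches the 5s repeat)
    config.percent_usage = 0.98 # 512 * 0.98 = 501.76 mb, the "max" in the log
  end
  PumaWorkerKiller.start
end
```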

cesarizu commented 5 years ago

This is happening for me as well with puma_worker_killer 0.1.1 and puma 4.1.0.

schneems commented 5 years ago

Did this previously work with an older version of puma such as 4.0 or 3.12?

cesarizu commented 4 years ago

I'm unable to reproduce this consistently: sometimes it works, other times it doesn't. I've tried to narrow down the possible causes, but it has been challenging.

When puma_worker_killer is not able to kill a worker, the application seems to keep running normally, and that worker continues to respond to requests. My guess is that a thread in the process is stuck, but I haven't been able to verify that.
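
One way I might check that (a debugging sketch, assuming SIGTTIN is otherwise unused in the app): trap a spare signal in each worker and dump every thread's backtrace, then send that signal to a worker that is ignoring TERM.

```ruby
# config/puma.rb -- debugging sketch: `kill -TTIN <worker pid>` dumps all
# thread backtraces to stderr, showing what the process is doing after TERM.
on_worker_boot do
  Signal.trap("TTIN") do
    Thread.list.each do |thread|
      $stderr.puts "Thread #{thread.object_id}: status=#{thread.status}"
      $stderr.puts (thread.backtrace || ['<no backtrace>']).join("\n")
      $stderr.puts '---'
    end
  end
end
```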

Also, looking at the code, it seems that Puma itself might be at fault, since puma_worker_killer just calls Puma's own method to terminate the worker. The failure also seems to happen only under memory pressure; rolling restarts work just fine.
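
For reference, the method puma_worker_killer ends up calling is Puma's Cluster::Worker#term, which in Puma 3.x/4.x looks roughly like this (paraphrased from Puma's source):

```ruby
# Paraphrase of Puma::Cluster::Worker#term (Puma 3.x/4.x), annotated:
def term
  if @first_term_sent && (Time.now - @first_term_sent) > @options[:worker_shutdown_timeout]
    # a TERM was already sent and the shutdown timeout elapsed: escalate to KILL
    @signal = "KILL"
  else
    @term ||= true
    @first_term_sent ||= Time.now
  end
  Process.kill(@signal, @pid)
rescue Errno::ESRCH
  # the process is already gone
end
```

If that's accurate, the :worker_shutdown_timeout=>30 visible in the worker dump above should escalate a repeated TERM to KILL after about 30 seconds, which doesn't seem to be happening in the log above.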

I'm going to keep investigating and will post anything I find here.

schneems commented 4 years ago

Any progress here?