zombocom / puma_worker_killer

Automatically restart Puma cluster workers based on max RAM available

Significantly incorrect memory usage reported #6

Closed: travisp closed this issue 4 years ago

travisp commented 10 years ago

I was having issues with frequent cycling on a Heroku project and noticed numbers like this in the logs (shortly after I restarted all of the web dynos):

2014-08-15T14:21:17.222622+00:00 app[web.3]: [2] PumaWorkerKiller: Consuming 429.48046875 mb with master and 2 workers
2014-08-15T14:21:18.443592+00:00 heroku[web.3]: source=web.3 dyno=heroku.15698018.ca54adb5-3f63-4006-ae9f-c0f235c53288 sample#load_avg_1m=0.00 sample#load_avg_5m=0.00
2014-08-15T14:21:18.443937+00:00 heroku[web.3]: source=web.3 dyno=heroku.15698018.ca54adb5-3f63-4006-ae9f-c0f235c53288 sample#memory_total=313.75MB sample#memory_rss=313.75MB sample#memory_cache=0.00MB sa

Basically, it seems to be vastly overestimating the amount of memory actually used. This is with the latest code from master, including get_process_mem 0.2.0.

schneems commented 10 years ago

This is why: https://github.com/schneems/get_process_mem/issues/7

samnang commented 9 years ago

@schneems Since Puma is now the recommended web server on Heroku, is it reliable to use this now? Or is it better to wait until this issue is solved?

schneems commented 9 years ago

You can use this now, but it's at your own risk. Unicorn Worker Killer has the exact same bug and people have been using it on Heroku for years. I'll remove the warning from the readme when this gets resolved in a sane way. Feel free to experiment until then.

samnang commented 9 years ago

Thanks @schneems, I will try it out. I'm a bit confused about puma_worker_killer vs. https://github.com/schneems/puma_auto_tune. We shouldn't use both at the same time, should we? Which one do you recommend?

schneems commented 9 years ago

Puma Auto Tune does everything that PWK does and more, but it is guaranteed to make your app swap memory if you use it on Heroku, so don't use Puma Auto Tune on Heroku right now. PWK is fine, but understand that it's not perfect: if you set your app to 512mb of RAM, it will start killing workers at about ~350mb of RAM. PWK logs how much RAM it thinks your system is using in a Librato-compatible format, so you can manually compare that against actual RAM usage from Heroku's log-runtime-metrics and adjust.

samnang commented 9 years ago

if you set your app to 512mb of RAM, it will start killing workers at about ~350mb of RAM.

That's what I'm seeing here, and it keeps killing a worker and restarting it. Does it make sense to set percent_usage greater than 100% here, because of PWK's incorrect memory reporting?

schneems commented 9 years ago

That's one angle. Maybe shoot for 120%, but don't try to get too close: if you go over, then PWK won't kill workers when you need it to. Also realize that PWK is a bandaid for larger memory problems; it doesn't solve them, it just covers them up.

samnang commented 9 years ago

Pretty unstable. Right now Heroku reports memory exceeding 1 GB, but PWK reports only about ~600 MB, and it didn't kill the workers either.

PumaWorkerKiller.config do |config|
  config.ram           = 512  # mb
  config.frequency     = 5    # seconds
  config.percent_usage = 1.20
end

PumaWorkerKiller.start
2015-02-05 03:18:59.625773+00:00 heroku web.1  - - Process running mem=1767M(345.2%)
2015-02-05 03:18:59.625820+00:00 heroku web.1  - - Error R14 (Memory quota exceeded) Critical
2015-02-05 03:18:59.625257+00:00 heroku web.1  - - source=web.1 dyno=heroku.21274089.e2b6196c-6736-47c5-bcfd-8cd6393289ae sample#load_avg_1m=0.00 sample#load_avg_5m=0.02 sample#load_avg_15m=0.04
2015-02-05 03:18:59.625357+00:00 heroku web.1  - - source=web.1 dyno=heroku.21274089.e2b6196c-6736-47c5-bcfd-8cd6393289ae sample#memory_total=1767.64MB sample#memory_rss=501.53MB sample#memory_cache=0.00MB sample#memory_swap=1266.11MB sample#memory_pgpgin=1217595pages sample#memory_pgpgout=1089204pages
2015-02-05 03:19:02.516087+00:00 app web.1     - - [3] PumaWorkerKiller: Consuming 594.34765625 mb with master and 2 workers

schneems commented 9 years ago

Make sure you're using version 0.0.3 or master.

Consuming 594.34765625 mb with master and 2 workers

This should have triggered a kill cycle.

if (total = get_total_memory) > @max_ram
  @cluster.master.log "PumaWorkerKiller: Out of memory. #{@cluster.workers.count} workers consuming total: #{total} mb out of max: #{@max_ram} mb. Sending TERM to #{@cluster.largest_worker.inspect} consuming #{@cluster.largest_worker_memory} mb."
  @cluster.term_largest_worker
else
  @cluster.master.log "PumaWorkerKiller: Consuming #{total} mb with master and #{@cluster.workers.count} workers"
end

where @max_ram = ram * percent_usage, which should be 614 mb.
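
For reference, a small sketch of the threshold arithmetic the snippet above implies, using the config values posted earlier in the thread (not PWK code itself):

ram           = 512                 # mb, from config.ram above
percent_usage = 1.20                # from config.percent_usage above
max_ram       = ram * percent_usage # => 614.4 mb; a worker is only TERMed when the reported total exceeds this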

samnang commented 9 years ago

Yep, I was using 0.0.3. You are right that 120% is about 614 mb, but Heroku says memory is already at 1767M while PWK still reports 594.34765625 mb; that's why it didn't kill the workers yet.

schneems commented 9 years ago

Check your version of get_process_mem; it should be 0.2.0.

samnang commented 9 years ago

Yes, it is.

$ bundle show puma
.gems/gems/puma-2.11.0

$ bundle show puma_worker_killer
.gems/gems/puma_worker_killer-0.0.3

$ bundle show get_process_mem
.gems/gems/get_process_mem-0.2.0

schneems commented 9 years ago

Weird. This is how we get the memory usage:

def get_total(workers = set_workers)
  # memory of the Puma master process, measured via get_process_mem
  master_memory = GetProcessMem.new(Process.pid).mb
  # sum of the memory already collected for each worker (0 if none)
  worker_memory = workers.map { |_, mem| mem }.inject(&:+) || 0
  worker_memory + master_memory
end

My best guess is that you have something else running, maybe a separate binary or program, that is using up memory in a different process, and PWK can't see it. If you're shelling out a lot using backticks or Process.spawn, PWK won't see that memory either. Again, this is just yet another reason why it's "use at your own risk"-ware for now. Thanks for giving it a shot. Unfortunately the introspection tools on containers are just so limited.
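
To illustrate that point, here is a rough sketch (not PWK code; the `sleep` command is just a stand-in for any external process):

require 'get_process_mem'

# get_process_mem only inspects the PID it is given, so PWK's total is the
# master's memory plus whatever each worker reports for its own PID.
puts GetProcessMem.new(Process.pid).mb

# Memory used by a child started with backticks or Process.spawn lives in a
# different PID, so it never shows up in the number above.
child_pid = Process.spawn("sleep 60")
Process.detach(child_pid)
puts GetProcessMem.new(Process.pid).mb # unaffected by the child's usage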

samnang commented 9 years ago

In my code I don't start any subprocesses, but I'm not sure about the third-party libraries I'm using; some of them are pubnub, newrelic, and sidekiq. As far as I can tell from the response times, Puma is faster, though I haven't run any benchmarks myself yet.

Another thing: when I was using Unicorn, memory didn't keep growing this much. I'm not sure whether that's because of this: https://github.com/puma/puma/issues/342

I think Heroku recommending Puma is a good choice and definitely the direction to go. Thanks, and I hope you find a way to handle this memory issue soon :smile:

chetan-wwindia commented 8 years ago

Whatever value I set for config.ram, the gem treats it as 512 and restricts usage to about 335 mb of RAM. I checked the value in the Rails console (PumaWorkerKiller.ram => 4096), but the cutoff still behaves as if it were 512. The default configuration works; it's just not picking up the new config in config/puma.rb or config/initializers/puma_worker_killer.rb.

schneems commented 8 years ago

@chetan-wwindia are you on Heroku? Make sure that ram is set before the worker killer is "started". If this reproduces locally, can you give me an example app that shows the problem?
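
For reference, the ordering schneems describes looks roughly like this in Puma's before_fork hook, which is where the README suggests starting PWK (a sketch; the values are illustrative):

# config/puma.rb
before_fork do
  PumaWorkerKiller.config do |config|
    config.ram           = 4096 # mb, must be set before start is called
    config.frequency     = 10   # seconds
    config.percent_usage = 0.80
  end
  PumaWorkerKiller.start
end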

chetan-wwindia commented 8 years ago

@schneems I am using this on an AWS Ubuntu server.

schneems commented 8 years ago

Can you give me the code you're using to set the values? Does it work locally?

chetan-wwindia commented 8 years ago

On the server it's nginx + Puma with 4 Puma workers. RAM keeps piling up to about 6 GB. I'm also dealing with a memory leak; this is my temporary solution until I find a fix for the leak.

It doesn't work locally:

PumaWorkerKiller.config do |config|
  config.ram                       = 4096     # mb
  config.frequency                 = 10       # seconds
  config.percent_usage             = 0.80
  config.rolling_restart_frequency = 3 * 3600 # seconds (3 hours)
end
PumaWorkerKiller.start

kevinelliott commented 7 years ago

Any update to report here? I'm curious whether PWK can report correct memory consumption on Heroku dynos yet, or if there has been some mild success using the memory definitions.

schneems commented 7 years ago

Check the readme. It will not work on Heroku until LXC exposes memory use inside of the container, so likely never. Use rolling restarts or performance dynos.
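
For anyone landing here, the rolling-restart alternative doesn't depend on memory measurements at all and looks roughly like this (a sketch based on the README; the 12-hour frequency is just an example):

# config/puma.rb
before_fork do
  # restart workers on a timer instead of reacting to (unreliable) memory numbers
  PumaWorkerKiller.enable_rolling_restart(12 * 3600) # frequency in seconds
end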

kevinelliott commented 7 years ago

Thanks, yeah I went with rolling restarts. So then it sounds like this issue should be re-closed.

jrimmer-healthiq commented 5 years ago

This isn't a problem on Performance Dynos, then? How about Shield Dynos?

schneems commented 5 years ago

Perf, private, and shield dynos all run on their own VPC, so the numbers should be correct.