zombocom / puma_worker_killer

Automatically restart Puma cluster workers based on max RAM available

Possible issue with leaving open database connections #13

Closed: chadbailey59 closed this issue 4 years ago

chadbailey59 commented 9 years ago

Ref. Heroku support ticket: https://support.heroku.com/tickets/238155

This app was running with config.frequency = 60, and each of their backends was using about 200 MB of RAM on the Postgres server, leading to a graph that looks like this:

[image: graph of RAM usage on the Postgres server, with a marker at the pgbouncer install]

The marker is when they installed pgbouncer, which effectively insulates them from this problem by limiting the maximum number of connections coming from any one dyno.
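(For context, `config.frequency = 60` means puma_worker_killer's reaper polls worker memory every 60 seconds. A minimal sketch of that configuration; the `ram` and `percent_usage` values below are illustrative, not the customer's actual settings:)

```ruby
# config/initializers/puma_worker_killer.rb (illustrative location)
PumaWorkerKiller.config do |config|
  config.ram           = 512   # RAM available to the dyno, in MB (illustrative)
  config.frequency     = 60    # poll worker memory every 60 seconds, as in this report
  config.percent_usage = 0.98  # reap the largest worker once usage crosses 98% of ram
end
PumaWorkerKiller.start
```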

I can't prove it without access to their logs, but I suspect their workers are getting TERMed and then aggressively KILLed, and the KILL leaks the database connection because it is never explicitly closed. (I'm working on a proof of concept for this soon.)
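If that theory holds, the missing piece is cleanup that only runs on a graceful shutdown. A sketch of what that cleanup could look like, assuming a Rails app with ActiveRecord and Puma's on_worker_shutdown hook; note that a SIGKILL skips this hook entirely, which is exactly the leak suspected here:

```ruby
# config/puma.rb
# Runs in each worker during a graceful (TERM-initiated) shutdown.
# A KILLed worker never reaches this block, so its database connections
# are never explicitly closed -- the leak suspected above.
on_worker_shutdown do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end
```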

If I'm right, this problem also exists with unicorn_worker_killer.

schneems commented 9 years ago

PWK uses Puma's term method directly: https://github.com/puma/puma/blob/6479e6b26b2b4ac7d09a78cd7b1e04470d8a213e/lib/puma/cluster.rb#L76-L87

Maybe they can try increasing the worker_shutdown_timeout value, or putting a timeout on individual connections.

Not sure if you've seen it, but I wrote a Dev Center article about what happens when a process/worker gets TERMed: https://devcenter.heroku.com/articles/what-happens-to-ruby-apps-when-they-are-restarted. Since we send SIGTERM to the worker that is doing the processing, it should clean itself up. Either something is hanging in an ensure block somewhere, or it can't clean itself up fast enough.
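For reference, a sketch of that knob in a Puma config (the 60 here is illustrative; Puma's default is 30 seconds):

```ruby
# config/puma.rb
# How long the master waits after sending SIGTERM before escalating to
# SIGKILL. A KILLed worker skips ensure blocks, so raising this gives
# slow cleanup more room. A per-connection timeout (e.g. a Postgres
# statement_timeout) is the other option mentioned above.
worker_shutdown_timeout 60
```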