**Closed** — chadbailey59 closed this issue 4 years ago
PWK calls Puma's `term` method directly: https://github.com/puma/puma/blob/6479e6b26b2b4ac7d09a78cd7b1e04470d8a213e/lib/puma/cluster.rb#L76-L87
Maybe they can try increasing the `worker_shutdown_timeout` value, or putting a timeout on individual connections. Not sure if you've seen it, but I wrote a Dev Center article about what happens when a process/worker gets TERMed: https://devcenter.heroku.com/articles/what-happens-to-ruby-apps-when-they-are-restarted. When we send SIGTERM to the worker that is doing the processing, it should clean itself up. Either something is hanging in an `ensure` block somewhere, or it cannot clean itself up fast enough.
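For reference, a minimal sketch of what raising that timeout looks like in `config/puma.rb` (the worker/thread counts and the 60-second value here are illustrative assumptions, not numbers from this ticket; Puma's default `worker_shutdown_timeout` is 30 seconds):

```ruby
# config/puma.rb -- illustrative values, not a recommendation
workers 2
threads 1, 5

# Give a TERMed worker longer to finish in-flight requests and run its
# ensure blocks before the cluster escalates to SIGKILL (default: 30s).
worker_shutdown_timeout 60
```

The value should comfortably exceed your slowest expected request, otherwise the escalation to KILL can still interrupt cleanup.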
Ref. Heroku support ticket: https://support.heroku.com/tickets/238155
This app was running with `config.frequency = 60`, and each of their backends was using about 200 MB of RAM on the Postgres server, leading to a graph that looks like this: (graph omitted). The marker is when they installed pgbouncer, which effectively insulates them from this problem by limiting the maximum number of connections coming from any one dyno.
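For context, that `config.frequency = 60` line comes from puma_worker_killer's config block. A sketch of the setup being discussed (only the frequency value is from this thread; the RAM limit is an assumption for illustration):

```ruby
# Sketch of a puma_worker_killer setup. Only config.frequency = 60 is
# taken from this thread; the RAM limit is illustrative.
PumaWorkerKiller.config do |config|
  config.ram       = 512  # MB available to the dyno (assumption)
  config.frequency = 60   # check worker memory every 60 seconds
end
PumaWorkerKiller.start
```

With a 60-second frequency, a worker over the limit gets TERMed roughly once a minute, so any cleanup failure repeats often enough to accumulate leaked connections quickly.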
I can't prove it without access to their logs, but I suspect their workers are getting TERMed, then aggressively KILLed, and the KILL leaks the database connection by never explicitly closing it. (I'm working on a POC for this soon.)
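A small self-contained demo of the mechanism I suspect (not the POC itself): Ruby runs `ensure` blocks when a process takes SIGTERM, but SIGKILL cannot be handled, so cleanup is skipped entirely. The file write below is a hypothetical stand-in for closing a database connection.

```ruby
require "tmpdir"

# Returns true if the forked "worker"'s ensure block (our stand-in for
# closing a DB connection) ran before the process died.
def cleanup_ran_after(sig)
  pid = fork do
    begin
      sleep 10 # pretend this is a long-running request
    ensure
      # On TERM, Ruby raises SignalException here and this block runs.
      # On KILL, the process dies immediately and this never executes.
      File.write(File.join(Dir.tmpdir, "cleanup_#{Process.pid}"), "closed")
    end
  end
  sleep 0.2 # give the child time to enter the begin block
  Process.kill(sig, pid)
  Process.wait(pid)
  marker = File.join(Dir.tmpdir, "cleanup_#{pid}")
  ran = File.exist?(marker)
  File.delete(marker) if ran
  ran
end

puts cleanup_ran_after("TERM") # ensure ran: connection would be closed
puts cleanup_ran_after("KILL") # ensure skipped: connection leaked
```

If the worker is still busy when the TERM grace period expires and KILL arrives, this is exactly the path that leaves the connection open on the Postgres side.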
If I'm right, this problem also exists with `unicorn_worker_killer`.