thegreenwebfoundation / greencheck-api

The green web foundation API
Apache License 2.0

Find way to support runaway memory usage with workers #23

Closed mrchrisadams closed 5 years ago

mrchrisadams commented 5 years ago

We had an incident today where workers consuming from the RabbitMQ queue ran away with memory in production, eating so much of it that the whole box froze.

We have a few options for catching runaway memory usage, but given that we're using supervisord to maintain a pool of workers, it's worth looking at superlance, an extension to supervisord that tracks memory usage, to automatically catch processes that are using too much memory.

You can see some more guidance here on installing it and setting it up, but generally speaking, the approach is:

  1. install with pip install superlance

  2. add a stanza like the one below to the supervisor config file at /etc/supervisor/conf.d/enqueue_greencheck.conf

command=memmon -p <program_name>=3GB

We probably need to do this for a group rather than a single process, as we have a pool of workers that we care about.
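A group-level variant might look like the sketch below. It assumes the workers run as a supervisord process group named enqgreencheck and that the listener section is called memmon; both names are placeholders, not taken from the real config. memmon's -g option takes a group=size pair in the same way that -p takes a process=size pair.

```ini
; Sketch only: the section name and the group name are assumptions.
[eventlistener:memmon]
; -g checks every process in the named group against the limit
command=memmon -g enqgreencheck=3GB
; memmon runs its check each time supervisord emits this tick event
events=TICK_60
```

The limit is still checked per process, so a pool of several workers could together use several times the stated amount before any one of them is restarted.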

More here:

mrchrisadams commented 5 years ago

This turned out to be okay to set up.

We now have a listener that runs every 60 seconds (that's the TICK_60 bit). If memory consumption in an enqgreencheck worker exceeds 500MB, we kill it.

command=memmon -p enqgreencheck=500MB -m

In more detail:

The name of the stanza as per supervisord


This is the name of the command to call


The memory threshold we check for:

  -p enqgreencheck=500MB

The address we email a notification to when it happens, in our case our zulip chat room:


Do this check every 60 seconds:
