Closed by mrchrisadams 5 years ago
This turned out to be okay to set up.
We now have a listener: every 60 seconds (that's the TICK_60 bit), if memory consumption in an enqgreencheck worker exceeds 500MB, we kill it.
[eventlistener:memmon]
command=memmon -p enqgreencheck=500MB -m support-address@streams.zulipchat.com
events=TICK_60
In more detail:
The name of the stanza, as supervisord expects it:
[eventlistener:memmon]
The name of the command to call:
command=memmon
The memory threshold we check for:
-p enqgreencheck=500MB
The address we email a notification to when it happens, in our case our Zulip chat room:
-m support-address@streams.zulipchat.com
Do this check every 60 seconds:
events=TICK_60
We had an incident in production today where workers consuming from the RabbitMQ queue ran away with memory, eating so much of it that they froze the whole box.
We have a few options for catching runaway memory usage to avoid this, but given that we're using supervisord to maintain a pool of workers, it's worth looking at superlance, a set of plugins for supervisord that includes memmon, which tracks memory usage and automatically catches processes that are using too much memory.
There is more guidance on installing it and setting it up in the links below, but generally speaking, the approach is:
Install with:
pip install superlance
Add an [eventlistener:memmon] stanza to the supervisor config file at:
/etc/supervisor/conf.d/enqueue_greencheck.conf
We probably need to do this for a group rather than a single process, as we have a pool of workers that we care about.
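A rough sketch of what the group-based version could look like, assuming the workers run as a single supervisord program with numprocs greater than 1, which makes supervisord treat them as a group named after the program. The worker command path and the pool size here are placeholders, not our real config. memmon's -g option takes a group name and size pair in the same way that -p takes a process name and size pair.

```ini
; Hypothetical worker pool: the command and numprocs values are placeholders.
[program:enqgreencheck]
command=/path/to/worker
numprocs=4
process_name=%(program_name)s_%(process_num)02d

; -g applies the 500MB threshold to every process in the enqgreencheck group.
[eventlistener:memmon]
command=memmon -g enqgreencheck=500MB -m support-address@streams.zulipchat.com
events=TICK_60
```

With numprocs set, each worker gets a name like enqgreencheck:enqgreencheck_00, so -p with the bare program name would likely not match any of them, which is another reason to reach for -g.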
More here:
https://thepracticalsysadmin.com/quicktip-manage-memory-usage-with-supervisord/
https://github.com/corvus-ch/rabbitmq-cli-consumer