yeti-platform / yeti

Your Everyday Threat Intelligence
https://yeti-platform.io/
Apache License 2.0

trim queue based on its memory usage rather than its length #1164

Closed udgover closed 3 weeks ago

udgover commented 3 weeks ago

Overview

This PR improves the previous implementation of the queue watcher and reducer, which keeps redis stable when no event or log consumers are running. As mentioned in the first implementation PR, when no consumers are running, the redis queue lists grow without bound, which eventually leads to an OOM kill of the service.

The previous implementation relied on the number of events in the queue to decide when to trim. However, the memory footprint of a message differs between message types. Instead of relying on queue length, this PR relies on the memory usage of the queue. If the memory usage is greater than a configured threshold, the queue is trimmed to keep only the most recent messages, in the proportion given by keep_ratio.
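To illustrate the idea, here is a minimal sketch of a memory-based trim, not the actual code of this PR. The function name `trim_queue_if_needed` is hypothetical. `client` is assumed to be a redis-py style client exposing `memory_usage`, `llen` and `ltrim`, and the newest messages are assumed to sit at the tail of the list.

```python
def trim_queue_if_needed(client, key, memory_limit_bytes, keep_ratio=0.9):
    """Trim the list at `key` when its memory usage exceeds the limit.

    Hypothetical helper for illustration only.
    Returns the number of messages removed (0 when nothing was trimmed).
    """
    usage = client.memory_usage(key)  # None when the key does not exist
    if usage is None or usage <= memory_limit_bytes:
        return 0

    length = client.llen(key)
    # Always keep at least one message so the negative index below is valid.
    keep = max(1, int(length * keep_ratio))
    if keep >= length:
        return 0

    # Assumption: newest messages are at the tail of the list, so keeping
    # the last `keep` entries drops the oldest ones.
    client.ltrim(key, -keep, -1)
    return length - keep
```

With a queue of 10 messages over the limit and a keep_ratio of 0.9, this sketch removes the single oldest message and keeps the 9 most recent ones. If the producers push to the head of the list instead, the `ltrim` range would need to be `0, keep - 1`.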

Configuration

Memory limit

The memory limit is the threshold above which the message queue is trimmed. It is configured with the memory_limit key under the [events] section. If not configured, it falls back to 64 MiB. If configured lower than 64 MiB, it also falls back to 64 MiB.

[!NOTE] This memory limit should be below the actual memory of your redis service, since redis is also used by celery to run and schedule classical tasks. For example, if your redis service has 128 MiB of memory, you could set memory_limit to 96.
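The fallback rules for memory_limit can be sketched as follows. This is an illustration under assumptions, not the code of this PR: the function name `resolve_memory_limit` is hypothetical, and the value is assumed to be an integer number of MiB, as the example value of 96 above suggests.

```python
DEFAULT_MEMORY_LIMIT_MIB = 64  # fallback and minimum described above


def resolve_memory_limit(configured=None):
    """Return the effective memory limit in MiB (hypothetical helper)."""
    if configured is None:
        # Key not set under [events]: use the default.
        return DEFAULT_MEMORY_LIMIT_MIB
    limit = int(configured)
    if limit < DEFAULT_MEMORY_LIMIT_MIB:
        # Values below 64 MiB are not accepted and fall back to 64 MiB.
        return DEFAULT_MEMORY_LIMIT_MIB
    return limit
```

For example, `resolve_memory_limit(96)` returns 96, while `resolve_memory_limit(32)` and `resolve_memory_limit()` both return 64.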

Keep ratio

When the memory limit is reached, the message queue is trimmed by removing the oldest messages. keep_ratio defines how many messages are kept in the queue. It is a float greater than 0 and less than 1, configured with the keep_ratio key under the [events] section. If not configured, it falls back to 0.9, meaning that the oldest 10% of the messages in the queue are removed. If the configured keep_ratio is less than or equal to 0, or greater than or equal to 1, it falls back to 0.9.
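The keep_ratio validation can be sketched the same way. The function name `resolve_keep_ratio` is hypothetical and only illustrates the rules described above. It is not taken from the PR.

```python
DEFAULT_KEEP_RATIO = 0.9  # fallback described above


def resolve_keep_ratio(configured=None):
    """Return the effective keep ratio (hypothetical helper)."""
    if configured is None:
        # Key not set under [events]: use the default.
        return DEFAULT_KEEP_RATIO
    ratio = float(configured)
    if ratio <= 0 or ratio >= 1:
        # Only values strictly between 0 and 1 are accepted.
        return DEFAULT_KEEP_RATIO
    return ratio
```

As a worked example, with the default ratio a queue of 1000 messages is cut down to `int(1000 * 0.9)`, that is 900 messages, so the 100 oldest ones are dropped.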