If the Redis DB restarts, whether on demand or after a failure, Kafka-Monitor can no longer clean up the stats it stores in Redis. Those stats keep growing in size, and as a result Redis uses more and more RAM.
This happens because Kafka-Monitor collects stats using the RollingTimeWindow class (you can find it in scutils/stats_collector.py). RollingTimeWindow extends ThreadedCounter, which uses a thread that runs the function __mainloop.
The expiration and cleanup of the stats take place inside __mainloop. To do this, the thread receives the Redis connection when it is spawned and has no control over that connection while it is running.
If Redis restarts for some reason, that thread loses its connection and fails. If Kafka-Monitor happens to be at a stage that doesn't need a Redis connection, and Redis comes back up in the meantime, Kafka-Monitor continues running and dumping stats to Redis without knowing that the cleanup thread is down.
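To illustrate the failure mode described above, here is a minimal sketch that does not use Redis or the scutils code at all. `FakeConnectionError`, `cleanup_once`, `unguarded_loop` and `simulate_restart` are hypothetical names made up for this example, and a `threading.Event` stands in for "Redis is reachable". It only shows that a worker thread with no exception handling ends on the first failed cleanup while the rest of the process carries on.

```python
import threading


class FakeConnectionError(Exception):
    """Hypothetical stand-in for a Redis connection error."""


def cleanup_once(redis_up):
    # Pretend cleanup: fails whenever the simulated Redis is down.
    if not redis_up.is_set():
        raise FakeConnectionError("connection refused")


def unguarded_loop(redis_up, stop_event):
    # No exception handling: the first failed cleanup ends the thread.
    while not stop_event.is_set():
        cleanup_once(redis_up)
        stop_event.wait(0.01)


def simulate_restart():
    redis_up = threading.Event()   # cleared means the simulated Redis is down
    stop_event = threading.Event()
    worker = threading.Thread(
        target=unguarded_loop, args=(redis_up, stop_event), daemon=True
    )
    worker.start()
    worker.join(timeout=1.0)       # the worker dies on its first cleanup attempt
    redis_up.set()                 # Redis comes back, but nobody restarts the worker
    written = 0
    for _ in range(5):             # the main process keeps "writing" stats
        written += 1
    return worker.is_alive(), written
```

The traceback from the dead thread goes to stderr, but nothing in the main thread is notified, which matches the behaviour described in this report.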
I'm fairly certain we just need to add a try/except around the block here, staying within the while loop but ensuring the thread doesn't die if a Redis exception occurs.
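A rough sketch of that idea, not the actual scutils code: the names `FlakyStore`, `FakeConnectionError`, `main_loop` and `run_demo` are hypothetical and exist only for this example. In the real class the except clause would presumably catch the Redis client's connection error (for redis-py, `redis.exceptions.ConnectionError`) instead of the stand-in used here.

```python
import threading
import time


class FakeConnectionError(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError."""


class FlakyStore:
    """Hypothetical stand-in for a Redis client that is down for the first few calls."""

    def __init__(self, failures):
        self.failures = failures
        self.cleanups = 0

    def expire_old_entries(self):
        if self.failures > 0:
            self.failures -= 1
            raise FakeConnectionError("connection refused")
        self.cleanups += 1


def main_loop(store, stop_event, interval=0.01):
    """Cleanup loop that stays inside the while loop when the store is unreachable."""
    while not stop_event.is_set():
        try:
            store.expire_old_entries()
        except FakeConnectionError:
            # Store is unavailable: skip this cycle and retry on the next one.
            pass
        stop_event.wait(interval)


def run_demo(failures=3, min_cleanups=2, timeout=2.0):
    store = FlakyStore(failures)
    stop_event = threading.Event()
    worker = threading.Thread(target=main_loop, args=(store, stop_event), daemon=True)
    worker.start()
    # Wait until the loop has recovered and cleaned up a few times (or give up).
    deadline = time.monotonic() + timeout
    while store.cleanups < min_cleanups and time.monotonic() < deadline:
        time.sleep(0.01)
    alive = worker.is_alive()
    stop_event.set()
    worker.join(timeout=1.0)
    return alive, store.failures, store.cleanups
```

With the guard in place, the thread rides out the simulated outage and resumes cleanup once the store answers again. Logging the exception instead of silently passing would probably be preferable in the real fix.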