jjneely / statsrelay

A Golang consistent hashing proxy for statsd
MIT License

Make StatsRelay detect if StatsD Daemons are Alive #2

Open jjneely opened 9 years ago

jjneely commented 9 years ago

The current code base does nothing to detect or react to StatsD daemons that are not alive. The UDP StatsD protocol is designed to be fire-and-forget and offers no way to detect if the other side has received the packet.

StatsD daemons have a TCP administrative interface that could be used to check whether the process is alive. That may be of help with this issue.

Things to think about:

justdaver commented 9 years ago

Initially I was thinking about using something like mon to periodically check whether the statsd backends are up (a port check against the statsd admin port?). If mon detects that a statsd host is down, it would restart the statsrelay daemon(s) and leave out the host that is down; when the host comes back online, it would restart the statsrelay daemon(s) again and include it. That said, I really like your idea of building this kind of functionality into statsrelay.
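For illustration, the same periodic port check could run inside the relay process instead of an external monitor. This is a minimal sketch under stated assumptions: `portOpen` and `liveBackends` are hypothetical helpers, the backend addresses and the check interval are made up, and a real relay would rebuild its hash ring from the result rather than print it.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// portOpen (hypothetical helper) is a plain TCP port check: it reports
// whether a connection to addr can be opened within timeout.
func portOpen(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// liveBackends (hypothetical helper) returns the backends for which check
// reports true, preserving their original order.
func liveBackends(backends []string, check func(string) bool) []string {
	live := make([]string, 0, len(backends))
	for _, b := range backends {
		if check(b) {
			live = append(live, b)
		}
	}
	return live
}

func main() {
	// Assumption: these are the admin addresses of two StatsD backends.
	backends := []string{"127.0.0.1:8126", "127.0.0.1:8127"}
	check := func(addr string) bool { return portOpen(addr, time.Second) }

	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()

	// Run three check rounds; a long-lived relay would loop indefinitely.
	for i := 0; i < 3; i++ {
		fmt.Println("live backends:", liveBackends(backends, check))
		<-ticker.C
	}
}
```

Passing the check in as a function keeps the filtering logic testable without real sockets.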

Some ideas/thoughts/2c from my side:

Unfortunately I am not much of a programmer, and my coding kung fu is very weak, but I will help out as much as possible on the testing side of things!

denen99 commented 9 years ago

I would suggest creating a fixed-size memory buffer that just gets overwritten, coupled with some sort of timeout: buffer X MB of metrics for Y seconds. Y would be the TTL before you remove the node from the ring and start sending its metrics to another node (as noted above, the least bad situation). When the node comes back up, flush the buffer to the previously used node and add the now-up node back to the ring.