Diamond is a Python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting CPU, memory, network, I/O, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source.
When using Diamond to send metrics to OpenTSDB, we have found that the Diamond process occasionally stops sending any metrics.
This matches up with the times when one or more of the OpenTSDB nodes attached to the load balancer (AWS ELB) drops out of the cluster, leaving the Diamond socket in the CLOSE_WAIT state.
I have tried a local patch that adds TCP keepalive and a reconnection interval to work around the issue. Happy to submit it as a patch here.
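To make the idea concrete, here is a minimal standalone sketch of the two mechanisms (TCP keepalive plus a forced reconnect after a fixed interval). It is not the actual patch and does not use Diamond's handler API: `open_keepalive_socket`, `ReconnectingSender`, `reconnect_interval`, and the default timing values are all hypothetical names and numbers chosen for illustration. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are platform-specific, so the sketch only sets them where the `socket` module exposes them.

```python
import socket
import time


def open_keepalive_socket(host, port, idle=60, interval=10, count=3, timeout=5.0):
    """Connect to host:port and enable TCP keepalive on the socket.

    idle/interval/count are illustrative defaults, not values from the patch.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket keepalive tuning is not available on every platform,
    # so only apply the options the socket module actually exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


class ReconnectingSender:
    """Hypothetical sender that reopens its socket every reconnect_interval seconds."""

    def __init__(self, host, port, reconnect_interval=300,
                 clock=time.monotonic, connect=open_keepalive_socket):
        self.host = host
        self.port = port
        self.reconnect_interval = reconnect_interval
        self.clock = clock
        self.connect = connect
        self.sock = None
        self.connected_at = None

    def _needs_reconnect(self):
        if self.sock is None:
            return True
        return (self.clock() - self.connected_at) >= self.reconnect_interval

    def close(self):
        if self.sock is not None:
            try:
                self.sock.close()
            finally:
                self.sock = None

    def send(self, data):
        # Drop a connection that has outlived the interval, so a socket whose
        # peer has gone away is not reused indefinitely.
        if self._needs_reconnect():
            self.close()
            self.sock = self.connect(self.host, self.port)
            self.connected_at = self.clock()
        try:
            self.sock.sendall(data)
        except OSError:
            # Forget the broken socket so the next send() reconnects.
            self.close()
            raise
```

A caller would create one `ReconnectingSender` per OpenTSDB endpoint and pass each formatted metric line to `send()`. The periodic reconnect is the part that addresses a socket stuck in CLOSE_WAIT, since that state means the remote side has already closed and the local process has not yet closed its end.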