python-diamond / Diamond

Diamond is a Python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting CPU, memory, network, I/O, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source.
http://diamond.readthedocs.org/
MIT License

Elasticsearch collector multi instance problem #631

Open AlexAkulov opened 7 years ago

AlexAkulov commented 7 years ago

Hello! I have three instances of Elasticsearch on one host.

enabled = true
instances = instance1@localhost:9202,instance2@localhost:9204,instance3@localhost:9206
path_prefix = EDI.elasticsearch
logstash_mode = True
stats = jvm,thread_pool
cluster = True

When one of the Elasticsearch instances fails to respond before the timeout, Diamond doesn't send stats from any of the other instances either. This is confusing! When a single process has a problem, it looks as if the whole cluster is broken.

shortdudey123 commented 7 years ago

Do you have any logs that show where in the collector the timeout happened? I am guessing that there is an uncaught exception that causes the entire collector to bomb out when a timeout happens.
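
If the guess above is right, the usual remedy is to catch the error per instance so that one failing instance cannot abort the whole collection loop. Below is a minimal sketch of that pattern, not Diamond's actual collector code. The names `parse_instances`, `collect_all`, `fetch_stats` and `publish` are hypothetical and exist only for illustration; the only thing taken from the issue is the `alias@host:port` format of the `instances` setting.

```python
import logging

log = logging.getLogger("collector_sketch")


def parse_instances(spec):
    """Parse 'alias@host:port,...' into a list of (alias, host, port) tuples."""
    instances = []
    for item in spec.split(","):
        alias, _, address = item.strip().partition("@")
        host, _, port = address.partition(":")
        instances.append((alias, host, int(port)))
    return instances


def collect_all(instances, fetch_stats, publish):
    """Collect from every instance; a failure in one does not stop the others.

    fetch_stats(host, port) and publish(alias, stats) are hypothetical
    callables supplied by the caller. They are not Diamond functions.
    Returns the aliases of the instances that failed.
    """
    failed = []
    for alias, host, port in instances:
        try:
            stats = fetch_stats(host, port)
        except Exception:
            # Log the traceback for this instance and move on to the next one.
            log.exception("collection failed for %s (%s:%d)", alias, host, port)
            failed.append(alias)
            continue
        publish(alias, stats)
    return failed
```

With this structure a timeout on `instance2` would be logged with its traceback, while `instance1` and `instance3` would still be published. Catching `Exception` broadly is a deliberate choice for the sketch; a real fix would probably narrow it to the timeout and connection errors raised by whatever HTTP client the collector uses.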