prometheus-community / elasticsearch_exporter

Elasticsearch stats exporter for Prometheus
Apache License 2.0

Elasticsearch metrics not pulling the right cluster name #285

Closed (verma-preet closed this issue 5 years ago)

verma-preet commented 5 years ago

We are using the Docker image justwatch/elasticsearch_exporter:1.0.2 and have several alerts set up for Elasticsearch. The metrics are scraped successfully and show up in the UI, but when an alert fires it does not reach Slack, PagerDuty, or any other integration. Other alerts from the same cluster work, though. I noticed that one of the metrics we alert on, elasticsearch_cluster_health_status, has the wrong cluster name in its labels: it shows cluster="abc05", whereas the actual cluster name is "abc05-prod". For example:

elasticsearch_cluster_health_status{cluster="abc05",color="green",endpoint="es-metrics",instance="10.233.107.69:9108",job="elasticsearch-exporter",namespace="monitoring",pod="elasticsearch-exporter-9f896cf85-j8wbc",service="elasticsearch-exporter"} 

I suspect that the wrong cluster name is preventing the alert from being routed to the Alertmanager integrations. We are using Prometheus Operator with Alertmanager version 0.16.0.

zwopir commented 5 years ago

Hi @verma-preet,

Can you please send me the output of

curl <your_es_url>:9200/_cluster/health | jq .cluster_name

and

curl <your_es_url>:9200/ | jq .cluster_name

to help me debug the problem?

verma-preet commented 5 years ago

@zwopir Thanks for the quick response. The actual cluster name is syd05-prod, but it shows up in the exporter as syd05, which is causing Alertmanager to route the alerts incorrectly.

$ curl -XGET -u elastic:xyz <es-url>:9200/_cluster/health | jq .cluster_name
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   385  100   385    0     0   1491      0 --:--:-- --:--:-- --:--:--  1492
"syd05"
$
$ curl -XGET -u elastic:xyz <es-url>:9200/ | jq .cluster_name
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   493  100   493    0     0   1921      0 --:--:-- --:--:-- --:--:--  1925
"syd05"

zwopir commented 5 years ago

There must be a mismatch between what you consider the cluster name and what the ES API endpoints, and thus the elasticsearch_exporter, report as the cluster name. The API calls above are exactly the calls the exporter makes. The jq pipe just filters out the cluster_name field from the JSON response.
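
For illustration, the same check can be sketched in Python. This is only a rough equivalent of the curl and jq commands above, not the exporter's actual code. The function names and the comparison against an expected name are assumptions made for this example, and authentication (the -u flag used with curl) is left out.

```python
import json
import urllib.request


def extract_cluster_name(body: str) -> str:
    """Return the cluster_name field from a JSON response body.

    Equivalent to piping the response through `jq .cluster_name`.
    """
    return json.loads(body)["cluster_name"]


def fetch_cluster_name(url: str) -> str:
    """GET the given URL and return the cluster_name found in the response.

    Hypothetical helper: it sends no credentials, so a secured cluster
    would need an Authorization header added to the request.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_cluster_name(resp.read().decode("utf-8"))


def name_matches(reported: str, expected: str) -> bool:
    """Compare the name Elasticsearch reports with the name you expect."""
    return reported == expected
```

Calling fetch_cluster_name on both <your_es_url>:9200/_cluster/health and <your_es_url>:9200/ and passing each result to name_matches together with the name from your own records would show whether Elasticsearch itself reports a different name than you expect.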

Where is the value "syd05-prod" coming from? Where do you see it?

verma-preet commented 5 years ago

@zwopir The cluster name comes from our inventory. I see now that it is not set correctly in ES, so the exporter is working as expected. Thanks for taking the time to look into it.