ervikrant06 opened 2 years ago
elasticsearch_clusterinfo_up reports 1 while elasticsearch_cluster_health_up reports 0, and there are no JSON parse failures:
# HELP elasticsearch_cluster_health_json_parse_failures Number of errors while parsing JSON.
# TYPE elasticsearch_cluster_health_json_parse_failures counter
elasticsearch_cluster_health_json_parse_failures 0
# HELP elasticsearch_cluster_health_total_scrapes Current total ElasticSearch cluster health scrapes.
# TYPE elasticsearch_cluster_health_total_scrapes counter
elasticsearch_cluster_health_total_scrapes 44917
# HELP elasticsearch_cluster_health_up Was the last scrape of the ElasticSearch cluster health endpoint successful.
# TYPE elasticsearch_cluster_health_up gauge
elasticsearch_cluster_health_up 0
# HELP elasticsearch_clusterinfo_last_retrieval_failure_ts Timestamp of the last failed cluster info retrieval
# TYPE elasticsearch_clusterinfo_last_retrieval_failure_ts gauge
elasticsearch_clusterinfo_last_retrieval_failure_ts{url="https://localhost:9200"} 1.659449338e+09
# HELP elasticsearch_clusterinfo_last_retrieval_success_ts Timestamp of the last successful cluster info retrieval
# TYPE elasticsearch_clusterinfo_last_retrieval_success_ts gauge
elasticsearch_clusterinfo_last_retrieval_success_ts{url="https://localhost:9200"} 1.660122238e+09
# HELP elasticsearch_clusterinfo_up Up metric for the cluster info collector
# TYPE elasticsearch_clusterinfo_up gauge
elasticsearch_clusterinfo_up{url="https://localhost:9200"} 1
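To make the mismatch in this scrape easier to read, here is a minimal Python sketch that pulls the relevant samples out of the pasted output and converts the clusterinfo timestamps to UTC. The parser is a deliberately simple stand-in (not the Prometheus client library's parser). It assumes every sample line is `name{labels} value` with no trailing timestamp and that the `_ts` values are Unix seconds. `SCRAPE` is an excerpt of the output above. The helper names are illustrative.

```python
from datetime import datetime, timezone


def parse_samples(text):
    """Return {'name{labels}': value} for every sample line in Prometheus text output.

    Simplifying assumption: each sample line is 'name{labels} value' with no
    trailing timestamp, as in the scrape output above.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and # HELP / # TYPE lines
            continue
        key, value = line.rsplit(None, 1)  # value is the last whitespace-separated field
        samples[key] = float(value)
    return samples


def to_utc(ts):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Excerpt of the scrape output above (HELP/TYPE lines omitted).
SCRAPE = """\
elasticsearch_cluster_health_json_parse_failures 0
elasticsearch_cluster_health_up 0
elasticsearch_clusterinfo_last_retrieval_failure_ts{url="https://localhost:9200"} 1.659449338e+09
elasticsearch_clusterinfo_last_retrieval_success_ts{url="https://localhost:9200"} 1.660122238e+09
elasticsearch_clusterinfo_up{url="https://localhost:9200"} 1
"""

LABELS = '{url="https://localhost:9200"}'
samples = parse_samples(SCRAPE)
last_failure = to_utc(samples["elasticsearch_clusterinfo_last_retrieval_failure_ts" + LABELS])
last_success = to_utc(samples["elasticsearch_clusterinfo_last_retrieval_success_ts" + LABELS])

print("cluster_health_up:", samples["elasticsearch_cluster_health_up"])
print("clusterinfo_up:", samples["elasticsearch_clusterinfo_up" + LABELS])
print("last clusterinfo failure:", last_failure)
print("last clusterinfo success:", last_success)
print("gap:", last_success - last_failure)
```

Under these assumptions, the last clusterinfo failure was at 2022-08-02 14:08:58 UTC and the last success at 2022-08-10 09:03:58 UTC (7 days, 18:55:00 later). The separate cluster health collector still reports 0. Per the HELP text, these are two different collectors, so clusterinfo_up being 1 does not by itself show that the cluster health scrape succeeded.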
We recently started using OpenSearch 2.1.0 in our environment as a replacement for Open Distro. AFAIU, elasticsearch_exporter is not tied to a specific ES version, so it should work with OpenSearch without any issues.
Broadly, we are facing two issues.
1) Prometheus intermittently fails to scrape metrics from the ES nodes. We run an exporter on each ES node. We hit this issue on version 1.3 and upgraded to 1.5, but that did not help much.
elasticsearch_cluster_health_up is reported as 0 on a few nodes in the cluster (sometimes just one) when checking https://NODE_URL:9108/metrics, while the other nodes report 1 at the same time. Checking cluster and node health through the ES API shows everything as healthy. Could this be an operator issue?
2) The exporter keeps logging these messages with OpenSearch; with Open Distro this was never an issue. A 500 indicates a server error, but the ES API itself works fine. A 403 indicates a permissions problem, so after seeing it we mapped the prometheus user to the monitoring role, but the error still keeps appearing.
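For the first issue, polling every node's exporter in the same pass makes the "0 on some nodes, 1 on others" pattern easier to capture over time. Here is a minimal Python sketch, assuming each exporter serves /metrics on port 9108 as described above. The helper names and node URLs are hypothetical. The fetch function is injectable, so the demo at the bottom runs offline on canned pages instead of real nodes.

```python
import re
import urllib.request

# Matches the sample line (not the # HELP / # TYPE lines), e.g. "elasticsearch_cluster_health_up 0".
UP_LINE = re.compile(r"^elasticsearch_cluster_health_up[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def fetch_metrics(url, timeout=5.0):
    """Default fetcher: GET an exporter's /metrics page and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def health_up_by_node(urls, fetch=fetch_metrics):
    """Return {url: elasticsearch_cluster_health_up value}, or None when the node
    is unreachable or the metric is missing from its page."""
    result = {}
    for url in urls:
        try:
            text = fetch(url)
        except OSError:  # covers URLError/HTTPError, timeouts and TLS errors
            result[url] = None
            continue
        match = UP_LINE.search(text)
        result[url] = float(match.group(1)) if match else None
    return result


# Offline demo: canned pages stand in for two nodes' /metrics output (hypothetical URLs).
pages = {
    "https://node1:9108/metrics": "elasticsearch_cluster_health_up 1\n",
    "https://node2:9108/metrics": "elasticsearch_cluster_health_up 0\n",
}
print(health_up_by_node(list(pages), fetch=pages.__getitem__))
```

Returning None separately from 0 keeps "exporter unreachable" (including a TLS verification failure with the default fetcher) distinct from "exporter reachable but the cluster health scrape failed".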
These two issues seem to be interrelated, but I couldn't figure out why it sometimes starts failing to decode the cluster health.
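To separate a permissions problem (403) from a server-side failure (500), one option is to replay the cluster health request with the same credentials the exporter uses and look at the raw status code and body. Here is a minimal Python sketch. It assumes the exporter's cluster health collector queries `/_cluster/health` with HTTP basic auth, and the function names are hypothetical. For a self-signed certificate, an `ssl.SSLContext` built from the cluster's CA would need to be passed as `context`.

```python
import base64
import urllib.error
import urllib.request


def build_health_request(base_url, user, password):
    """Build GET {base_url}/_cluster/health carrying HTTP basic-auth credentials."""
    req = urllib.request.Request(base_url.rstrip("/") + "/_cluster/health")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


def probe_cluster_health(base_url, user, password, context=None, timeout=10.0):
    """Send the request and return (HTTP status, body text), including for 4xx/5xx."""
    req = build_health_request(base_url, user, password)
    try:
        with urllib.request.urlopen(req, timeout=timeout, context=context) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 403: the request was refused for lack of permission; 500: server-side error.
        return err.code, err.read().decode("utf-8", "replace")
```

Running this as the prometheus user from a node whose exporter reports 0 may help. A 403 body would suggest the monitoring role mapping has not taken effect for that user. A 200 would point away from permissions and toward the intermittent failures themselves.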