prometheus-community / elasticsearch_exporter

Elasticsearch stats exporter for Prometheus

The exporter is many times OOMKilled #834

Open LHozzan opened 8 months ago

LHozzan commented 8 months ago

Hello.

We discovered that the exporter is frequently OOMKilled. Here you can see the repeated memory peaks. Trial01

I have no clue why the exporter sometimes needs so much more RAM when it normally does not.

We have set these resources:

resources:
  requests:
    cpu: 500m
    memory: 40Mi
  limits:
    cpu: 1500m
    memory: 60Mi
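
For reference, this is roughly how we check that the restarts really are OOMKills; the namespace and label selector below are placeholders for our setup:

# Live memory usage of the exporter pod versus the 60Mi limit
kubectl top pod -n monitoring -l app=elasticsearch-exporter

# Reason for the last container termination; prints "OOMKilled" after a kill
kubectl get pod -n monitoring -l app=elasticsearch-exporter \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'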

Are there any memory leaks?

sysadmind commented 8 months ago

What version of the exporter are you using? What flags are you using? What version of elasticsearch? What is the configuration of your elasticsearch cluster (number of nodes and which roles)?

LHozzan commented 8 months ago

Thank you for the feedback.

What version of the exporter are you using?

v1.6.0

What flags are you using?

Command:
  elasticsearch_exporter
  --log.format=logfmt
  --log.level=info
  --es.uri=https://REDACTED:REDACTED@ENDPOINT.NameSpace.svc.cluster.local:9200
  --es.all
  --es.indices
  --es.indices_settings
  --es.indices_mappings
  --es.shards
  --es.snapshots
  --es.timeout=20s
  --es.ssl-skip-verify
  --web.config.file=/etc/web-config.yaml
  --web.listen-address=:9108
  --web.telemetry-path=/metrics

/etc/web-config.yaml

tls_server_config:
  cert_file: /etc/ssl/certs/tls.crt
  key_file: /etc/ssl/certs/tls.key
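
For reference, a manual scrape of the endpoint with this setup looks like the following (the hostname is a placeholder; -k because of the self-signed certificate):

curl -sk https://localhost:9108/metrics | head -n 5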

What version of elasticsearch?

OpenSearch v2.11.0

What is the configuration of your elasticsearch cluster (number of nodes and which roles)?

In the larger clusters, where the problem occurs, we have 6 nodes total (2x coordinating, 2x manager, 2x data). We use the exporter in small clusters with only one multi-role node as well, but we have not seen the problem there.

Tristan971 commented 7 months ago

--es.indices_mappings --es.shards

Index mappings and shards are both quite a lot of data for any nontrivial cluster (curling the metrics endpoint shows it quite clearly). Unless you are actually using that data, removing those fetches will both save you some stress on the ES cluster and a lot of memory in the exporter.
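
For example, comparing the size of one scrape with and without those two flags makes the difference obvious (the exporter address is a placeholder):

# Bytes per scrape; run once with the full flag set, then again without
# --es.indices_mappings and --es.shards, and compare
curl -sk https://localhost:9108/metrics | wc -c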

If you actually want to use these, you will have to significantly raise the memory allocation.

In our case, the exporter without these averages ~60MB RAM usage, and ~240MB with them.
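
As a rough sketch based on those averages (the numbers are illustrative and leave headroom for collection peaks, not measured recommendations):

resources:
  requests:
    memory: 256Mi
  limits:
    memory: 512Mi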

LHozzan commented 7 months ago

I did some testing and you are right. Memory usage is higher, especially on clusters with more indices.

The --es.indices_mappings switch unfortunately covers information that is valuable to us. The --es.shards switch, however, is possible to omit. We lose some information, but in my humble opinion it is not important.

Is it possible to do some optimization on the exporter side?