open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Memory leak in v0.109 when running collector in deployment mode #35344

Open tomeroszlak opened 1 week ago

tomeroszlak commented 1 week ago

Component(s)

cmd/otelcontribcol

What happened?

Description

We recently upgraded our OpenTelemetry collector from v0.94.0 to v0.109.0 and are running it as a deployment behind an NGINX ingress. We’ve observed that memory usage spikes to 80% of the pod's available memory within five minutes and does not decrease.

Upon reviewing the metrics for failed log records (otelcol_exporter_send_failed_log_records{}), we noticed two exporters—{exporter="otlphttp"} and {exporter="otlp"}—which are not defined in our configuration but are continuously dropping logs.

Additionally, it appears that the memory_limiter is not updating GOMEMLIMIT to 100%; instead, it remains fixed at 80%.
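
For context, since the configuration is not attached to this issue: GOMEMLIMIT is a Go runtime soft memory limit that is normally supplied through the environment, while the memory_limiter processor is configured with its own limits. A minimal sketch of a percentage-based memory_limiter follows; the values are illustrative assumptions and are not taken from this deployment.

```yaml
processors:
  memory_limiter:
    # How often memory usage is checked.
    check_interval: 1s
    # Hard limit as a percentage of total available memory (illustrative value).
    limit_percentage: 80
    # Expected spike between checks, as a percentage of total memory (illustrative value).
    spike_limit_percentage: 25
```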

Steps to Reproduce

Expected Result

The garbage collector should free up memory, preventing the pod from being stuck at 80%.

Actual Result

The pod reaches 80% of the available memory and remains at that level.

Collector version

v0.109.0

Environment information

Environment

Running on Kubernetes v1.28 as a Deployment.

OpenTelemetry Collector configuration

Log output

2024/09/22 22:30:19 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/internal/request.(*RespWriterWrapper).writeHeader (resp_writer_wrapper.go:78)

Additional context

No response

ChrsMark commented 1 week ago

It would be super helpful if you could enable the pprofextension and gather some heap dumps. That would help spot which components are potentially "leaking". Please also provide the Collector's full configuration.
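
A minimal sketch of enabling the extension, assuming the rest of the configuration stays unchanged. The endpoint shown is, to my knowledge, the extension's default; adjust it if that port is already in use.

```yaml
extensions:
  pprof:
    # Address the pprof HTTP server listens on (believed to be the default).
    endpoint: localhost:1777

service:
  # An extension is only started if it is also listed here.
  extensions: [pprof]
```

With this in place, a heap profile should be reachable on the standard Go pprof path, for example by running `kubectl port-forward <pod> 1777:1777` and then `go tool pprof http://localhost:1777/debug/pprof/heap`, where `<pod>` is a placeholder for the collector pod name.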

atoulme commented 4 days ago

Please upgrade to the latest release to get rid of the superfluous response.WriteHeader log messages. Please also provide the complete configuration of the collector, masking passwords and other confidential information, and follow @ChrsMark's lead on using the pprofextension to collect more data.