Tarmander opened this issue 1 month ago
I have noticed a very similar issue using the loadbalancing exporter in Grafana Alloy, which uses this component. Here are some Pyroscope screenshots, and the callback registration in question:
_, err = builder.meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
    o.ObserveInt64(builder.ExporterQueueCapacity, cb(), opts...)
    return nil
}, builder.ExporterQueueCapacity)
The return values of the RegisterCallback functions are ignored. They hold references that can be used to unregister these callbacks when the exporter is being shut down. I assume they should be returned to the caller (exporter/exporterhelper/internal/queue_sender's Start()), to be later used in Shutdown().
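Roughly what I have in mind, as a minimal sketch only (queueTelemetry, start, and shutdown are made-up names, not the collector's actual types): keep the metric.Registration returned by RegisterCallback and call Unregister() on it during shutdown, so the meter drops its reference to the callback.

```go
// Sketch only: the types and the metric name here are illustrative.
package queuetelemetry

import (
	"context"

	"go.opentelemetry.io/otel/metric"
)

type queueTelemetry struct {
	reg metric.Registration // handle returned by RegisterCallback
}

// start registers the observable callback and keeps the Registration.
func (t *queueTelemetry) start(meter metric.Meter, capacity func() int64) error {
	gauge, err := meter.Int64ObservableGauge("exporter_queue_capacity")
	if err != nil {
		return err
	}
	t.reg, err = meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		o.ObserveInt64(gauge, capacity())
		return nil
	}, gauge)
	return err
}

// shutdown unregisters the callback so the meter no longer references it
// (or anything the callback closes over) after the exporter is torn down.
func (t *queueTelemetry) shutdown() error {
	if t.reg == nil {
		return nil
	}
	return t.reg.Unregister()
}
```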
**Describe the bug**
This is related to an issue with the exporter/loadbalancingexporter. The k8s resolver would continuously Shutdown() and create two new boundedMemoryQueues every time the endpoints were "updated" (roughly every 3 minutes). This behavior went unnoticed until the Memory Limiter Processor started to drop spans. After investigating with the pprof extension, we realized that we had an unbounded memory leak, and the root cause was that each time an exporter and its queue were shut down, the underlying channel was not GC'd properly. We kept allocating a new channel on every update until we ran OOM.
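A minimal sketch of how a queue can outlive its own Shutdown() (hypothetical names, not the actual collector code), assuming the queue is captured by an observable metric callback that is registered but never unregistered:

```go
// Illustrative only: boundedQueue and rebuildExporter are made-up names.
package leaksketch

import (
	"context"

	"go.opentelemetry.io/otel/metric"
)

// boundedQueue stands in for the collector's boundedMemoryQueue; the sized
// channel inside it is what kept piling up on the heap.
type boundedQueue struct {
	ch chan any
}

// rebuildExporter simulates one k8s-resolver "update": a new queue is built
// and a fresh callback is registered, but the metric.Registration for the
// callback is discarded, so it can never be unregistered. The callback
// closes over the queue, keeping it (and its channel) reachable from the
// meter even after the exporter is shut down.
func rebuildExporter(meter metric.Meter, gauge metric.Int64ObservableGauge, capacity int) (*boundedQueue, error) {
	q := &boundedQueue{ch: make(chan any, capacity)}
	_, err := meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		o.ObserveInt64(gauge, int64(cap(q.ch))) // closure pins q and q.ch
		return nil
	}, gauge)
	return q, err
}
```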
**Steps to reproduce**
Configure the k8s resolver to point to a service with many endpoints (the more endpoints, the quicker the memory increase). You can run with the pprof extension to see the memory increase in newSizedChannel over time.

**What did you expect to see?**
All exporters/queues/channels to be properly Shutdown() and GC'd.

**What did you see instead?**
Channels in existing exporter queues were not disposed of, and eventually they used up all resources in the pod.
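As a rough, self-contained stand-in for watching newSizedChannel grow under the pprof extension, the following illustrative program (not collector code; it uses the OpenTelemetry Go SDK directly) loops the same register-without-unregister pattern and prints heap usage:

```go
// Illustrative repro: every iteration registers a new observable callback
// that captures a freshly allocated sized channel and drops the returned
// Registration, so nothing is ever unregistered.
package main

import (
	"context"
	"fmt"
	"runtime"

	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// A reader must be attached so the SDK actually retains the callbacks.
	reader := sdkmetric.NewManualReader()
	meter := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader)).Meter("leak-repro")

	gauge, err := meter.Int64ObservableGauge("exporter_queue_capacity")
	if err != nil {
		panic(err)
	}

	for i := 0; i <= 1000; i++ {
		// One "endpoint update": new channel, new callback, Registration discarded.
		q := make(chan any, 10_000)
		if _, err := meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
			o.ObserveInt64(gauge, int64(cap(q)))
			return nil
		}, gauge); err != nil {
			panic(err)
		}

		if i%200 == 0 {
			var m runtime.MemStats
			runtime.GC()
			runtime.ReadMemStats(&m)
			fmt.Printf("update %4d: HeapAlloc = %d bytes\n", i, m.HeapAlloc)
		}
	}
}
```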
**What version did you use?**
v0.105.0

**What config did you use?**

**Environment**
OS: Ubuntu 22.04
Compiler: go1.22.6
Kubernetes version: apiVersion: opentelemetry.io/v1beta1