Open NickAnge opened 5 months ago
Pinging code owners:
exporter/loadbalancing: @jpkrohling
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Thank you for the detailed report, I'll take a look and try to reproduce it. In the meantime, can you try switching to the DNS resolver instead of the k8s resolver? I'm not 100% sure yet it would show a difference, but the DNS resolver is known to consume fewer resources in other situations.
resolver:
  k8s:
    service: service
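For anyone who wants to try the suggested swap, here is a minimal sketch of a DNS-resolver variant of the exporter configuration. The hostname, port, and TLS settings are assumptions for illustration and are not taken from the reporter's setup. The DNS resolver expects a name that resolves to the individual backend pod IPs, which in Kubernetes usually means a headless Service.

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true   # assumption: plaintext OTLP between collectors
    resolver:
      dns:
        # hypothetical headless Service in front of the backing collectors
        hostname: otel-backend-headless.observability.svc.cluster.local
        port: 4317         # assumption: default OTLP gRPC port
        interval: 5s       # how often the hostname is re-resolved
```

A shorter `interval` narrows the gap in endpoint discovery speed relative to the k8s resolver, at the cost of more DNS lookups.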
Thanks @jpkrohling. We discussed internally replacing the k8s resolver with the DNS resolver. The conclusion was to stay with the k8s resolver, as it is faster at resolving the endpoints of the backing collectors during a rollout or outage.
Let me know if you need more information about the issue, and thanks a lot for taking a look.
Can you temporarily replace it, and see if the memory profile is different? If we can isolate this behavior to this resolver specifically, it's easier to find a solution.
This memory issue happened only in our production environments (probably because of higher traffic), so I am not sure we can change it there, even temporarily :/. Did you manage to reproduce it in your setup?
I wasn't able to try it out. I might be able to find some time later this week, but next week I'm AFK again. If anyone is interested in this issue, it would help me a lot if I can have a confirmation that this is isolated to the k8s resolver.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Just pinging the owner of exporter/loadbalancing, @jpkrohling, to keep this issue from going stale.
Component(s)
exporter/loadbalancing
What happened?
Description
Hello team.
We recently upgraded our internal collectors from version 0.94.0 to 0.99.0 and observed a rise in memory usage in the load-balancer deployment collectors, as depicted in the image below. This persisted even after updating to the latest version, 0.101.0.
We enabled profiling on our collectors (the pprof component) and observed inuse_memory and inuse_objects. I split my investigation across 3 pods with low, medium, and high memory usage.
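For readers who want to collect comparable profiles, the profiling setup described above can be sketched as follows. This is an assumed minimal configuration, not the reporter's actual one; the endpoint shown is the pprof extension's documented default, and the port-forward step is only one possible way to reach it.

```yaml
extensions:
  pprof:
    endpoint: localhost:1777   # default endpoint of the pprof extension

service:
  extensions: [pprof]

# With the pod port-forwarded to localhost:1777, heap profiles can be
# inspected with the standard Go tooling, for example:
#   go tool pprof -sample_index=inuse_space   http://localhost:1777/debug/pprof/heap
#   go tool pprof -sample_index=inuse_objects http://localhost:1777/debug/pprof/heap
# then run `top` at the pprof prompt.
```

Note that pprof names the in-use memory sample index `inuse_space`; the "Inuse Memory" views below presumably correspond to it.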
Inuse Memory - Top
Low Memory Usage Pod
Medium Memory Usage Pod
High Memory Usage Pod
Inuse_objects - top
Low Memory Usage Pod
Medium Memory Usage Pod
High Memory Usage Pod
Steps to Reproduce
Expected Result
We expected memory usage to remain stable over time after the version bump.
Actual Result
High memory usage after bumping the version
Collector version
0.101.0
Environment information
Environment
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response