Open swar8080 opened 5 months ago
Thanks for the details here. I'm going to take some profiles in the clusters where I run the operator and see if anything jumps out performance-wise.
Maybe we need to improve the caching of ConfigMaps and Secrets: https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/
@pavolloffay I think you're right!
Config from string is certainly not helping, though. I think we should look into #2735, which would help a lot here.
We hit the same issue on our EKS cluster with ~86 nodes: the manager container within the opentelemetry-operator is enormously memory-hungry during startup, leading to OOMKills. We are on version v0.107.0.
Component(s)
Operator
What happened?
Description
Based on the discussion in this Slack thread: https://cloud-native.slack.com/archives/C033BJ8BASU/p1712158076121409
We are seeing the OTEL operator consume up to 1.4 GB of memory during start-up before settling to ~600 MB. This is a cluster with ~7k pods, about a dozen OTEL collectors, and so far only a few pods using Instrumentation for auto-injection. This started recently, after bumping the injected Java instrumentation version from 2.0.0 to 2.2.0. No issues in our other (smaller) clusters. We raised the Kubernetes memory limit to 3 GB for now.
Configuration
Kubernetes Version
1.24
Operator version
0.90.0
Collector version
A minor fork of 0.94.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")
Log output
Additional context
No response