open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0
High operator memory usage during start-up #2808

Open swar8080 opened 5 months ago

swar8080 commented 5 months ago

Component(s)

Operator

What happened?

Description

Based on the discussion in this slack thread: https://cloud-native.slack.com/archives/C033BJ8BASU/p1712158076121409

We are seeing the OTEL operator consume up to 1.4 GB of memory during start-up before settling at ~600 MB. This is a cluster with ~7k pods, about a dozen OTEL collectors, and so far only a few pods using Instrumentation for auto-injection. This started recently, after we bumped the injected Java instrumentation version from 2.0.0 to 2.2.0. We see no issues in our other (smaller) clusters.

We raised the Kubernetes memory limit to 3 GB for now.

Configuration

        - repoURL: "https://open-telemetry.github.io/opentelemetry-helm-charts"
          targetRevision: "0.44.2"
          chart: opentelemetry-operator
          helm:
            valuesObject:
              admissionWebhooks:
                certManager:
                  create: true
              manager:
                podLabels:
                  grafana.agent.log.collection: monit
                collectorImage:
                  repository: our.custom.artifactory:6555/otelcontrib-our-fork
                  tag: // Omitted 
                autoInstrumentationImage:
                  java:
                    repository: "ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java"
                    tag: "2.2.0"

Kubernetes Version

1.24

Operator version

0.90.0

Collector version

A minor fork of 0.94.0

Log output

2024-04-03 17:37:12.648 
I0403 17:37:12.648170       1 leaderelection.go:260] successfully acquired lease observability/9f7554c3.opentelemetry.io
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","logger":"collector-upgrade","msg":"looking for managed instances to upgrade"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","logger":"instrumentation-upgrade","msg":"looking for managed Instrumentation instances to upgrade"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodDisruptionBudget"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1alpha1.OpAMPBridge"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ConfigMap"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ServiceAccount"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Service"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Deployment"}
2024-04-03 17:37:12.648 
{"level":"info","ts":"2024-04-03T17:37:12Z","msg":"Starting Controller","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge"}
2024-04-03 17:37:12.754 
{"level":"info","ts":"2024-04-03T17:37:12Z","logger":"collector-upgrade","msg":"no instances to upgrade"}
2024-04-03 17:37:12.760 
{"level":"error","ts":"2024-04-03T17:37:12Z","logger":"instrumentation-upgrade","msg":"autoinstrumentation not enabled for this language","flag":"operator.autoinstrumentation.go","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).upgrade\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:135\ngithub.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).ManagedInstances\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:75\nmain.addDependencies.func2\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:345\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/manager/manager.go:301\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/manager/runnable_group.go:223"}
2024-04-03 17:37:12.760 
{"level":"error","ts":"2024-04-03T17:37:12Z","logger":"instrumentation-upgrade","msg":"autoinstrumentation not enabled for this language","flag":"operator.autoinstrumentation.nginx","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).upgrade\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:135\ngithub.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).ManagedInstances\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:75\nmain.addDependencies.func2\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:345\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/manager/manager.go:301\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/manager/runnable_group.go:223"}
2024-04-03 17:37:20.263 
{"level":"info","ts":"2024-04-03T17:37:20Z","msg":"Starting workers","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","worker count":1}
2024-04-03 17:37:20.358 
{"level":"info","ts":"2024-04-03T17:37:20Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
2024-04-03 17:37:20.428 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:20.428 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:37:20.489 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:20.489 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:37:20.551 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:20.551 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:37:20.621 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:20.621 
{"level":"info","ts":"2024-04-03T17:37:20Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:37:21.403 
{"level":"info","ts":"2024-04-03T17:37:21Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:21.403 
{"level":"info","ts":"2024-04-03T17:37:21Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:37:22.122 
{"level":"info","ts":"2024-04-03T17:37:22Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:22.122 
{"level":"info","ts":"2024-04-03T17:37:22Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:37:24.194 
{"level":"info","ts":"2024-04-03T17:37:24Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:24.194 
{"level":"info","ts":"2024-04-03T17:37:24Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:37:36.432 
{"level":"info","ts":"2024-04-03T17:37:36Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:36.432 
{"level":"info","ts":"2024-04-03T17:37:36Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:37:37.135 
{"level":"info","ts":"2024-04-03T17:37:37Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:37.135 
{"level":"info","ts":"2024-04-03T17:37:37Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:37:39.221 
{"level":"info","ts":"2024-04-03T17:37:39Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:39.222 
{"level":"info","ts":"2024-04-03T17:37:39Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:37:51.453 
{"level":"info","ts":"2024-04-03T17:37:51Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:51.453 
{"level":"info","ts":"2024-04-03T17:37:51Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:37:52.147 
{"level":"info","ts":"2024-04-03T17:37:52Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:52.147 
{"level":"info","ts":"2024-04-03T17:37:52Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:37:54.288 
{"level":"info","ts":"2024-04-03T17:37:54Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:37:54.288 
{"level":"info","ts":"2024-04-03T17:37:54Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:38:06.467 
{"level":"info","ts":"2024-04-03T17:38:06Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:06.467 
{"level":"info","ts":"2024-04-03T17:38:06Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:38:07.167 
{"level":"info","ts":"2024-04-03T17:38:07Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:07.167 
{"level":"info","ts":"2024-04-03T17:38:07Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:38:09.313 
{"level":"info","ts":"2024-04-03T17:38:09Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:09.313 
{"level":"info","ts":"2024-04-03T17:38:09Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:38:21.494 
{"level":"info","ts":"2024-04-03T17:38:21Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:21.494 
{"level":"info","ts":"2024-04-03T17:38:21Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:38:22.184 
{"level":"info","ts":"2024-04-03T17:38:22Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:22.184 
{"level":"info","ts":"2024-04-03T17:38:22Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:38:24.323 
{"level":"info","ts":"2024-04-03T17:38:24Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:24.323 
{"level":"info","ts":"2024-04-03T17:38:24Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:38:36.519 
{"level":"info","ts":"2024-04-03T17:38:36Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:36.519 
{"level":"info","ts":"2024-04-03T17:38:36Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:38:37.198 
{"level":"info","ts":"2024-04-03T17:38:37Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:37.198 
{"level":"info","ts":"2024-04-03T17:38:37Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:38:39.338 
{"level":"info","ts":"2024-04-03T17:38:39Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:39.338 
{"level":"info","ts":"2024-04-03T17:38:39Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:38:51.543 
{"level":"info","ts":"2024-04-03T17:38:51Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:51.543 
{"level":"info","ts":"2024-04-03T17:38:51Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:38:52.217 
{"level":"info","ts":"2024-04-03T17:38:52Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:52.218 
{"level":"info","ts":"2024-04-03T17:38:52Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:38:54.358 
{"level":"info","ts":"2024-04-03T17:38:54Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:38:54.358 
{"level":"info","ts":"2024-04-03T17:38:54Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:39:06.568 
{"level":"info","ts":"2024-04-03T17:39:06Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:06.568 
{"level":"info","ts":"2024-04-03T17:39:06Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:39:07.230 
{"level":"info","ts":"2024-04-03T17:39:07Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:07.230 
{"level":"info","ts":"2024-04-03T17:39:07Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:39:09.365 
{"level":"info","ts":"2024-04-03T17:39:09Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:09.365 
{"level":"info","ts":"2024-04-03T17:39:09Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:39:21.584 
{"level":"info","ts":"2024-04-03T17:39:21Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:21.584 
{"level":"info","ts":"2024-04-03T17:39:21Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:39:22.247 
{"level":"info","ts":"2024-04-03T17:39:22Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:22.247 
{"level":"info","ts":"2024-04-03T17:39:22Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:39:24.379 
{"level":"info","ts":"2024-04-03T17:39:24Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:24.379 
{"level":"info","ts":"2024-04-03T17:39:24Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:39:36.616 
{"level":"info","ts":"2024-04-03T17:39:36Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:36.616 
{"level":"info","ts":"2024-04-03T17:39:36Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:39:37.262 
{"level":"info","ts":"2024-04-03T17:39:37Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:37.262 
{"level":"info","ts":"2024-04-03T17:39:37Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:39:39.414 
{"level":"info","ts":"2024-04-03T17:39:39Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:39.414 
{"level":"info","ts":"2024-04-03T17:39:39Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:39:51.628 
{"level":"info","ts":"2024-04-03T17:39:51Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:51.628 
{"level":"info","ts":"2024-04-03T17:39:51Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}
2024-04-03 17:39:52.280 
{"level":"info","ts":"2024-04-03T17:39:52Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-backends","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:52.280 
{"level":"info","ts":"2024-04-03T17:39:52Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-backends","namespace":"observability"}
2024-04-03 17:39:54.410 
{"level":"info","ts":"2024-04-03T17:39:54Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:39:54.410 
{"level":"info","ts":"2024-04-03T17:39:54Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-loadbalancer","namespace":"observability"}
2024-04-03 17:40:06.646 
{"level":"info","ts":"2024-04-03T17:40:06Z","logger":"controllers.OpenTelemetryCollector","msg":"no upgrade routines are needed for the OpenTelemetry instance","name":"monit-otel-collectors-forwarder","namespace":"observability","version":"0.90.1","latest":"0.61.0"}
2024-04-03 17:40:06.646 
{"level":"info","ts":"2024-04-03T17:40:06Z","logger":"controllers.OpenTelemetryCollector","msg":"skipping upgrade for OpenTelemetry Collector instance","name":"monit-otel-collectors-forwarder","namespace":"observability"}

Additional context

No response

jaronoff97 commented 5 months ago

Thanks for the details here. I'm going to take some profiles in the clusters where I run the operator and see if anything jumps out performance-wise.
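For readers who want to take a comparable measurement in their own Go process, here is a minimal sketch using only the standard library. It is not the operator's code, and the helper names `heapInUseMiB` and `captureHeapProfile` are made up for illustration; whether the operator exposes a profiling endpoint is not stated in this thread.

```go
package main

import (
	"bytes"
	"runtime"
	"runtime/pprof"
)

// heapInUseMiB reports the heap memory currently in use, in MiB.
// (Hypothetical helper; handy for logging before and after start-up.)
func heapInUseMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapInuse) / (1024 * 1024)
}

// captureHeapProfile forces a garbage collection so the profile reflects
// live objects, then returns a heap profile in pprof format.
// (Hypothetical helper; the caller decides where to store the bytes.)
func captureHeapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

Writing the returned bytes to a file allows inspection with `go tool pprof`, which can show which allocation sites hold the most memory.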

pavolloffay commented 5 months ago

Maybe we need to improve the caching of ConfigMaps and Secrets: https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/
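To illustrate why narrowing the cache can matter, here is a toy model in plain Go. It is an assumption-laden sketch, not the operator's or controller-runtime's implementation: the `object` and `filteredCache` types and the label key used in the usage note are hypothetical. The idea is that a cache which keeps only objects matching a label selector holds far less data than one that keeps every object in the cluster.

```go
package main

// object is a hypothetical, pared-down stand-in for a Kubernetes object
// such as a ConfigMap or Secret.
type object struct {
	Name   string
	Labels map[string]string
	Data   []byte
}

// filteredCache keeps only objects whose labels match every key/value
// pair in selector. (Toy model; real caches are fed by list/watch calls.)
type filteredCache struct {
	selector map[string]string
	items    map[string]object
}

func newFilteredCache(selector map[string]string) *filteredCache {
	return &filteredCache{selector: selector, items: make(map[string]object)}
}

// matches reports whether o carries every label required by the selector.
func (c *filteredCache) matches(o object) bool {
	for k, want := range c.selector {
		if got, ok := o.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// Add stores o only if it matches the selector and reports whether it
// was cached.
func (c *filteredCache) Add(o object) bool {
	if !c.matches(o) {
		return false
	}
	c.items[o.Name] = o
	return true
}

// Bytes returns the total payload size held by the cache.
func (c *filteredCache) Bytes() int {
	total := 0
	for _, o := range c.items {
		total += len(o.Data)
	}
	return total
}
```

With a selector such as `{"example.com/managed": "true"}` (a made-up label), a large unlabeled ConfigMap passed to `Add` is dropped and never counted by `Bytes`, while a labeled one is retained.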

jaronoff97 commented 5 months ago

[Screenshot 2024-04-05 at 2:57:35 PM] @pavolloffay I think you're right!

jaronoff97 commented 5 months ago

[Screenshot 2024-04-05 at 2:59:50 PM] Config from string is certainly not helping, though. I think we should look into getting #2735, which would help a lot here.

Rohlik commented 5 days ago

We hit the same issue on our EKS cluster with ~86 nodes: the manager container in the opentelemetry-operator is enormously memory-hungry during start-up, leading to OOMKills. We are on v0.107.0.