Open jihuiyang opened 2 months ago
Hey @jihuiyang! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!
I found a similar open issue that might be relevant to your problem: [operator] Collector fails with featureGate errors when Upgrading the Operator to chart version 0.68.1. The issue you're encountering, with the OpenTelemetry collector failing on featureGate errors, is due to changes in the OpenTelemetryCollector CRD; specifically, the confmap.unifyEnvVarExpansion feature gate is now stable and cannot be disabled.
To resolve this issue, you need to ensure that your CRDs are properly annotated and labeled before upgrading. Here are the steps:
Annotate and label the CRDs (a quick verification check follows these steps):
RELEASE_NAME=my-opentelemetry-operator
RELEASE_NAMESPACE=opentelemetry-operator-system
kubectl annotate crds instrumentations.opentelemetry.io opentelemetrycollectors.opentelemetry.io opampbridges.opentelemetry.io \
meta.helm.sh/release-name=${RELEASE_NAME} \
meta.helm.sh/release-namespace=${RELEASE_NAMESPACE}
kubectl label crds instrumentations.opentelemetry.io opentelemetrycollectors.opentelemetry.io opampbridges.opentelemetry.io app.kubernetes.io/managed-by=Helm
Follow the TLS certificate requirements and upgrade instructions provided in the OpenTelemetry Operator Helm Chart documentation.
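As a sanity check before running the upgrade, you can confirm the Helm ownership metadata actually landed on the CRDs. This is only a hedged verification sketch using the same CRD names as above:
kubectl get crds instrumentations.opentelemetry.io opentelemetrycollectors.opentelemetry.io opampbridges.opentelemetry.io --show-labels
kubectl describe crd opentelemetrycollectors.opentelemetry.io | grep -i 'meta.helm.sh'
The first command should show app.kubernetes.io/managed-by=Helm on each CRD, and the second should print the meta.helm.sh/release-name and meta.helm.sh/release-namespace annotations set earlier.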
Additionally, if you are upgrading from version 0.67 to 0.68, note that the preset.kubernetesEvents preset now excludes DELETED watch types. If you want to keep ingesting DELETED updates for Kubernetes Events, you will need to configure the k8sobjectsreceiver manually.
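For reference, a minimal sketch of what that manual configuration could look like in the collector config; the pipeline wiring is illustrative (processors and exporters omitted) and is not taken from this thread:
receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch   # watch mode streams ADDED/MODIFIED/DELETED object updates
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
Because no watch types are excluded here, DELETED updates for Kubernetes Events keep flowing; the receiver's service account also needs RBAC permission to watch events.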
This issue is a known bug and will be fixed by this PR in the operator: https://github.com/open-telemetry/opentelemetry-operator/pull/3074. This fix will be included in the next operator helm release [1].
To continue talking to Dosu, mention @dosu.
The bug still exists in the latest operator version.
@jihuiyang You cannot disable the confmap.unifyEnvVarExpansion feature gate anymore.
Doing --feature-gates=-confmap.unifyEnvVarExpansion is not allowed.
I did not specify --feature-gates=-confmap.unifyEnvVarExpansion; all I did was helm install 0.69.0 and somehow it had this argument.
Maybe I should uninstall the whole thing and reinstall?
@jihuiyang I haven't been able to reproduce your issue yet with a clean install or with a helm upgrade. Can you provide more details?
@jihuiyang the latest version of the operator should resolve this by removing the gate from a collector (we previously needed to add this in the operator's code to prevent users' configs from breaking due to a collector change).
Can you link any logs you are seeing from the operator?
I tried a clean install and it worked without the --feature-gates=-confmap.unifyEnvVarExpansion flag. Let me try just the upgrade.
@jihuiyang thanks for trying that out. I have been debugging the upgrade process in this issue, if you run into a similar issue during the upgrade I would really appreciate if you could share the steps to reproduce. I've tried a few different ways of doing this and have yet to cause it to happen.
Still running into the issue with the upgrade, going from 0.65.1 to 0.69.0.
> helm --namespace otel-operator-system ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
opentelemetry-operator otel-operator-system 9 2024-09-10 12:16:27.747165 -0700 PDT deployed opentelemetry-operator-0.69.0 0.108.0
The collector still sees the feature gate:
> kubectl -n otel-collector describe ds/otel-collector-collector | grep feature
--feature-gates=-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost
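For a fuller view than the grep, a hedged alternative is to dump the complete args of the collector container straight from the DaemonSet (same names as above, and assuming the collector is the first container in the pod spec):
kubectl -n otel-collector get ds otel-collector-collector -o jsonpath='{.spec.template.spec.containers[0].args}'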
Operator log
kubectl -n otel-operator-system logs po/opentelemetry-operator-595855cd5c-jx9hj -f
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","message":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.108.0","opentelemetry-collector":"ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-k8s:0.108.0","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.108.0","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.108.0","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.33.5","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.52.1","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.48b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.2.0","auto-instrumentation-go":"ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.14.0-alpha","auto-instrumentation-apache-httpd":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4","auto-instrumentation-nginx":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4","feature-gates":"-operator.golang.flags,operator.observability.prometheus","build-date":"2024-09-05T17:19:14Z","go-version":"go1.22.6","go-arch":"amd64","go-os":"linux","labels-filter":[],"annotations-filter":[],"enable-multi-instrumentation":false,"enable-apache-httpd-instrumentation":true,"enable-dotnet-instrumentation":true,"enable-go-instrumentation":false,"enable-python-instrumentation":true,"enable-nginx-instrumentation":false,"enable-nodejs-instrumentation":true,"enable-java-instrumentation":true,"create-openshift-dashboard":false,"zap-message-key":"message","zap-level-key":"level","zap-time-key":"timestamp","zap-level-format":"uppercase"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"Prometheus CRDs are installed, adding to scheme."}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"Openshift CRDs are not installed, skipping adding to scheme."}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/convert"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Conversion webhook enabled","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpAMPBridge","path":"/mutate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpAMPBridge","path":"/validate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"starting manager"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.metrics","message":"Starting metrics server"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","message":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.metrics","message":"Serving metrics server","bindAddress":"0.0.0.0:8080","secure":false}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Starting webhook server"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
I0910 19:17:10.847314 1 leaderelection.go:254] attempting to acquire leader lease otel-operator-system/9f7554c3.opentelemetry.io...
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Serving webhook server","host":"","port":9443}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.certwatcher","message":"Starting certificate watcher"}
I0910 19:18:05.154198 1 leaderelection.go:268] successfully acquired lease otel-operator-system/9f7554c3.opentelemetry.io
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","logger":"collector-upgrade","message":"looking for managed instances to upgrade"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","logger":"instrumentation-upgrade","message":"looking for managed Instrumentation instances to upgrade"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1alpha1.OpAMPBridge"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1beta1.OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting Controller","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Ingress"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodDisruptionBudget"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceMonitor"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodMonitor"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","logger":"instrumentation-upgrade","message":"no instances to upgrade"}
{"level":"INFO","timestamp":"2024-09-10T19:18:06Z","message":"Starting workers","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","worker count":1}
{"level":"INFO","timestamp":"2024-09-10T19:18:06Z","message":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
If you do this, do you see the string 'managed'?
k get otelcol -n otel-collector otel-collector -o yaml | grep 'managementState'
managementState: managed
Yes, I do:
> kubectl -n otel-collector get otelcol otel-collector -o yaml | grep 'managementState'
managementState: managed
I also experienced this. I had to delete and recreate the OpenTelemetryCollector resource (type = sidecar, in my case).
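In case it helps anyone else, a hedged sketch of that delete-and-recreate workaround (the resource name, namespace, and file name are placeholders; adjust to your setup):
kubectl -n otel-collector get otelcol otel-collector -o yaml > otelcol-backup.yaml
# strip server-generated fields (status, metadata.resourceVersion, metadata.uid, metadata.creationTimestamp) from the backup before re-applying
kubectl -n otel-collector delete otelcol otel-collector
kubectl -n otel-collector apply -f otelcol-backup.yaml
Note that the generated collector workload is gone between the delete and the apply, so expect a brief gap in telemetry.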
OTel collectors are failing after the 0.69.0 upgrade.
When I describe the container I can see -confmap.unifyEnvVarExpansion; it looks like the new version 0.108.0 does not like it.