dacamposol opened this issue 1 year ago
I did a quick look through the sidecar injection logic and I didn't see anything that stuck out at me as to why this wouldn't be working as you expected.
Does the OpenTelemetryCollector object already exist when you bring the cluster out of hibernation? What do the operator logs show? I am hoping there is a "failed to select an OpenTelemetry Collector instance for this pod's sidecar" log in there somewhere when this race condition happens.
Hello Tyler, thank you for your quick response.
The OpenTelemetryCollector object gets deployed at the same time as the operator itself, and it should already exist when we come out of hibernation.
I have one Argo Application, let's call it Application A, which is an umbrella Chart with a dependency on the Operator's chart plus some extra templates, including the Secret for the environment defined in the OpenTelemetryCollector resource. Additionally, I have another Argo Application, which is just a compilation of the resources required to deploy my application (a Service, a Deployment, some RBAC resources, etc.).
I don't see any errors in the operator logs:
{"level":"info","ts":"2023-05-24T04:16:00Z","msg":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.76.1","opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.76.1","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.76.1","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.76.1","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.25.1","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.38.0","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.38b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.7.0","feature-gates":"operator.autoinstrumentation.dotnet,operator.autoinstrumentation.java,operator.autoinstrumentation.nodejs,operator.autoinstrumentation.python,-operator.collector.rewritetargetallocator","build-date":"2023-05-09T13:57:45Z","go-version":"go1.20.4","go-arch":"amd64","go-os":"linux","labels-filter":[]}
{"level":"info","ts":"2023-05-24T04:16:00Z","logger":"setup","msg":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
I0524 04:16:01.754669 1 request.go:690] Waited for 1.047851876s due to client-side throttling, not priority and fairness, request: GET:https://api.d1eu1.dxp-d1.internal.canary.k8s.ondemand.com:443/apis/sql.cnrm.cloud.google.com/v1beta1?timeout=32s
{"level":"info","ts":"2023-05-24T04:16:05Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"0.0.0.0:8080"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2023-05-24T04:16:06Z","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":"2023-05-24T04:16:06Z","msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
I0524 04:16:06.066356 1 leaderelection.go:248] attempting to acquire leader lease monitoring/9f7554c3.opentelemetry.io...
I0524 04:19:14.429811 1 leaderelection.go:258] successfully acquired lease monitoring/9f7554c3.opentelemetry.io
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"instrumentation-upgrade","msg":"looking for managed Instrumentation instances to upgrade"}
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"collector-upgrade","msg":"looking for managed instances to upgrade"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"collector-upgrade","msg":"skipping upgrade for OpenTelemetry Collector instance, as it's newer than our latest version","name":"sidecar-collector","namespace":"dxp-system","version":"0.76.1","latest":"0.61.0"}
{"level":"info","ts":"2023-05-24T04:19:17Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
When I checked the status of my Pod in ArgoCD (after some hours), I noticed that it only contained two containers (the application container and the Envoy proxy). Once I manually deleted the Pod, it got recreated without any issue and the sidecar was properly injected, even though there were some error messages in the operator logs:
{"level":"info","ts":"2023-05-24T13:23:18Z","msg":"couldn't determine metrics port from configuration, using 8888 default value","error":"missing port in address"}
{"level":"error","ts":"2023-05-24T13:23:18Z","msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:127\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:146\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:108\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
If you say there isn't any logic that prevents "old" Pods from being restarted once the operator is ready, the only other thing I can think of is that there is a dependency on a Secret used to populate the environment of the sidecar... Maybe the Secret wasn't ready at that point, and instead of failing the deployment the webhook simply skipped injecting the sidecar and never tried again?
I'm not sure, this is gonna take more digging into.
@TylerHelmuth regarding my last hypothesis, it's not that: I can verify there is no problem with the Secret not being ready.
I performed a test where I deployed a different OpenTelemetryCollector in an isolated namespace, referring to a non-existent Secret.
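For illustration, what I mean is a sidecar-mode collector along these lines (names and config are illustrative, not my exact manifest), whose environment references a Secret that deliberately doesn't exist:

# Illustrative sketch only; names and config are made up.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: test-sidecar
  namespace: telemetry-poc
spec:
  mode: sidecar
  env:
    - name: OTLP_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: missing-secret   # deliberately non-existent Secret
          key: endpoint
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlp:
        endpoint: ${OTLP_ENDPOINT}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]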
When I deployed the following example Pod:
apiVersion: v1
kind: Pod
metadata:
name: busybox
namespace: telemetry-poc
annotations:
sidecar.opentelemetry.io/inject: "true"
spec:
containers:
- image: busybox
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
name: busybox
restartPolicy: Always
The collector was successfully injected as a sidecar; it just ends up in CreateContainerConfigError since it cannot find the referenced Secret, but that's expected.
My problem is that the sidecar doesn't even get injected. I'm going to perform further testing.
Additional Info:
I scaled the replicas of the operator down to zero and redeployed the aforementioned busybox Pod. Once the Pod was ready (1/1 containers running), I scaled the operator back up:
The Pod, even though it has the correct annotation, does not get recreated and the sidecar is never injected. The sidecar is only injected once I recreate the Pod.
I'm using the following image of the operator:
ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:v0.76.1
@TylerHelmuth I found the issue.
Istio prevents Pods from being created before its admission webhook is ready through a fail-closed configuration in the MutatingWebhookConfiguration, where they specify .failurePolicy: Fail. If the istio-sidecar-injector isn't ready, the Pod won't be created[^1]:
Internal error occurred: failed calling admission webhook "istio-sidecar-injector.istio.io": \
Post https://istio-sidecar-injector.istio-system.svc:443/admitPilot?timeout=30s: \
no endpoints available for service "istio-sidecar-injector"
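For reference, the essential part here is the fail-closed policy on Istio's Pod mutation webhook. Simplified and from memory (not a verbatim copy of Istio's manifest), it looks roughly like this:

# Simplified, illustrative sketch of a fail-closed injection webhook
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: istio-sidecar-injector
webhooks:
  - name: sidecar-injector.istio.io
    failurePolicy: Fail            # Pod creation is rejected while the injector is unavailable
    clientConfig:
      service:
        name: istio-sidecar-injector
        namespace: istio-system
        path: /inject
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    sideEffects: None
    admissionReviewVersions: ["v1"]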
When we deploy the opentelemetry-operator Chart, there is the .admissionWebhooks.failurePolicy value, which defaults to Fail, but the problem is that it isn't taken into account for the Pod mutation webhook. No matter what we configure, the Chart sets its .failurePolicy to Ignore.
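In other words, even with values along these lines (a sketch, not my exact file), the rendered mpod.kb.io webhook still ends up with failurePolicy: Ignore:

# values.yaml sketch for the opentelemetry-operator chart
admissionWebhooks:
  failurePolicy: Fail   # honored for the CRD webhooks, but not for /mutate-v1-pod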
From my personal point of view, I'd expect the user who deploys the Chart to have full control over the failurePolicy of the webhooks, but I haven't opened a PR yet since I don't know whether this was an architectural decision on your side.
If not, please let me know and I'll be glad to create a fix.
It's always istio lol good find
Hi, I am experiencing this issue as well, and I am not using Istio. I have tried to set admissionWebhooks.pods.failurePolicy to Fail, but then the operator never starts. Is this issue simply caused by the order in which the resources are applied?
These are my chart values:
helmCharts:
- name: opentelemetry-operator
namespace: opentelemetry
releaseName: opentelemetry-operator
includeCRDs: true
version: 0.31.0
repo: https://open-telemetry.github.io/opentelemetry-helm-charts
valuesInline:
manager:
serviceMonitor:
enabled: true
prometheusRule:
enabled: true
defaultRules:
enabled: true
kubeRBACProxy:
enabled: false
admissionWebhooks:
pods:
failurePolicy: Ignore
certManager:
enabled: true
create: true
autoGenerateCert: false
@winston0410 that's weird, unless you marked the operator itself to be injected with a collector sidecar (either via the annotation on its Namespace or on the operator's own Pod). Since the injection logic is part of a MutatingWebhookConfiguration, that could be what's preventing the operator from starting.
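For reference, that would be the case if, for example, the operator's own Namespace (or its Pod template) carried the injection annotation, something like:

# Illustrative example of the annotation that would trigger injection on the operator's namespace
apiVersion: v1
kind: Namespace
metadata:
  name: opentelemetry
  annotations:
    sidecar.opentelemetry.io/inject: "true"   # would make the webhook try to mutate the operator's own Pod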
Out of curiosity, and for the sake of testing it out, could you share the manifests of the operator's Pod and Namespace here?
Sure, these are the generated manifests, excluding the CRDs:
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator
namespace: opentelemetry
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-leader-election
namespace: opentelemetry
rules:
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- list
- watch
- create
- update
- patch
- delete
- apiGroups:
- ""
resources:
- configmaps/status
verbs:
- get
- update
- patch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-manager
rules:
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- ""
resources:
- namespaces
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- serviceaccounts
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- services
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- apps
resources:
- daemonsets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- apps
resources:
- deployments
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- apps
resources:
- replicasets
verbs:
- get
- list
- watch
- apiGroups:
- apps
resources:
- statefulsets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs:
- create
- get
- list
- update
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- opentelemetry.io
resources:
- instrumentations
verbs:
- get
- list
- patch
- update
- watch
- apiGroups:
- opentelemetry.io
resources:
- opentelemetrycollectors
verbs:
- get
- list
- patch
- update
- watch
- apiGroups:
- opentelemetry.io
resources:
- opentelemetrycollectors/finalizers
verbs:
- get
- patch
- update
- apiGroups:
- opentelemetry.io
resources:
- opentelemetrycollectors/status
verbs:
- get
- patch
- update
- apiGroups:
- route.openshift.io
resources:
- routes
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-leader-election
namespace: opentelemetry
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: opentelemetry-operator-leader-election
subjects:
- kind: ServiceAccount
name: opentelemetry-operator
namespace: opentelemetry
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-manager
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: opentelemetry-operator-manager
subjects:
- kind: ServiceAccount
name: opentelemetry-operator
namespace: opentelemetry
---
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator
namespace: opentelemetry
spec:
ports:
- name: metrics
port: 8080
protocol: TCP
targetPort: metrics
selector:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/name: opentelemetry-operator
---
apiVersion: v1
kind: Service
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-webhook
namespace: opentelemetry
spec:
ports:
- port: 443
protocol: TCP
targetPort: webhook-server
selector:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/name: opentelemetry-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator
namespace: opentelemetry
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/name: opentelemetry-operator
template:
metadata:
annotations:
kubectl.kubernetes.io/default-container: manager
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/name: opentelemetry-operator
spec:
containers:
- args:
- --metrics-addr=0.0.0.0:8080
- --enable-leader-election
- --health-probe-addr=:8081
- --webhook-port=9443
- --collector-image=otel/opentelemetry-collector-contrib:0.78.0
command:
- /manager
env:
- name: ENABLE_WEBHOOKS
value: "true"
image: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:v0.78.0
livenessProbe:
httpGet:
path: /healthz
port: 8081
initialDelaySeconds: 15
periodSeconds: 20
name: manager
ports:
- containerPort: 8080
name: metrics
protocol: TCP
- containerPort: 9443
name: webhook-server
protocol: TCP
readinessProbe:
httpGet:
path: /readyz
port: 8081
initialDelaySeconds: 5
periodSeconds: 10
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 64Mi
volumeMounts:
- mountPath: /tmp/k8s-webhook-server/serving-certs
name: cert
readOnly: true
hostNetwork: false
securityContext:
fsGroup: 65532
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
serviceAccountName: opentelemetry-operator
terminationGracePeriodSeconds: 10
volumes:
- name: cert
secret:
defaultMode: 420
secretName: opentelemetry-operator-controller-manager-service-cert
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
labels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-serving-cert
namespace: opentelemetry
spec:
dnsNames:
- opentelemetry-operator-webhook.opentelemetry.svc
- opentelemetry-operator-webhook.opentelemetry.svc.cluster.local
issuerRef:
kind: Issuer
name: opentelemetry-operator-selfsigned-issuer
secretName: opentelemetry-operator-controller-manager-service-cert
subject:
organizationalUnits:
- opentelemetry-operator
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
labels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-selfsigned-issuer
namespace: opentelemetry
spec:
selfSigned: {}
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator
namespace: opentelemetry
spec:
groups:
- name: managerRules
rules:
- alert: ReconcileErrors
annotations:
description: 'Reconciliation errors for {{ $labels.controller }} is increasing
and has now reached {{ humanize $value }} '
runbook_url: Check manager logs for reasons why this might happen
expr: rate(controller_runtime_reconcile_total{result="error"}[5m]) > 0
for: 5m
labels:
severity: warning
- alert: WorkqueueDepth
annotations:
description: 'Queue depth for {{ $labels.name }} has reached {{ $value }} '
runbook_url: Check manager logs for reasons why this might happen
expr: workqueue_depth > 0
for: 5m
labels:
severity: warning
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator
namespace: opentelemetry
spec:
endpoints:
- port: metrics
namespaceSelector:
matchNames:
- opentelemetry
selector:
matchLabels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/name: opentelemetry-operator
---
apiVersion: v1
kind: Pod
metadata:
annotations:
helm.sh/hook: test
labels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-cert-manager
namespace: opentelemetry
spec:
containers:
- command:
- sh
- -c
- |
wget_output=$(wget -q "$CERT_MANAGER_CLUSTERIP:$CERT_MANAGER_PORT")
if wget_output=="wget: server returned error: HTTP/1.0 400 Bad Request"
then exit 0
else exit 1
fi
env:
- name: CERT_MANAGER_CLUSTERIP
value: cert-manager-webhook
- name: CERT_MANAGER_PORT
value: "443"
image: busybox:latest
name: wget
restartPolicy: Never
---
apiVersion: v1
kind: Pod
metadata:
annotations:
helm.sh/hook: test
labels:
app.kubernetes.io/component: controller-manager
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-webhook
namespace: opentelemetry
spec:
containers:
- command:
- sh
- -c
- |
wget_output=$(wget -q "$WEBHOOK_SERVICE_CLUSTERIP:$WEBHOOK_SERVICE_PORT")
if wget_output=="wget: server returned error: HTTP/1.0 400 Bad Request"
then exit 0
else exit 1
fi
env:
- name: WEBHOOK_SERVICE_CLUSTERIP
value: opentelemetry-operator-webhook
- name: WEBHOOK_SERVICE_PORT
value: "443"
image: busybox:latest
name: wget
restartPolicy: Never
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
annotations:
cert-manager.io/inject-ca-from: opentelemetry/opentelemetry-operator-serving-cert
labels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-mutation
webhooks:
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /mutate-opentelemetry-io-v1alpha1-instrumentation
failurePolicy: Fail
name: minstrumentation.kb.io
rules:
- apiGroups:
- opentelemetry.io
apiVersions:
- v1alpha1
operations:
- CREATE
- UPDATE
resources:
- instrumentations
sideEffects: None
timeoutSeconds: 10
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /mutate-opentelemetry-io-v1alpha1-opentelemetrycollector
failurePolicy: Fail
name: mopentelemetrycollector.kb.io
rules:
- apiGroups:
- opentelemetry.io
apiVersions:
- v1alpha1
operations:
- CREATE
- UPDATE
resources:
- opentelemetrycollectors
sideEffects: None
timeoutSeconds: 10
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /mutate-v1-pod
failurePolicy: Ignore
name: mpod.kb.io
rules:
- apiGroups:
- ""
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- pods
sideEffects: None
timeoutSeconds: 10
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
annotations:
cert-manager.io/inject-ca-from: opentelemetry/opentelemetry-operator-serving-cert
labels:
app.kubernetes.io/component: webhook
app.kubernetes.io/instance: opentelemetry-operator
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: opentelemetry-operator
app.kubernetes.io/version: 0.78.0
helm.sh/chart: opentelemetry-operator-0.31.0
name: opentelemetry-operator-validation
webhooks:
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /validate-opentelemetry-io-v1alpha1-instrumentation
failurePolicy: Fail
name: vinstrumentationcreateupdate.kb.io
rules:
- apiGroups:
- opentelemetry.io
apiVersions:
- v1alpha1
operations:
- CREATE
- UPDATE
resources:
- instrumentations
sideEffects: None
timeoutSeconds: 10
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /validate-opentelemetry-io-v1alpha1-instrumentation
failurePolicy: Ignore
name: vinstrumentationdelete.kb.io
rules:
- apiGroups:
- opentelemetry.io
apiVersions:
- v1alpha1
operations:
- DELETE
resources:
- instrumentations
sideEffects: None
timeoutSeconds: 10
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /validate-opentelemetry-io-v1alpha1-opentelemetrycollector
failurePolicy: Fail
name: vopentelemetrycollectorcreateupdate.kb.io
rules:
- apiGroups:
- opentelemetry.io
apiVersions:
- v1alpha1
operations:
- CREATE
- UPDATE
resources:
- opentelemetrycollectors
sideEffects: None
timeoutSeconds: 10
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /validate-opentelemetry-io-v1alpha1-opentelemetrycollector
failurePolicy: Ignore
name: vopentelemetrycollectordelete.kb.io
rules:
- apiGroups:
- opentelemetry.io
apiVersions:
- v1alpha1
operations:
- DELETE
resources:
- opentelemetrycollectors
sideEffects: None
timeoutSeconds: 10
The Helm chart manifest:
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: opentelemetry
resources:
- ./resources/index.yaml # this simply create the namespace
helmCharts:
- name: opentelemetry-operator
namespace: opentelemetry
releaseName: opentelemetry-operator
includeCRDs: true
version: 0.31.0
repo: https://open-telemetry.github.io/opentelemetry-helm-charts
valuesInline:
manager:
serviceMonitor:
enabled: true
prometheusRule:
enabled: true
defaultRules:
enabled: true
kubeRBACProxy:
enabled: false
admissionWebhooks:
pods:
# REF https://github.com/open-telemetry/opentelemetry-operator/issues/1765
failurePolicy: Ignore
certManager:
enabled: true
create: true
autoGenerateCert: false
Hi @dacamposol, are you willing to send a fix for that issue?
@yuriolisa I'm investigating what could be the reason for the Operator not starting up normally in that scenario.
I'll update with my findings.
@winston0410 I found the problem; it's just a misconfiguration in your files.
I noticed that you're using the default MutatingWebhookConfiguration, without setting a proper .objectSelector on it.
Concretely, the problem is in the following section:
- admissionReviewVersions:
- v1
clientConfig:
service:
name: opentelemetry-operator-webhook
namespace: opentelemetry
path: /mutate-v1-pod
failurePolicy: Ignore
name: mpod.kb.io
rules:
- apiGroups:
- ""
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- pods
sideEffects: None
timeoutSeconds: 10
As you can see, the mutation is applied to every Pod in the cluster, not only to the ones into which you actually want to inject the sidecar.
I don't know how it works with Kustomize, but in Helm you would just have to set the .admissionWebhooks values with a proper selector. For example, let's say you're deploying your OpenTelemetry Operator in the monitoring namespace; then you'd have to add something like:
admissionWebhooks:
namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: NotIn
values:
- kube-system
- monitoring
This way, you prevent resources in the kube-system namespace (and in the operator's own namespace) from having to wait for the admission webhook.
Anyway, @yuriolisa, I don't have much time right now, but I'd like to propose the possibility of setting a different objectSelector per webhook in the Chart. While it's not directly related to this issue, I think it would offer a lot of flexibility if we could define an objectSelector for the pods webhook that is different from the one used for the opentelemetrycollector resources, as in the sketch below.
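Purely as an illustration of the idea (this values structure is hypothetical and does not exist in the Chart today):

# Hypothetical values layout, for discussion only
admissionWebhooks:
  pods:
    objectSelector:              # selector applied only to the /mutate-v1-pod webhook
      matchLabels:
        sidecar.opentelemetry.io/enabled: "true"   # made-up opt-in label
  objectSelector: {}             # selector for the remaining webhooks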
If not implemented, I'll try to do it in the future.
I think this is a flavor of https://github.com/open-telemetry/opentelemetry-operator/issues/1329
{"level":"ERROR","timestamp":"2024-11-10T03:17:43.458903251Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"} {"level":"ERROR","timestamp":"2024-11-10T03:17:47.414812688Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances 
available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"} {"level":"ERROR","timestamp":"2024-11-10T03:17:49.759572162Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances 
available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"} {"level":"INFO","timestamp":"2024-11-10T03:22:23.832875828Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"} {"level":"ERROR","timestamp":"2024-11-10T04:02:28.930318531Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances 
available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.20.3/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"} Logs from 2024年11月9日 to 2024年11月10日 UTC
I clearly have the OpenTelemetryCollector resource, so why is there still an error? Why can't it find the resource?
Hello everyone,
First of all, thank you for the effort put into the Operator, as it is a really useful tool to enhance the instrumentation in Kubernetes environments.
I am currently using the OpenTelemetryCollector as a sidecar, with the following configuration:
The target Deployments have the annotation sidecar.opentelemetry.io/inject: "true" in the Pod template (.spec.template.metadata.annotations), and overall it works without any issues. However, I seem to have a race condition: if the Deployment creates the Pods before the OpenTelemetry Operator is ready, the target Pods never get the sidecar injected. This issue only occurs when I wake my cluster up after hibernation. I noticed that the Istio Operator recreates the Pods once it's ready, correctly injecting the Envoy proxy as a sidecar into all the Pods with the corresponding annotation.
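For reference, the annotation is set on the Pod template, roughly like this (trimmed to the relevant part; names are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        sidecar.opentelemetry.io/inject: "true"   # requests sidecar injection for the Pods of this Deployment
    spec:
      containers:
        - name: my-app
          image: my-app:latest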
Is there any way to tell the OpenTelemetry Operator to recreate any Pod that doesn't already have the sidecar injected when, on startup, it scans the Pods and Namespaces for the inject annotation?