[Closed] avijitsarkar123 closed this issue 1 year ago
Hi @avijitsarkar123, could you try with a newer version?
@iblancasa - I upgraded my opentelemetry-operator to the latest version but I am still getting the same error. The versions are now:
"opentelemetry-operator":"0.80.0","opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.80.0"
Am I missing something in my otel-collector YAML that is causing this error?
"msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions",
{"level":"info","ts":"2023-07-10T14:47:11Z","msg":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.80.0","opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.80.0","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.80.0","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.80.0","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.26.0","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.40.0","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.39b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.7.0","auto-instrumentation-go":"ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.2.1-alpha","auto-instrumentation-apache-httpd":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3","feature-gates":"operator.autoinstrumentation.apache-httpd,operator.autoinstrumentation.dotnet,-operator.autoinstrumentation.go,operator.autoinstrumentation.java,operator.autoinstrumentation.nodejs,operator.autoinstrumentation.python,-operator.collector.rewritetargetallocator","build-date":"2023-06-28T17:26:24Z","go-version":"go1.20.5","go-arch":"amd64","go-os":"linux","labels-filter":[]}
{"level":"info","ts":"2023-07-10T14:47:11Z","logger":"setup","msg":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"info","ts":"2023-07-10T14:47:11Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"0.0.0.0:8080"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2023-07-10T14:47:12Z","msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":"2023-07-10T14:47:12Z","msg":"starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
I0710 14:47:12.150869 1 leaderelection.go:245] attempting to acquire leader lease kaizen-system/9f7554c3.opentelemetry.io...
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
{"level":"info","ts":"2023-07-10T14:47:43Z","msg":"couldn't determine metrics port from configuration, using 8888 default value","error":"missing port in address"}
{"level":"error","ts":"2023-07-10T14:47:43Z","msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:127\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
"msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions",
This just informs you that your OTel config doesn't contain any extensions. It should not be a problem for injecting the sidecar.
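If you do want the operator to set up the liveness probe (and silence that message), one option is to declare the health_check extension in the collector config and reference it under service.extensions — a minimal sketch, not required for sidecar injection:

```yaml
extensions:
  health_check: {}   # HTTP health endpoint, served on :13133 by default
service:
  extensions: [health_check]   # the service section must reference it
  pipelines:
    # ... existing pipelines unchanged
```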
After reading your configuration again... is the log message the only problem you see, or is there something else, like the instrumentation not working?
So the sidecar container never comes up and shows the error "context deadline exceeded". I tried checking the GKE kubelet log to see if it has any additional details, but all I have there is the following:
"MESSAGE": "E0710 15:13:20.346185 1872 pod_workers.go:965] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"otc-container\\\" with RunContainerError: \\\"context deadline exceeded\\\"\" pod=\"core-service-corpdirectory-1/core-service-corpdirectory-1-core-service-corpdirectory-76bwbct\" podUID=403fd05c-5ec1-4267-8361-0b2d7de46302"
Is there a way to get more verbose logging from the operator for additional details on why the sidecar isn't coming up?
This same setup works fine in a local kind cluster, but on GKE it's failing...
Try enabling these options:
--zap-log-level level Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
--zap-stacktrace-level level Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic')
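These flags go on the operator's manager container. A minimal sketch of the container args (the container name and flag placement are assumptions from a default Helm install and may differ in your chart version):

```yaml
# Patch for the opentelemetry-operator Deployment's "manager" container
# (names are assumptions from a default Helm install).
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            - --zap-log-level=debug         # verbose operator logging
            - --zap-stacktrace-level=error  # stack traces for errors and above
```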
So after enabling logging as mentioned above, I do see an error (the last one below):
{"level":"info","ts":"2023-07-10T16:35:30Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1,"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:219\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:233\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/manager/runnable_group.go:219"}
{"level":"debug","ts":"2023-07-10T16:35:43Z","msg":"injecting sidecar into pod","namespace":"core-service-corpdirectory-1","name":"","otelcol-namespace":"core-service-corpdirectory-1","otelcol-name":"otel-collector"}
{"level":"info","ts":"2023-07-10T16:35:43Z","msg":"couldn't determine metrics port from configuration, using 8888 default value","error":"missing port in address","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.getConfigContainerPorts\n\t/workspace/pkg/collector/container.go:181\ngithub.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:46\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
{"level":"error","ts":"2023-07-10T16:35:43Z","msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:127\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
{"level":"debug","ts":"2023-07-10T16:35:43Z","msg":"annotation not present in deployment, skipping instrumentation injection","namespace":"core-service-corpdirectory-1","name":""}
My generated pod spec is below:
apiVersion: v1
kind: Pod
metadata:
name: core-service-corpdirectory-1-core-service-corpdirectory-76xrcsk
generateName: core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565-
namespace: core-service-corpdirectory-1
uid: 4e6ba87c-0d59-4400-aea2-80feaf5d7b5c
resourceVersion: '2779437'
creationTimestamp: '2023-07-10T16:35:43Z'
labels:
app: core-service-corpdirectory
app.kubernetes.io/instance: core-service-corpdirectory-1
app.kubernetes.io/name: core-service-corpdirectory-1
pod-template-hash: 76fbccc565
sidecar.opentelemetry.io/injected: core-service-corpdirectory-1.otel-collector
annotations:
cni.projectcalico.org/containerID: adec446a30049b6b3577057db9eb5228ba51ff71bfc9bf86ff8bcda07c45e330
cni.projectcalico.org/podIP: 240.16.0.25/32
cni.projectcalico.org/podIPs: 240.16.0.25/32
sidecar.opentelemetry.io/inject: 'true'
ownerReferences:
- apiVersion: apps/v1
kind: ReplicaSet
name: core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565
uid: 5fed46ef-5fc2-4408-809d-1f86d9e8963f
controller: true
blockOwnerDeletion: true
managedFields:
- manager: kube-controller-manager
operation: Update
apiVersion: v1
time: '2023-07-10T16:35:43Z'
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:sidecar.opentelemetry.io/inject: {}
f:generateName: {}
f:labels:
.: {}
f:app: {}
f:app.kubernetes.io/instance: {}
f:app.kubernetes.io/name: {}
f:pod-template-hash: {}
f:ownerReferences:
.: {}
k:{"uid":"5fed46ef-5fc2-4408-809d-1f86d9e8963f"}: {}
f:spec:
f:containers:
k:{"name":"api"}:
.: {}
f:args: {}
f:env:
.: {}
k:{"name":"APPLE_SYSTEM_ACCOUNT_ID"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:secretKeyRef: {}
k:{"name":"APPLE_SYSTEM_ACCOUNT_NAME"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:secretKeyRef: {}
k:{"name":"APPLE_SYSTEM_ACCOUNT_PASSWORD"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:secretKeyRef: {}
k:{"name":"APPLE_SYSTEM_ACCOUNT_TOTP_SECRET"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:secretKeyRef: {}
k:{"name":"APP_ID_KEY"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:secretKeyRef: {}
k:{"name":"APP_PASSWORD"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:secretKeyRef: {}
k:{"name":"ENABLE_REFLECTION"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"KUBERNETES_CLUSTER_DOMAIN"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"LISTEN_ADDR"}:
.: {}
f:name: {}
f:value: {}
f:envFrom: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:resources:
.: {}
f:limits:
.: {}
f:cpu: {}
f:memory: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:dnsPolicy: {}
f:enableServiceLinks: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext: {}
f:serviceAccount: {}
f:serviceAccountName: {}
f:terminationGracePeriodSeconds: {}
- manager: calico
operation: Update
apiVersion: v1
time: '2023-07-10T16:35:44Z'
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:cni.projectcalico.org/containerID: {}
f:cni.projectcalico.org/podIP: {}
f:cni.projectcalico.org/podIPs: {}
subresource: status
- manager: kubelet
operation: Update
apiVersion: v1
time: '2023-07-10T16:37:50Z'
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions:
k:{"type":"ContainersReady"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Initialized"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:status: {}
f:type: {}
k:{"type":"Ready"}:
.: {}
f:lastProbeTime: {}
f:lastTransitionTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:containerStatuses: {}
f:hostIP: {}
f:podIP: {}
f:podIPs:
.: {}
k:{"ip":"240.16.0.25"}:
.: {}
f:ip: {}
f:startTime: {}
subresource: status
selfLink: >-
/api/v1/namespaces/core-service-corpdirectory-1/pods/core-service-corpdirectory-1-core-service-corpdirectory-76xrcsk
status:
phase: Pending
conditions:
- type: Initialized
status: 'True'
lastProbeTime: null
lastTransitionTime: '2023-07-10T16:35:43Z'
- type: Ready
status: 'False'
lastProbeTime: null
lastTransitionTime: '2023-07-10T16:35:43Z'
reason: ContainersNotReady
message: 'containers with unready status: [otc-container]'
- type: ContainersReady
status: 'False'
lastProbeTime: null
lastTransitionTime: '2023-07-10T16:35:43Z'
reason: ContainersNotReady
message: 'containers with unready status: [otc-container]'
- type: PodScheduled
status: 'True'
lastProbeTime: null
lastTransitionTime: '2023-07-10T16:35:43Z'
hostIP: 198.19.63.203
podIP: 240.16.0.25
podIPs:
- ip: 240.16.0.25
startTime: '2023-07-10T16:35:43Z'
containerStatuses:
- name: api
state:
running:
startedAt: '2023-07-10T16:35:50Z'
lastState: {}
ready: true
restartCount: 0
image: docker.apple.com/avijit_sarkar/core-corpdirectory-service:latest1
imageID: >-
docker.apple.com/avijit_sarkar/core-corpdirectory-service@sha256:e8f229aac2e7f0b93b929823d967cb561067acc40eeb133b62fd7c1b1d2931db
containerID: >-
containerd://8ab321a3e22f904ad12de21846b53d23e677a34be7084c0fbc72e1146ca80f99
started: true
- name: otc-container
state:
waiting:
reason: RunContainerError
message: context deadline exceeded
lastState: {}
ready: false
restartCount: 1
image: docker-upstream.apple.com/otel/opentelemetry-collector-contrib:latest
imageID: >-
docker-upstream.apple.com/otel/opentelemetry-collector-contrib@sha256:c6671841470b83007e0553cdadbc9d05f6cfe17b3ebe9733728dc4a579a5b532
containerID: >-
containerd://e162f6f5199f68b81821da1de7e329b3e464aaf6396ad3abae20f00b8b3787c4
started: false
qosClass: Burstable
spec:
volumes:
- name: kube-api-access-wwqc5
projected:
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
name: kube-root-ca.crt
items:
- key: ca.crt
path: ca.crt
- downwardAPI:
items:
- path: namespace
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
defaultMode: 420
containers:
- name: api
image: docker.apple.com/avijit_sarkar/core-corpdirectory-service:latest1
args:
- serve
envFrom:
- configMapRef:
name: core-service-corpdirectory-1-cloud-provider-config
env:
- name: ENABLE_REFLECTION
value: 'true'
- name: APPLE_SYSTEM_ACCOUNT_NAME
valueFrom:
secretKeyRef:
name: core-service-corpdirectory-1-corp-service-secret
key: APPLE_SYSTEM_ACCOUNT_NAME
- name: APPLE_SYSTEM_ACCOUNT_PASSWORD
valueFrom:
secretKeyRef:
name: core-service-corpdirectory-1-corp-service-secret
key: APPLE_SYSTEM_ACCOUNT_PASSWORD
- name: APPLE_SYSTEM_ACCOUNT_ID
valueFrom:
secretKeyRef:
name: core-service-corpdirectory-1-corp-service-secret
key: APPLE_SYSTEM_ACCOUNT_ID
- name: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
valueFrom:
secretKeyRef:
name: core-service-corpdirectory-1-corp-service-secret
key: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
- name: APP_PASSWORD
valueFrom:
secretKeyRef:
name: core-service-corpdirectory-1-corp-service-secret
key: APP_PASSWORD
- name: APP_ID_KEY
valueFrom:
secretKeyRef:
name: core-service-corpdirectory-1-corp-service-secret
key: APP_ID_KEY
- name: LISTEN_ADDR
value: ':50051'
- name: KUBERNETES_CLUSTER_DOMAIN
value: cluster.local
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 10m
memory: 32Mi
volumeMounts:
- name: kube-api-access-wwqc5
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: Always
- name: otc-container
image: docker-upstream.apple.com/otel/opentelemetry-collector-contrib
args:
- '--config=env:OTEL_CONFIG'
ports:
- name: metrics
containerPort: 8888
protocol: TCP
- name: otlp-grpc
containerPort: 4317
protocol: TCP
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OTEL_CONFIG
value: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
k8s_cluster:
collection_interval: 10s
processors:
# groupbyattrs:
# keys:
# - namespace
# - cluster
# - location
# batch:
# # batch metrics before sending to reduce API usage
# send_batch_max_size: 100
# send_batch_size: 100
# timeout: 5s
# memory_limiter:
# # drop metrics if memory usage gets too high
# check_interval: 1s
# limit_percentage: 85
# spike_limit_percentage: 20
# resourcedetection:
# # detect cluster name and location
# detectors: [gcp]
# timeout: 2s
# override: false
exporters:
logging:
verbosity: Detailed
# googlemanagedprometheus:
# project: ct-gcp-sre-monitorin-dev-09qb
service:
telemetry:
logs:
level: "debug"
pipelines:
metrics:
receivers: [otlp, k8s_cluster]
# processors: [resourcedetection, groupbyattrs, batch, memory_limiter]
processors: []
exporters: [logging]
- name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OTEL_RESOURCE_ATTRIBUTES_POD_UID
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.uid
- name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: OTEL_RESOURCE_ATTRIBUTES
value: >-
k8s.deployment.name=core-service-corpdirectory-1-core-service-corpdirectory,k8s.deployment.uid=53e2ea8e-44cd-4265-aba2-11edbe830e45,k8s.namespace.name=core-service-corpdirectory-1,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.pod.uid=$(OTEL_RESOURCE_ATTRIBUTES_POD_UID),k8s.replicaset.name=core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565,k8s.replicaset.uid=5fed46ef-5fc2-4408-809d-1f86d9e8963f
resources:
limits:
cpu: 10m
memory: 256Mi
requests:
cpu: 10m
memory: 256Mi
volumeMounts:
- name: kube-api-access-wwqc5
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: Always
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirst
serviceAccountName: core-service-corpdirectory-sa
serviceAccount: core-service-corpdirectory-sa
nodeName: gke-test-gcp-us-west-test-gcp-uswe1-n-7a8f0692-79o7
securityContext: {}
schedulerName: default-scheduler
tolerations:
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 300
priority: 0
enableServiceLinks: true
preemptionPolicy: PreemptLowerPriority
My deployment.yaml (helm template) is below:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "core-service-corpdirectory-1.fullname" . }}-core-service-corpdirectory
labels:
app: core-service-corpdirectory
{{- include "core-service-corpdirectory-1.labels" . | nindent 4 }}
spec:
selector:
matchLabels:
app: core-service-corpdirectory
{{- include "core-service-corpdirectory-1.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
app: core-service-corpdirectory
{{- include "core-service-corpdirectory-1.selectorLabels" . | nindent 8 }}
annotations:
sidecar.opentelemetry.io/inject: "true"
spec:
serviceAccountName: core-service-corpdirectory-sa
containers:
- args: {{- toYaml .Values.coreServiceCorpdirectory.api.args | nindent 8 }}
env:
- name: ENABLE_REFLECTION
value: {{ quote .Values.coreServiceCorpdirectory.api.env.enableReflection }}
- name: APPLE_SYSTEM_ACCOUNT_NAME
valueFrom:
secretKeyRef:
key: APPLE_SYSTEM_ACCOUNT_NAME
name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
- name: APPLE_SYSTEM_ACCOUNT_PASSWORD
valueFrom:
secretKeyRef:
key: APPLE_SYSTEM_ACCOUNT_PASSWORD
name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
- name: APPLE_SYSTEM_ACCOUNT_ID
valueFrom:
secretKeyRef:
key: APPLE_SYSTEM_ACCOUNT_ID
name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
- name: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
valueFrom:
secretKeyRef:
key: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
- name: APP_PASSWORD
valueFrom:
secretKeyRef:
key: APP_PASSWORD
name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
- name: APP_ID_KEY
valueFrom:
secretKeyRef:
key: APP_ID_KEY
name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
- name: LISTEN_ADDR
value: {{ quote .Values.coreServiceCorpdirectory.api.env.listenAddr }}
- name: KUBERNETES_CLUSTER_DOMAIN
value: {{ quote .Values.kubernetesClusterDomain }}
envFrom:
- configMapRef:
name: {{ include "core-service-corpdirectory-1.fullname" . }}-cloud-provider-config
image: {{ .Values.coreServiceCorpdirectory.api.image.repository }}:{{ .Values.coreServiceCorpdirectory.api.image.tag
| default .Chart.AppVersion }}
imagePullPolicy: {{ .Values.coreServiceCorpdirectory.api.imagePullPolicy }}
name: api
resources: {{- toYaml .Values.coreServiceCorpdirectory.api.resources | nindent 10 }}
@iblancasa - based on the above log it seems the sidecar injection is working as expected; the injected container (image: otel/opentelemetry-collector-contrib) is just not coming up. Can we enable debug tracing for that container?
Did you check the output of describe?
Yes, nothing much there:
>>> kc describe pods core-service-corpdirectory-1-core-service-corpdirectory-762xcrw
+ kubectl describe pods core-service-corpdirectory-1-core-service-corpdirectory-762xcrw
Name: core-service-corpdirectory-1-core-service-corpdirectory-762xcrw
Namespace: core-service-corpdirectory-1
Priority: 0
Node: gke-test-gcp-us-west-test-gcp-uswe1-n-bf30fd8f-90se/198.19.63.202
Start Time: Mon, 10 Jul 2023 12:35:32 -0500
Labels: app=core-service-corpdirectory
app.kubernetes.io/instance=core-service-corpdirectory-1
app.kubernetes.io/name=core-service-corpdirectory-1
pod-template-hash=76fbccc565
sidecar.opentelemetry.io/injected=core-service-corpdirectory-1.otel-collector
Annotations: cni.projectcalico.org/containerID: f5ad5fb9c7544342aa09bd8a4a704dfc0ec11acf3910ebaa00afabb96f75ec18
cni.projectcalico.org/podIP: 240.16.1.25/32
cni.projectcalico.org/podIPs: 240.16.1.25/32
sidecar.opentelemetry.io/inject: true
Status: Running
IP: 240.16.1.25
IPs:
IP: 240.16.1.25
Controlled By: ReplicaSet/core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565
Containers:
api:
Container ID: containerd://f62d01a81b6a6256b05ea35771037719427565e5ea5e687be8c6fdba9f5db065
Image: docker.apple.com/avijit_sarkar/core-corpdirectory-service:latest1
Image ID: docker.apple.com/avijit_sarkar/core-corpdirectory-service@sha256:e8f229aac2e7f0b93b929823d967cb561067acc40eeb133b62fd7c1b1d2931db
Port: <none>
Host Port: <none>
Args:
serve
State: Running
Started: Mon, 10 Jul 2023 12:35:40 -0500
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 10m
memory: 32Mi
Environment Variables from:
core-service-corpdirectory-1-cloud-provider-config ConfigMap Optional: false
Environment:
ENABLE_REFLECTION: true
APPLE_SYSTEM_ACCOUNT_NAME: <set to the key 'SYSTEM_ACCOUNT_NAME' in secret 'core-service-corpdirectory-1-corp-service-secret'> Optional: false
APPLE_SYSTEM_ACCOUNT_PASSWORD: <set to the key 'SYSTEM_ACCOUNT_PASSWORD' in secret 'core-service-corpdirectory-1-corp-service-secret'> Optional: false
APPLE_SYSTEM_ACCOUNT_ID: <set to the key 'SYSTEM_ACCOUNT_ID' in secret 'core-service-corpdirectory-1-corp-service-secret'> Optional: false
APPLE_SYSTEM_ACCOUNT_TOTP_SECRET: <set to the key 'SYSTEM_ACCOUNT_TOTP_SECRET' in secret 'core-service-corpdirectory-1-corp-service-secret'> Optional: false
APP_PASSWORD: <set to the key 'APP_PASSWORD' in secret 'core-service-corpdirectory-1-corp-service-secret'> Optional: false
APP_ID_KEY: <set to the key 'APP_ID_KEY' in secret 'core-service-corpdirectory-1-corp-service-secret'> Optional: false
LISTEN_ADDR: :50051
KUBERNETES_CLUSTER_DOMAIN: cluster.local
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wn424 (ro)
otc-container:
Container ID: containerd://13695dd137975097c47cc06ae6cdda2df4bf6e22da0d8b23a2585aa74ba3207d
Image: docker-upstream.apple.com/otel/opentelemetry-collector-contrib
Image ID: docker-upstream.apple.com/otel/opentelemetry-collector-contrib@sha256:c6671841470b83007e0553cdadbc9d05f6cfe17b3ebe9733728dc4a579a5b532
Ports: 8888/TCP, 4317/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--config=env:OTEL_CONFIG
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: StartError
Message: failed to start containerd task "13695dd137975097c47cc06ae6cdda2df4bf6e22da0d8b23a2585aa74ba3207d": context deadline exceeded: unknown
Exit Code: 128
Started: Wed, 31 Dec 1969 18:00:00 -0600
Finished: Mon, 10 Jul 2023 13:44:29 -0500
Ready: False
Restart Count: 33
Limits:
cpu: 10m
memory: 256Mi
Requests:
cpu: 10m
memory: 256Mi
Environment:
POD_NAME: core-service-corpdirectory-1-core-service-corpdirectory-762xcrw (v1:metadata.name)
OTEL_CONFIG: receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
k8s_cluster:
collection_interval: 10s
processors:
exporters:
logging:
verbosity: Detailed
service:
telemetry:
logs:
level: "debug"
pipelines:
metrics:
receivers: [otlp, k8s_cluster]
processors: []
exporters: [logging]
OTEL_RESOURCE_ATTRIBUTES_POD_NAME: core-service-corpdirectory-1-core-service-corpdirectory-762xcrw (v1:metadata.name)
OTEL_RESOURCE_ATTRIBUTES_POD_UID: (v1:metadata.uid)
OTEL_RESOURCE_ATTRIBUTES_NODE_NAME: (v1:spec.nodeName)
OTEL_RESOURCE_ATTRIBUTES: k8s.deployment.name=core-service-corpdirectory-1-core-service-corpdirectory,k8s.deployment.uid=ab9c8587-4444-4a3a-815c-15ec7ad66b18,k8s.namespace.name=core-service-corpdirectory-1,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.pod.uid=$(OTEL_RESOURCE_ATTRIBUTES_POD_UID),k8s.replicaset.name=core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565,k8s.replicaset.uid=ea7a9a5d-d55d-4759-b197-5770ee063946
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wn424 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-wn424:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 60m (x6 over 70m) kubelet Pulling image "docker-upstream.apple.com/otel/opentelemetry-collector-contrib"
Normal Created 60m (x6 over 70m) kubelet Created container otc-container
Normal Pulled 60m kubelet Successfully pulled image "docker-upstream.apple.com/otel/opentelemetry-collector-contrib" in 187.514999ms
Warning Failed 4m16s (x33 over 68m) kubelet Error: context deadline exceeded
@iblancasa - thanks for all your support with the troubleshooting. I was able to figure out the issue: it was caused by the resource limits I had in the otel-collector manifest. I had the following, and after I removed them the otc sidecar container came up fine:
limits:
cpu: 10m
memory: 256Mi
requests:
cpu: 10m
memory: 256Mi
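For anyone hitting the same thing: a 10m CPU limit can throttle the collector so heavily at startup that the container runtime gives up with "context deadline exceeded". Rather than removing limits entirely, raising them should also work — a sketch of the OpenTelemetryCollector spec with illustrative values (not tuned recommendations):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: sidecar
  resources:
    limits:
      cpu: 200m        # illustrative; 10m was too low for the container to start in time
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 128Mi
  # config: ... unchanged
```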
Hi,
I have installed the opentelemetry-operator-0.32.0 in a GKE cluster using the Helm chart (https://open-telemetry.github.io/opentelemetry-helm-charts) and added the sidecar container to my app pod; the otel otc-container keeps restarting.
The error in the opentelemetry-operator pod log for the manager container is below.
The otel-collector CRD:
App deployment manifest (with the sidecar)