open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

opentelemetry-operator sidecar injection failing #1898

Closed · avijitsarkar123 closed this issue 1 year ago

avijitsarkar123 commented 1 year ago

Hi,

I have installed opentelemetry-operator-0.32.0 in a GKE cluster using the Helm chart (https://open-telemetry.github.io/opentelemetry-helm-charts) and added the sidecar to my app pod, but the injected otc-container keeps restarting.

The error in the opentelemetry-operator pod log (the manager container) is below:

{"error":"service property in the configuration doesn't contain extensions", "level":"error", "msg":"Cannot create liveness probe.", "stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container
            /workspace/pkg/collector/container.go:127
        github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add
            /workspace/pkg/sidecar/pod.go:43
        github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate
            /workspace/pkg/sidecar/podmutator.go:100
        github.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle
            /workspace/internal/webhookhandler/webhookhandler.go:92
        sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle
            /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169
        sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP
            /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98
        github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1
            /go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60
        net/http.HandlerFunc.ServeHTTP
            /usr/local/go/src/net/http/server.go:2122
        github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1
            /go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147
        net/http.HandlerFunc.ServeHTTP
            /usr/local/go/src/net/http/server.go:2122
        github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2
            /go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109
        net/http.HandlerFunc.ServeHTTP
            /usr/local/go/src/net/http/server.go:2122
        net/http.(*ServeMux).ServeHTTP
            /usr/local/go/src/net/http/server.go:2500
        net/http.serverHandler.ServeHTTP
            /usr/local/go/src/net/http/server.go:2936
        net/http.(*conn).serve
            /usr/local/go/src/net/http/server.go:1995", "ts":"2023-07-07T04:14:38Z"}

The otel-collector OpenTelemetryCollector CR:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  annotations:
    helm.sh/hook: pre-install
  creationTimestamp: '2023-06-30T22:35:25Z'
  generation: 1
  labels:
    app.kubernetes.io/managed-by: opentelemetry-operator
  managedFields:
    - apiVersion: opentelemetry.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:version: {}
      manager: Go-http-client
      operation: Update
      subresource: status
      time: '2023-06-30T22:35:25Z'
    - apiVersion: opentelemetry.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.sh/hook: {}
        f:spec:
          .: {}
          f:config: {}
          f:image: {}
          f:mode: {}
          f:resources:
            .: {}
            f:limits:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:requests:
              .: {}
              f:cpu: {}
              f:memory: {}
      manager: helm
      operation: Update
      time: '2023-06-30T22:35:25Z'
  name: otel-collector
  namespace: core-service-corpdirectory-1
  resourceVersion: '38237450'
  uid: 3d33a1a5-2f70-4095-ab1f-72f8b496945f
  selfLink: >-
    /apis/opentelemetry.io/v1alpha1/namespaces/core-service-corpdirectory-1/opentelemetrycollectors/otel-collector
status:
  version: 0.79.0
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    processors:

    exporters:
      logging:
        verbosity: Detailed

    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [logging]
  image: otel/opentelemetry-collector-contrib
  ingress:
    route: {}
  mode: sidecar
  replicas: 1
  resources:
    limits:
      cpu: 10m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 256Mi
  targetAllocator:
    prometheusCR: {}
  upgradeStrategy: automatic
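
For readability, the same CR without the server-populated fields (managedFields, status, and defaults) boils down to roughly this:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: core-service-corpdirectory-1
  annotations:
    helm.sh/hook: pre-install
spec:
  mode: sidecar
  image: otel/opentelemetry-collector-contrib
  resources:
    limits:
      cpu: 10m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 256Mi
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    exporters:
      logging:
        verbosity: Detailed
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [logging]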

The app pod manifest (with the injected sidecar):

apiVersion: v1
kind: Pod
metadata:
  name: core-service-corpdirectory-1-core-service-corpdirectory-76ptw5b
  generateName: core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565-
  namespace: core-service-corpdirectory-1
  uid: 5ddb1137-19c2-4c1f-b0e3-91bd53d96ca5
  resourceVersion: '44134635'
  creationTimestamp: '2023-07-07T04:14:39Z'
  labels:
    app: core-service-corpdirectory
    app.kubernetes.io/instance: core-service-corpdirectory-1
    app.kubernetes.io/name: core-service-corpdirectory-1
    pod-template-hash: 76fbccc565
    sidecar.opentelemetry.io/injected: core-service-corpdirectory-1.otel-collector
  annotations:
    cni.projectcalico.org/containerID: 65912066f8437fd1715beab5dec516249edd8d43e0ed6c1b10a0700345c05d56
    cni.projectcalico.org/podIP: 240.16.3.12/32
    cni.projectcalico.org/podIPs: 240.16.3.12/32
    sidecar.opentelemetry.io/inject: 'true'
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565
      uid: 1c189122-edd0-4fb3-a6dc-f5a8a8033a6c
      controller: true
      blockOwnerDeletion: true
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2023-07-07T04:14:39Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:sidecar.opentelemetry.io/inject: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/name: {}
            f:pod-template-hash: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"1c189122-edd0-4fb3-a6dc-f5a8a8033a6c"}: {}
        f:spec:
          f:containers:
            k:{"name":"api"}:
              .: {}
              f:args: {}
              f:env:
                .: {}
                k:{"name":"APP_ID_KEY"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"APP_PASSWORD"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"ENABLE_REFLECTION"}:
                  .: {}
                  f:name: {}
                  f:value: {}
                k:{"name":"KUBERNETES_CLUSTER_DOMAIN"}:
                  .: {}
                  f:name: {}
                  f:value: {}
                k:{"name":"LISTEN_ADDR"}:
                  .: {}
                  f:name: {}
                  f:value: {}
              f:envFrom: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:serviceAccount: {}
          f:serviceAccountName: {}
          f:terminationGracePeriodSeconds: {}
    - manager: Go-http-client
      operation: Update
      apiVersion: v1
      time: '2023-07-07T04:14:40Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:cni.projectcalico.org/containerID: {}
            f:cni.projectcalico.org/podIP: {}
            f:cni.projectcalico.org/podIPs: {}
      subresource: status
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2023-07-07T04:31:44Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            k:{"type":"ContainersReady"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Initialized"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Ready"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:containerStatuses: {}
          f:hostIP: {}
          f:phase: {}
          f:podIP: {}
          f:podIPs:
            .: {}
            k:{"ip":"240.16.3.12"}:
              .: {}
              f:ip: {}
          f:startTime: {}
      subresource: status
  selfLink: >-
    /api/v1/namespaces/core-service-corpdirectory-1/pods/core-service-corpdirectory-1-core-service-corpdirectory-76ptw5b
status:
  phase: Running
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2023-07-07T04:14:40Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2023-07-07T04:14:40Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [otc-container]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2023-07-07T04:14:40Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [otc-container]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2023-07-07T04:14:39Z'
  hostIP: 172.17.80.28
  podIP: 240.16.3.12
  podIPs:
    - ip: 240.16.3.12
  startTime: '2023-07-07T04:14:40Z'
  containerStatuses:
    - name: api
      state:
        running:
          startedAt: '2023-07-07T04:14:46Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: my_repo/core-corpdirectory-service:latest1
      imageID: >-
        my_repo/core-corpdirectory-service@sha256:e8f229aac2e7f0b93b929823d967cb561067acc40eeb133b62fd7c1b1d2931db
      containerID: >-
        containerd://a2eb827fe94575448259712acd0b8b08611961859456a41b80e3e9428b62fabd
      started: true
    - name: otc-container
      state:
        waiting:
          reason: CrashLoopBackOff
          message: >-
            back-off 2m40s restarting failed container=otc-container
            pod=core-service-corpdirectory-1-core-service-corpdirectory-76ptw5b_core-service-corpdirectory-1(5ddb1137-19c2-4c1f-b0e3-91bd53d96ca5)
      lastState:
        terminated:
          exitCode: 128
          reason: StartError
          message: >-
            failed to start containerd task
            "fa54a450d39d2ac489c017dcf7ccb4b9b6ca40c7ecb919faa8c4d62930dbacdf":
            context canceled: unknown
          startedAt: '1970-01-01T00:00:00Z'
          finishedAt: '2023-07-07T04:31:32Z'
          containerID: >-
            containerd://fa54a450d39d2ac489c017dcf7ccb4b9b6ca40c7ecb919faa8c4d62930dbacdf
      ready: false
      restartCount: 6
      image: otel/opentelemetry-collector-contrib:latest
      imageID: >-
        otel/opentelemetry-collector-contrib@sha256:c6671841470b83007e0553cdadbc9d05f6cfe17b3ebe9733728dc4a579a5b532
      containerID: >-
        containerd://fa54a450d39d2ac489c017dcf7ccb4b9b6ca40c7ecb919faa8c4d62930dbacdf
      started: false
  qosClass: Burstable
spec:
  volumes:
    - name: kube-api-access-g6mzr
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: api
      image: my_repo/core-corpdirectory-service:latest1
      args:
        - serve
      envFrom:
        - configMapRef:
            name: core-service-corpdirectory-1-cloud-provider-config
      env:
        - name: ENABLE_REFLECTION
          value: 'true'
        - name: APP_PASSWORD
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APP_PASSWORD
        - name: APP_ID_KEY
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APP_ID_KEY
        - name: LISTEN_ADDR
          value: ':50051'
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: cluster.local
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 10m
          memory: 32Mi
      volumeMounts:
        - name: kube-api-access-g6mzr
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
    - name: otc-container
      image: otel/opentelemetry-collector-contrib
      args:
        - '--config=env:OTEL_CONFIG'
      ports:
        - name: metrics
          containerPort: 8888
          protocol: TCP
        - name: otlp-grpc
          containerPort: 4317
          protocol: TCP
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OTEL_CONFIG
          value: |
            receivers:
              otlp:
                protocols:
                  grpc:
                    endpoint: 0.0.0.0:4317
            processors:
            exporters:
              logging:
                verbosity: Detailed

            service:
              telemetry:
                logs:
                  level: "debug"
              pipelines:
                metrics:
                  receivers: [otlp]
                  processors: []
                  exporters: [logging]
        - name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OTEL_RESOURCE_ATTRIBUTES_POD_UID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: >-
            k8s.deployment.name=core-service-corpdirectory-1-core-service-corpdirectory,k8s.deployment.uid=53ae1ffc-6ff2-497c-881e-07b37395d521,k8s.namespace.name=core-service-corpdirectory-1,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.pod.uid=$(OTEL_RESOURCE_ATTRIBUTES_POD_UID),k8s.replicaset.name=core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565,k8s.replicaset.uid=1c189122-edd0-4fb3-a6dc-f5a8a8033a6c
      resources:
        limits:
          cpu: 10m
          memory: 256Mi
        requests:
          cpu: 10m
          memory: 256Mi
      volumeMounts:
        - name: kube-api-access-g6mzr
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: core-service-corpdirectory-sa
  serviceAccount: core-service-corpdirectory-sa
  nodeName: gke-stg-gcp-us-west1-stg-gcp-uswe1-np-7cbcf840-cb4t
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
iblancasa commented 1 year ago

Hi @avijitsarkar123, could you try with a newer version?

avijitsarkar123 commented 1 year ago

@iblancasa - I upgraded the opentelemetry-operator to the latest version but am still getting the same error. The versions are now: "opentelemetry-operator":"0.80.0", "opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.80.0".

Am I missing something in my otel-collector YAML that is causing this error?

"msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions",
{"level":"info","ts":"2023-07-10T14:47:11Z","msg":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.80.0","opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.80.0","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.80.0","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.80.0","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.26.0","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.40.0","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.39b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.7.0","auto-instrumentation-go":"ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.2.1-alpha","auto-instrumentation-apache-httpd":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3","feature-gates":"operator.autoinstrumentation.apache-httpd,operator.autoinstrumentation.dotnet,-operator.autoinstrumentation.go,operator.autoinstrumentation.java,operator.autoinstrumentation.nodejs,operator.autoinstrumentation.python,-operator.collector.rewritetargetallocator","build-date":"2023-06-28T17:26:24Z","go-version":"go1.20.5","go-arch":"amd64","go-os":"linux","labels-filter":[]}
{"level":"info","ts":"2023-07-10T14:47:11Z","logger":"setup","msg":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"info","ts":"2023-07-10T14:47:11Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"0.0.0.0:8080"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2023-07-10T14:47:12Z","msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":"2023-07-10T14:47:12Z","msg":"starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
I0710 14:47:12.150869       1 leaderelection.go:245] attempting to acquire leader lease kaizen-system/9f7554c3.opentelemetry.io...
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2023-07-10T14:47:12Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
{"level":"info","ts":"2023-07-10T14:47:43Z","msg":"couldn't determine metrics port from configuration, using 8888 default value","error":"missing port in address"}
{"level":"error","ts":"2023-07-10T14:47:43Z","msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:127\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}
iblancasa commented 1 year ago

"msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions",

This just informs you that your OTel config doesn't contain any extensions. It should not be a problem for injecting the sidecar.

After reading your configuration again... is the log message the only problem you see, or is there something else, like the instrumentation not working?
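
For reference, the operator only adds a liveness probe to the sidecar when the collector config declares the health_check extension and lists it under service.extensions, which is why this message is informational. A minimal sketch of what adding it to the config above would look like (13133 is the extension's default port):

extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  telemetry:
    logs:
      level: "debug"
  pipelines:
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [logging]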

avijitsarkar123 commented 1 year ago

The sidecar container never comes up and shows the error "context deadline exceeded". I checked the GKE kubelet log for additional details, but all I see there is the following:

"MESSAGE": "E0710 15:13:20.346185    1872 pod_workers.go:965] \"Error syncing pod, skipping\" err=\"failed to \\\"StartContainer\\\" for \\\"otc-container\\\" with RunContainerError: \\\"context deadline exceeded\\\"\" pod=\"core-service-corpdirectory-1/core-service-corpdirectory-1-core-service-corpdirectory-76bwbct\" podUID=403fd05c-5ec1-4267-8361-0b2d7de46302"

Is there a way to enable more verbose logging for the operator to get additional details on why the sidecar isn't coming up?

The same setup works fine in a local Kind cluster, but on GKE it fails...

iblancasa commented 1 year ago

Try enabling these options:

      --zap-log-level level                              Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
      --zap-stacktrace-level level                       Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic')
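
If the operator is deployed with the Helm chart, one way to pass these flags is through the manager's extra arguments in the chart values (a sketch; manager.extraArgs is assumed to be exposed by the chart version in use):

# values.yaml for the opentelemetry-operator Helm chart (assumed field names)
manager:
  extraArgs:
    - --zap-log-level=debug
    - --zap-stacktrace-level=error
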
avijitsarkar123 commented 1 year ago

After enabling the logging options mentioned above, I do see an error (the last one):

{"level":"info","ts":"2023-07-10T16:35:30Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1,"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:219\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:233\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/manager/runnable_group.go:219"}

{"level":"debug","ts":"2023-07-10T16:35:43Z","msg":"injecting sidecar into pod","namespace":"core-service-corpdirectory-1","name":"","otelcol-namespace":"core-service-corpdirectory-1","otelcol-name":"otel-collector"}

{"level":"info","ts":"2023-07-10T16:35:43Z","msg":"couldn't determine metrics port from configuration, using 8888 default value","error":"missing port in address","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.getConfigContainerPorts\n\t/workspace/pkg/collector/container.go:181\ngithub.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:46\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

{"level":"error","ts":"2023-07-10T16:35:43Z","msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:127\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

{"level":"debug","ts":"2023-07-10T16:35:43Z","msg":"annotation not present in deployment, skipping instrumentation injection","namespace":"core-service-corpdirectory-1","name":""}

My generated pod spec is below

apiVersion: v1
kind: Pod
metadata:
  name: core-service-corpdirectory-1-core-service-corpdirectory-76xrcsk
  generateName: core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565-
  namespace: core-service-corpdirectory-1
  uid: 4e6ba87c-0d59-4400-aea2-80feaf5d7b5c
  resourceVersion: '2779437'
  creationTimestamp: '2023-07-10T16:35:43Z'
  labels:
    app: core-service-corpdirectory
    app.kubernetes.io/instance: core-service-corpdirectory-1
    app.kubernetes.io/name: core-service-corpdirectory-1
    pod-template-hash: 76fbccc565
    sidecar.opentelemetry.io/injected: core-service-corpdirectory-1.otel-collector
  annotations:
    cni.projectcalico.org/containerID: adec446a30049b6b3577057db9eb5228ba51ff71bfc9bf86ff8bcda07c45e330
    cni.projectcalico.org/podIP: 240.16.0.25/32
    cni.projectcalico.org/podIPs: 240.16.0.25/32
    sidecar.opentelemetry.io/inject: 'true'
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet
      name: core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565
      uid: 5fed46ef-5fc2-4408-809d-1f86d9e8963f
      controller: true
      blockOwnerDeletion: true
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2023-07-10T16:35:43Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:sidecar.opentelemetry.io/inject: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/name: {}
            f:pod-template-hash: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"5fed46ef-5fc2-4408-809d-1f86d9e8963f"}: {}
        f:spec:
          f:containers:
            k:{"name":"api"}:
              .: {}
              f:args: {}
              f:env:
                .: {}
                k:{"name":"APPLE_SYSTEM_ACCOUNT_ID"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"APPLE_SYSTEM_ACCOUNT_NAME"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"APPLE_SYSTEM_ACCOUNT_PASSWORD"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"APPLE_SYSTEM_ACCOUNT_TOTP_SECRET"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"APP_ID_KEY"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"APP_PASSWORD"}:
                  .: {}
                  f:name: {}
                  f:valueFrom:
                    .: {}
                    f:secretKeyRef: {}
                k:{"name":"ENABLE_REFLECTION"}:
                  .: {}
                  f:name: {}
                  f:value: {}
                k:{"name":"KUBERNETES_CLUSTER_DOMAIN"}:
                  .: {}
                  f:name: {}
                  f:value: {}
                k:{"name":"LISTEN_ADDR"}:
                  .: {}
                  f:name: {}
                  f:value: {}
              f:envFrom: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:serviceAccount: {}
          f:serviceAccountName: {}
          f:terminationGracePeriodSeconds: {}
    - manager: calico
      operation: Update
      apiVersion: v1
      time: '2023-07-10T16:35:44Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:cni.projectcalico.org/containerID: {}
            f:cni.projectcalico.org/podIP: {}
            f:cni.projectcalico.org/podIPs: {}
      subresource: status
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2023-07-10T16:37:50Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            k:{"type":"ContainersReady"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Initialized"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:status: {}
              f:type: {}
            k:{"type":"Ready"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:containerStatuses: {}
          f:hostIP: {}
          f:podIP: {}
          f:podIPs:
            .: {}
            k:{"ip":"240.16.0.25"}:
              .: {}
              f:ip: {}
          f:startTime: {}
      subresource: status
  selfLink: >-
    /api/v1/namespaces/core-service-corpdirectory-1/pods/core-service-corpdirectory-1-core-service-corpdirectory-76xrcsk
status:
  phase: Pending
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2023-07-10T16:35:43Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2023-07-10T16:35:43Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [otc-container]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2023-07-10T16:35:43Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [otc-container]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2023-07-10T16:35:43Z'
  hostIP: 198.19.63.203
  podIP: 240.16.0.25
  podIPs:
    - ip: 240.16.0.25
  startTime: '2023-07-10T16:35:43Z'
  containerStatuses:
    - name: api
      state:
        running:
          startedAt: '2023-07-10T16:35:50Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: docker.apple.com/avijit_sarkar/core-corpdirectory-service:latest1
      imageID: >-
        docker.apple.com/avijit_sarkar/core-corpdirectory-service@sha256:e8f229aac2e7f0b93b929823d967cb561067acc40eeb133b62fd7c1b1d2931db
      containerID: >-
        containerd://8ab321a3e22f904ad12de21846b53d23e677a34be7084c0fbc72e1146ca80f99
      started: true
    - name: otc-container
      state:
        waiting:
          reason: RunContainerError
          message: context deadline exceeded
      lastState: {}
      ready: false
      restartCount: 1
      image: docker-upstream.apple.com/otel/opentelemetry-collector-contrib:latest
      imageID: >-
        docker-upstream.apple.com/otel/opentelemetry-collector-contrib@sha256:c6671841470b83007e0553cdadbc9d05f6cfe17b3ebe9733728dc4a579a5b532
      containerID: >-
        containerd://e162f6f5199f68b81821da1de7e329b3e464aaf6396ad3abae20f00b8b3787c4
      started: false
  qosClass: Burstable
spec:
  volumes:
    - name: kube-api-access-wwqc5
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: api
      image: docker.apple.com/avijit_sarkar/core-corpdirectory-service:latest1
      args:
        - serve
      envFrom:
        - configMapRef:
            name: core-service-corpdirectory-1-cloud-provider-config
      env:
        - name: ENABLE_REFLECTION
          value: 'true'
        - name: APPLE_SYSTEM_ACCOUNT_NAME
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APPLE_SYSTEM_ACCOUNT_NAME
        - name: APPLE_SYSTEM_ACCOUNT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APPLE_SYSTEM_ACCOUNT_PASSWORD
        - name: APPLE_SYSTEM_ACCOUNT_ID
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APPLE_SYSTEM_ACCOUNT_ID
        - name: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
        - name: APP_PASSWORD
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APP_PASSWORD
        - name: APP_ID_KEY
          valueFrom:
            secretKeyRef:
              name: core-service-corpdirectory-1-corp-service-secret
              key: APP_ID_KEY
        - name: LISTEN_ADDR
          value: ':50051'
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: cluster.local
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 10m
          memory: 32Mi
      volumeMounts:
        - name: kube-api-access-wwqc5
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
    - name: otc-container
      image: docker-upstream.apple.com/otel/opentelemetry-collector-contrib
      args:
        - '--config=env:OTEL_CONFIG'
      ports:
        - name: metrics
          containerPort: 8888
          protocol: TCP
        - name: otlp-grpc
          containerPort: 4317
          protocol: TCP
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OTEL_CONFIG
          value: |
            receivers:
              otlp:
                protocols:
                  grpc:
                    endpoint: 0.0.0.0:4317
              k8s_cluster:
                collection_interval: 10s
            processors:
              # groupbyattrs:
              #   keys:
              #     - namespace
              #     - cluster
              #     - location
              # batch:
              #   # batch metrics before sending to reduce API usage
              #   send_batch_max_size: 100
              #   send_batch_size: 100
              #   timeout: 5s
              # memory_limiter:
              #   # drop metrics if memory usage gets too high
              #   check_interval: 1s
              #   limit_percentage: 85
              #   spike_limit_percentage: 20
              # resourcedetection:
              #   # detect cluster name and location
              #   detectors: [gcp]
              #   timeout: 2s
              #   override: false

            exporters:
              logging:
                verbosity: Detailed
              # googlemanagedprometheus:
              #   project: ct-gcp-sre-monitorin-dev-09qb

            service:
              telemetry:
                logs:
                  level: "debug"
              pipelines:
                metrics:
                  receivers: [otlp, k8s_cluster]
                  # processors: [resourcedetection, groupbyattrs, batch, memory_limiter]
                  processors: []
                  exporters: [logging]
        - name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OTEL_RESOURCE_ATTRIBUTES_POD_UID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: >-
            k8s.deployment.name=core-service-corpdirectory-1-core-service-corpdirectory,k8s.deployment.uid=53e2ea8e-44cd-4265-aba2-11edbe830e45,k8s.namespace.name=core-service-corpdirectory-1,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.pod.uid=$(OTEL_RESOURCE_ATTRIBUTES_POD_UID),k8s.replicaset.name=core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565,k8s.replicaset.uid=5fed46ef-5fc2-4408-809d-1f86d9e8963f
      resources:
        limits:
          cpu: 10m
          memory: 256Mi
        requests:
          cpu: 10m
          memory: 256Mi
      volumeMounts:
        - name: kube-api-access-wwqc5
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: core-service-corpdirectory-sa
  serviceAccount: core-service-corpdirectory-sa
  nodeName: gke-test-gcp-us-west-test-gcp-uswe1-n-7a8f0692-79o7
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

My deployment.yaml (helm template) is below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "core-service-corpdirectory-1.fullname" . }}-core-service-corpdirectory
  labels:
    app: core-service-corpdirectory
  {{- include "core-service-corpdirectory-1.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      app: core-service-corpdirectory
    {{- include "core-service-corpdirectory-1.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        app: core-service-corpdirectory
      {{- include "core-service-corpdirectory-1.selectorLabels" . | nindent 8 }}
      annotations:
        sidecar.opentelemetry.io/inject: "true"
    spec:
      serviceAccountName: core-service-corpdirectory-sa
      containers:
      - args: {{- toYaml .Values.coreServiceCorpdirectory.api.args | nindent 8 }}
        env:
        - name: ENABLE_REFLECTION
          value: {{ quote .Values.coreServiceCorpdirectory.api.env.enableReflection }}
        - name: APPLE_SYSTEM_ACCOUNT_NAME
          valueFrom:
            secretKeyRef:
              key: APPLE_SYSTEM_ACCOUNT_NAME
              name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
        - name: APPLE_SYSTEM_ACCOUNT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: APPLE_SYSTEM_ACCOUNT_PASSWORD
              name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
        - name: APPLE_SYSTEM_ACCOUNT_ID
          valueFrom:
            secretKeyRef:
              key: APPLE_SYSTEM_ACCOUNT_ID
              name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
        - name: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
          valueFrom:
            secretKeyRef:
              key: APPLE_SYSTEM_ACCOUNT_TOTP_SECRET
              name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
        - name: APP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: APP_PASSWORD
              name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
        - name: APP_ID_KEY
          valueFrom:
            secretKeyRef:
              key: APP_ID_KEY
              name: {{ include "core-service-corpdirectory-1.fullname" . }}-corp-service-secret
        - name: LISTEN_ADDR
          value: {{ quote .Values.coreServiceCorpdirectory.api.env.listenAddr }}
        - name: KUBERNETES_CLUSTER_DOMAIN
          value: {{ quote .Values.kubernetesClusterDomain }}
        envFrom:
        - configMapRef:
            name: {{ include "core-service-corpdirectory-1.fullname" . }}-cloud-provider-config
        image: {{ .Values.coreServiceCorpdirectory.api.image.repository }}:{{ .Values.coreServiceCorpdirectory.api.image.tag
          | default .Chart.AppVersion }}
        imagePullPolicy: {{ .Values.coreServiceCorpdirectory.api.imagePullPolicy }}
        name: api
        resources: {{- toYaml .Values.coreServiceCorpdirectory.api.resources | nindent 10 }}
avijitsarkar123 commented 1 year ago

@iblancasa - based on the above log it seems the sidecar injection is working as expected; the injected container (image: otel/opentelemetry-collector-contrib) is just not coming up. Can we enable debug tracing for that container?

iblancasa commented 1 year ago

Did you check the output of kubectl describe?

avijitsarkar123 commented 1 year ago

Yes, nothing much there:

>>> kc describe pods core-service-corpdirectory-1-core-service-corpdirectory-762xcrw
+ kubectl describe pods core-service-corpdirectory-1-core-service-corpdirectory-762xcrw
Name:         core-service-corpdirectory-1-core-service-corpdirectory-762xcrw
Namespace:    core-service-corpdirectory-1
Priority:     0
Node:         gke-test-gcp-us-west-test-gcp-uswe1-n-bf30fd8f-90se/198.19.63.202
Start Time:   Mon, 10 Jul 2023 12:35:32 -0500
Labels:       app=core-service-corpdirectory
              app.kubernetes.io/instance=core-service-corpdirectory-1
              app.kubernetes.io/name=core-service-corpdirectory-1
              pod-template-hash=76fbccc565
              sidecar.opentelemetry.io/injected=core-service-corpdirectory-1.otel-collector
Annotations:  cni.projectcalico.org/containerID: f5ad5fb9c7544342aa09bd8a4a704dfc0ec11acf3910ebaa00afabb96f75ec18
              cni.projectcalico.org/podIP: 240.16.1.25/32
              cni.projectcalico.org/podIPs: 240.16.1.25/32
              sidecar.opentelemetry.io/inject: true
Status:       Running
IP:           240.16.1.25
IPs:
  IP:           240.16.1.25
Controlled By:  ReplicaSet/core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565
Containers:
  api:
    Container ID:  containerd://f62d01a81b6a6256b05ea35771037719427565e5ea5e687be8c6fdba9f5db065
    Image:         docker.apple.com/avijit_sarkar/core-corpdirectory-service:latest1
    Image ID:      docker.apple.com/avijit_sarkar/core-corpdirectory-service@sha256:e8f229aac2e7f0b93b929823d967cb561067acc40eeb133b62fd7c1b1d2931db
    Port:          <none>
    Host Port:     <none>
    Args:
      serve
    State:          Running
      Started:      Mon, 10 Jul 2023 12:35:40 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     10m
      memory:  32Mi
    Environment Variables from:
      core-service-corpdirectory-1-cloud-provider-config  ConfigMap  Optional: false
    Environment:
      ENABLE_REFLECTION:                 true
      APPLE_SYSTEM_ACCOUNT_NAME:         <set to the key 'SYSTEM_ACCOUNT_NAME' in secret 'core-service-corpdirectory-1-corp-service-secret'>         Optional: false
      APPLE_SYSTEM_ACCOUNT_PASSWORD:     <set to the key 'SYSTEM_ACCOUNT_PASSWORD' in secret 'core-service-corpdirectory-1-corp-service-secret'>     Optional: false
      APPLE_SYSTEM_ACCOUNT_ID:           <set to the key 'SYSTEM_ACCOUNT_ID' in secret 'core-service-corpdirectory-1-corp-service-secret'>           Optional: false
      APPLE_SYSTEM_ACCOUNT_TOTP_SECRET:  <set to the key 'SYSTEM_ACCOUNT_TOTP_SECRET' in secret 'core-service-corpdirectory-1-corp-service-secret'>  Optional: false
      APP_PASSWORD:                      <set to the key 'APP_PASSWORD' in secret 'core-service-corpdirectory-1-corp-service-secret'>                      Optional: false
      APP_ID_KEY:                        <set to the key 'APP_ID_KEY' in secret 'core-service-corpdirectory-1-corp-service-secret'>                        Optional: false
      LISTEN_ADDR:                       :50051
      KUBERNETES_CLUSTER_DOMAIN:         cluster.local
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wn424 (ro)
  otc-container:
    Container ID:  containerd://13695dd137975097c47cc06ae6cdda2df4bf6e22da0d8b23a2585aa74ba3207d
    Image:         docker-upstream.apple.com/otel/opentelemetry-collector-contrib
    Image ID:      docker-upstream.apple.com/otel/opentelemetry-collector-contrib@sha256:c6671841470b83007e0553cdadbc9d05f6cfe17b3ebe9733728dc4a579a5b532
    Ports:         8888/TCP, 4317/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --config=env:OTEL_CONFIG
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to start containerd task "13695dd137975097c47cc06ae6cdda2df4bf6e22da0d8b23a2585aa74ba3207d": context deadline exceeded: unknown
      Exit Code:    128
      Started:      Wed, 31 Dec 1969 18:00:00 -0600
      Finished:     Mon, 10 Jul 2023 13:44:29 -0500
    Ready:          False
    Restart Count:  33
    Limits:
      cpu:     10m
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  256Mi
    Environment:
      POD_NAME:                            core-service-corpdirectory-1-core-service-corpdirectory-762xcrw (v1:metadata.name)
      OTEL_CONFIG:                         receivers:
                                             otlp:
                                               protocols:
                                                 grpc:
                                                   endpoint: 0.0.0.0:4317
                                             k8s_cluster:
                                               collection_interval: 10s
                                           processors:

                                           exporters:
                                             logging:
                                               verbosity: Detailed

                                           service:
                                             telemetry:
                                               logs:
                                                 level: "debug"
                                             pipelines:
                                               metrics:
                                                 receivers: [otlp, k8s_cluster]
                                                 processors: []
                                                 exporters: [logging]

      OTEL_RESOURCE_ATTRIBUTES_POD_NAME:   core-service-corpdirectory-1-core-service-corpdirectory-762xcrw (v1:metadata.name)
      OTEL_RESOURCE_ATTRIBUTES_POD_UID:     (v1:metadata.uid)
      OTEL_RESOURCE_ATTRIBUTES_NODE_NAME:   (v1:spec.nodeName)
      OTEL_RESOURCE_ATTRIBUTES:            k8s.deployment.name=core-service-corpdirectory-1-core-service-corpdirectory,k8s.deployment.uid=ab9c8587-4444-4a3a-815c-15ec7ad66b18,k8s.namespace.name=core-service-corpdirectory-1,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.pod.uid=$(OTEL_RESOURCE_ATTRIBUTES_POD_UID),k8s.replicaset.name=core-service-corpdirectory-1-core-service-corpdirectory-76fbccc565,k8s.replicaset.uid=ea7a9a5d-d55d-4759-b197-5770ee063946
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wn424 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-wn424:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulling  60m (x6 over 70m)     kubelet  Pulling image "docker-upstream.apple.com/otel/opentelemetry-collector-contrib"
  Normal   Created  60m (x6 over 70m)     kubelet  Created container otc-container
  Normal   Pulled   60m                   kubelet  Successfully pulled image "docker-upstream.apple.com/otel/opentelemetry-collector-contrib" in 187.514999ms
  Warning  Failed   4m16s (x33 over 68m)  kubelet  Error: context deadline exceeded
avijitsarkar123 commented 1 year ago

@iblancasa - thanks for all your support with the troubleshooting; I was able to figure out the issue. It was caused by the resource limits I had in the otel-collector manifest. I had the settings below, and after removing them the otc sidecar container came up fine:

    limits:
      cpu: 10m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 256Mi
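
For anyone hitting the same symptom: the CrashLoopBackOff / "context deadline exceeded" here most likely came from the 10m CPU limit throttling the collector so much that containerd's start deadline expired before the process came up. A sketch of a less aggressive setting in the OpenTelemetryCollector spec (the numbers are illustrative, not values taken from this thread):

spec:
  resources:
    limits:
      cpu: 200m        # illustrative value; 10m was too low for the collector to start in time
      memory: 256Mi
    requests:
      cpu: 100m        # illustrative value
      memory: 128Mi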