open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Volume and volumeMount config not passed to target allocator, but podAnnotations are #3264

Open paebersold-tyro opened 1 month ago

paebersold-tyro commented 1 month ago

Component(s)

target allocator

What happened?

Description

Hello, this is half bug report, half feature request.

We are using the otel-operator (installed via the operator helm chart) to run a collector in daemonset mode with the target allocator enabled. We are also using Istio, with the sidecar auto-injected. To set up Prometheus scraping endpoints inside the Istio mesh we are following the Istio guidelines (https://istio.io/latest/docs/ops/integrations/prometheus/#option-2-customized-scraping-configurations). In short, we need to add a pod annotation plus a volume and volumeMount to the pod doing the scraping (i.e. the collector), as sketched below.
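For reference, a minimal sketch of what that Istio guide asks for on the scraping pod (annotation, volume, and volumeMount together). The volume name istio-certs-dir follows our config further down rather than the doc's example, and the container name scraper is just a placeholder:

# Pod-level pieces required by Istio's "customized scraping" setup; the sidecar
# injector rewrites the pod based on these annotations, so the referenced volume
# must actually exist in the pod spec.
metadata:
  annotations:
    proxy.istio.io/config: |
      proxyMetadata:
        OUTPUT_CERTS: /etc/istio-output-certs   # sidecar writes its certs here
    sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath": "/etc/istio-output-certs"}]'
    traffic.sidecar.istio.io/includeInboundPorts: ""
    traffic.sidecar.istio.io/includeOutboundIPRanges: ""
spec:
  volumes:
  - name: istio-certs-dir          # must be present, or injection produces an invalid pod
    emptyDir:
      medium: Memory
  containers:
  - name: scraper                  # in our case, the collector container
    volumeMounts:
    - name: istio-certs-dir
      mountPath: /etc/istio-output-certs
      readOnly: true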

I can configure all of this on the OpenTelemetryCollector resource via podAnnotations, volumes, and volumeMounts. However, only the podAnnotations are passed on to the target allocator deployment, not the volumes/volumeMounts. So the target allocator fails to start, because the injected sidecar (triggered by the annotation) tries to mount a volume that doesn't exist.

Is there a way around this within the OpenTelemetryCollector config?

Steps to Reproduce

Set up an OpenTelemetryCollector object with the Istio-suggested scraping configuration (full config under Additional context below).

Expected Result

The target allocator deployment gets the same volume and volumeMount resources as the collector pods.

Actual Result

It doesn't, so the target allocator is unable to start. See the log output below.

Kubernetes Version

1.30

Operator version

0.102.0

Collector version

0.103.0

Environment information

Helm chart: https://open-telemetry.github.io/opentelemetry-helm-charts / opentelemetry-operator / 0.62.0
Target allocator: 0.107.0

Log output

Failure conditions reported while the target allocator was starting:

status:
  conditions:
  - lastTransitionTime: "2024-09-05T03:06:34Z"
    message: 'Pod "collector-ds-with-ta-targetallocator-795686486b-glg9l" is invalid:
      spec.initContainers[1].volumeMounts[8].name: Not found: "istio-certs-dir"'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure

Additional context

OpenTelemetryCollector config

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: collector-ds-with-ta
  namespace: monitoring
spec:
  mode: daemonset
  podAnnotations:
    proxy.istio.io/config: |
     proxyMetadata:
       OUTPUT_CERTS: /etc/istio-output-certs
    sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath": "/etc/istio-output-certs"}]'
    traffic.sidecar.istio.io/includeInboundPorts: ""
    traffic.sidecar.istio.io/includeOutboundIPRanges: ""
  podSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: opentelemetry-collector
  tolerations:
    - key: "system"
      operator: "Exists"
      effect: "NoSchedule"
  volumeMounts:
    - mountPath: /etc/istio-output-certs
      name: istio-certs-dir
      readOnly: true
  volumes:
    - emptyDir:
        medium: Memory
      name: istio-certs-dir
  targetAllocator:
    enabled: true
    allocationStrategy: per-node
    podSecurityContext:
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
    serviceAccount: opentelemetry-targetallocator
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    prometheusCR:
      enabled: true
      serviceMonitorSelector: {}
      podMonitorSelector: {}
  config:
    processors:
      batch: {}
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [debug]
      telemetry:
        metrics:
          address: 0.0.0.0:8887

Generated target allocator deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2024-09-05T03:06:34Z"
  generation: 1
  labels:
    app.kubernetes.io/component: opentelemetry-targetallocator
    app.kubernetes.io/instance: monitoring.collector-ds-with-ta
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: collector-ds-with-ta-targetallocator
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    argocd.argoproj.io/instance: multi-cluster-otel-ctap
  name: collector-ds-with-ta-targetallocator
  namespace: monitoring
  ownerReferences:
  - apiVersion: opentelemetry.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: OpenTelemetryCollector
    name: collector-ds-with-ta
    uid: 0811619b-5f2e-4723-bee9-52f957385acd
  resourceVersion: "77694469"
  uid: 32872245-31c9-4c44-9a60-82a95a51a012
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-targetallocator
      app.kubernetes.io/instance: monitoring.collector-ds-with-ta
      app.kubernetes.io/managed-by: opentelemetry-operator
      app.kubernetes.io/name: collector-ds-with-ta-targetallocator
      app.kubernetes.io/part-of: opentelemetry
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        opentelemetry-targetallocator-config/hash: fb83e4dee19069627c959fb9bd4f3b2da0c3217ea2bdd463366c5bb71cfdd721
        proxy.istio.io/config: |
          proxyMetadata:
            OUTPUT_CERTS: /etc/istio-output-certs
        sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath":
          "/etc/istio-output-certs"}]'
        traffic.sidecar.istio.io/includeInboundPorts: ""
        traffic.sidecar.istio.io/includeOutboundIPRanges: ""
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: opentelemetry-targetallocator
        app.kubernetes.io/instance: monitoring.collector-ds-with-ta
        app.kubernetes.io/managed-by: opentelemetry-operator
        app.kubernetes.io/name: collector-ds-with-ta-targetallocator
        app.kubernetes.io/part-of: opentelemetry
        app.kubernetes.io/version: latest
        argocd.argoproj.io/instance: multi-cluster-otel-ctap
    spec:
      containers:
      - args:
        - --enable-prometheus-cr-watcher
        env:
        - name: OTELCOL_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: blah/target-allocator:v0.107.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: ta-container
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: ta-internal
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsGroup: 1000
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: opentelemetry-targetallocator
      serviceAccountName: opentelemetry-targetallocator
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: targetallocator.yaml
            path: targetallocator.yaml
          name: collector-ds-with-ta-targetallocator
        name: ta-internal
status:
  conditions:
  - lastTransitionTime: "2024-09-05T03:06:34Z"
    lastUpdateTime: "2024-09-05T03:06:34Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-09-05T03:06:34Z"
    lastUpdateTime: "2024-09-05T03:06:34Z"
    message: 'Pod "collector-ds-with-ta-targetallocator-795686486b-glg9l" is invalid:
      spec.initContainers[1].volumeMounts[8].name: Not found: "istio-certs-dir"'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  - lastTransitionTime: "2024-09-05T03:16:35Z"
    lastUpdateTime: "2024-09-05T03:16:35Z"
    message: ReplicaSet "collector-ds-with-ta-targetallocator-795686486b" has timed
      out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  unavailableReplicas: 1
avbtrifork commented 4 days ago

One way I got around it was to deploy the OpenTelemetryCollector without the Istio podAnnotations, let the target allocator deploy, and then add the Istio podAnnotations to the OpenTelemetryCollector. I could see that it updated the otel pod with the Istio proxy but did not try to redeploy the target allocator. A rough sketch of that sequence is below. Hope it helps.
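One way to script that sequence (just a sketch, assuming kubectl access to the cluster; the patch file name and the kubectl invocation in the comment are illustrative, not something the operator provides):

# istio-annotations-patch.yaml -- apply only after the target allocator deployment is up, e.g.:
#   kubectl patch opentelemetrycollector collector-ds-with-ta -n monitoring \
#     --type merge --patch-file istio-annotations-patch.yaml
spec:
  podAnnotations:
    proxy.istio.io/config: |
      proxyMetadata:
        OUTPUT_CERTS: /etc/istio-output-certs
    sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath": "/etc/istio-output-certs"}]'
    traffic.sidecar.istio.io/includeInboundPorts: ""
    traffic.sidecar.istio.io/includeOutboundIPRanges: ""

Whether the operator leaves the already-running target allocator untouched after such a patch is only what I observed above; it may depend on the operator version.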