open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector

volume and volumemount resources config not passed to target allocator but podannotations are #3264

Open paebersold-tyro opened 2 months ago

paebersold-tyro commented 2 months ago

Component(s)

target allocator

What happened?

Description

Hello, this is half bug report, half feature request.

We are using the otel-operator to run the collector as a daemonset with the target allocator enabled (via the operator helm chart). We are also using istio (with the sidecar auto-injected). To set up Prometheus scraping endpoints inside the istio mesh we are following the istio guidelines (https://istio.io/latest/docs/ops/integrations/prometheus/#option-2-customized-scraping-configurations). In short, we need to add a pod annotation plus a volume and volumeMount to the pod doing the scraping (i.e. the collector).

I can do this via the OpenTelemetryCollector spec with podAnnotations/volumes/volumeMounts. However, only the podAnnotations are passed on to the target allocator deployment, not the volumes/volumeMounts. So the target allocator fails to start, because (via the annotation) the istio-injected sidecar tries to mount a volume that doesn't exist.

Is there a way around this within the OpenTelemetryCollector config?
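
For reference, this is a trimmed excerpt of the spec involved (the full manifest is under Additional context below): the podAnnotations block is copied into the generated target allocator pod template, while the volumes/volumeMounts are not, so the istio sidecar injected into the target allocator pod references a volume that does not exist there.

spec:
  mode: daemonset
  podAnnotations:
    # copied to the target allocator pod template by the operator
    sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath": "/etc/istio-output-certs"}]'
  volumes:
    # NOT copied to the target allocator deployment
    - name: istio-certs-dir
      emptyDir:
        medium: Memory
  volumeMounts:
    # NOT copied to the target allocator deployment
    - name: istio-certs-dir
      mountPath: /etc/istio-output-certs
      readOnly: true
  targetAllocator:
    enabled: true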

Steps to Reproduce

Setup OpenTelemetryCollector object with istio suggested setup

Expected Result

The target allocator deployment gets the same volume/volumeMount resources as the collector pods.

Actual Result

It doesn't, so the target allocator is unable to start. See the log below.

Kubernetes Version

1.30

Operator version

0.102.0

Collector version

0.103.0

Environment information

Helm chart: https://open-telemetry.github.io/opentelemetry-helm-charts / opentelemetry-operator / 0.62.0
Target allocator: 0.107.0

Log output

Failure status from the target allocator deployment at startup:

status:
  conditions:
  - lastTransitionTime: "2024-09-05T03:06:34Z"
    message: 'Pod "collector-ds-with-ta-targetallocator-795686486b-glg9l" is invalid:
      spec.initContainers[1].volumeMounts[8].name: Not found: "istio-certs-dir"'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure

Additional context

OpenTelemetryCollector config

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: collector-ds-with-ta
  namespace: monitoring
spec:
  mode: daemonset
  podAnnotations:
    proxy.istio.io/config: |
      proxyMetadata:
        OUTPUT_CERTS: /etc/istio-output-certs
    sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath": "/etc/istio-output-certs"}]'
    traffic.sidecar.istio.io/includeInboundPorts: ""
    traffic.sidecar.istio.io/includeOutboundIPRanges: ""
  podSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: opentelemetry-collector
  tolerations:
    - key: "system"
      operator: "Exists"
      effect: "NoSchedule"
  volumeMounts:
    - mountPath: /etc/istio-output-certs
      name: istio-certs-dir
      readOnly: true
  volumes:
    - emptyDir:
        medium: Memory
      name: istio-certs-dir
  targetAllocator:
    enabled: true
    allocationStrategy: per-node
    podSecurityContext:
      runAsUser: 1000
      runAsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
    serviceAccount: opentelemetry-targetallocator
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    prometheusCR:
      enabled: true
      serviceMonitorSelector: {}
      podMonitorSelector: {}
  config:
    processors:
      batch: {}
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [debug]
      telemetry:
        metrics:
          address: 0.0.0.0:8887

Generated target allocator deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2024-09-05T03:06:34Z"
  generation: 1
  labels:
    app.kubernetes.io/component: opentelemetry-targetallocator
    app.kubernetes.io/instance: monitoring.collector-ds-with-ta
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: collector-ds-with-ta-targetallocator
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
    argocd.argoproj.io/instance: multi-cluster-otel-ctap
  name: collector-ds-with-ta-targetallocator
  namespace: monitoring
  ownerReferences:
  - apiVersion: opentelemetry.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: OpenTelemetryCollector
    name: collector-ds-with-ta
    uid: 0811619b-5f2e-4723-bee9-52f957385acd
  resourceVersion: "77694469"
  uid: 32872245-31c9-4c44-9a60-82a95a51a012
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-targetallocator
      app.kubernetes.io/instance: monitoring.collector-ds-with-ta
      app.kubernetes.io/managed-by: opentelemetry-operator
      app.kubernetes.io/name: collector-ds-with-ta-targetallocator
      app.kubernetes.io/part-of: opentelemetry
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        opentelemetry-targetallocator-config/hash: fb83e4dee19069627c959fb9bd4f3b2da0c3217ea2bdd463366c5bb71cfdd721
        proxy.istio.io/config: |
          proxyMetadata:
            OUTPUT_CERTS: /etc/istio-output-certs
        sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs-dir", "mountPath":
          "/etc/istio-output-certs"}]'
        traffic.sidecar.istio.io/includeInboundPorts: ""
        traffic.sidecar.istio.io/includeOutboundIPRanges: ""
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: opentelemetry-targetallocator
        app.kubernetes.io/instance: monitoring.collector-ds-with-ta
        app.kubernetes.io/managed-by: opentelemetry-operator
        app.kubernetes.io/name: collector-ds-with-ta-targetallocator
        app.kubernetes.io/part-of: opentelemetry
        app.kubernetes.io/version: latest
        argocd.argoproj.io/instance: multi-cluster-otel-ctap
    spec:
      containers:
      - args:
        - --enable-prometheus-cr-watcher
        env:
        - name: OTELCOL_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: blah/target-allocator:v0.107.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: ta-container
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: ta-internal
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsGroup: 1000
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: opentelemetry-targetallocator
      serviceAccountName: opentelemetry-targetallocator
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: targetallocator.yaml
            path: targetallocator.yaml
          name: collector-ds-with-ta-targetallocator
        name: ta-internal
status:
  conditions:
  - lastTransitionTime: "2024-09-05T03:06:34Z"
    lastUpdateTime: "2024-09-05T03:06:34Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-09-05T03:06:34Z"
    lastUpdateTime: "2024-09-05T03:06:34Z"
    message: 'Pod "collector-ds-with-ta-targetallocator-795686486b-glg9l" is invalid:
      spec.initContainers[1].volumeMounts[8].name: Not found: "istio-certs-dir"'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  - lastTransitionTime: "2024-09-05T03:16:35Z"
    lastUpdateTime: "2024-09-05T03:16:35Z"
    message: ReplicaSet "collector-ds-with-ta-targetallocator-795686486b" has timed
      out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  unavailableReplicas: 1
avbtrifork commented 1 month ago

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  serviceAccount: otelcol-metrics-collector
  volumes:
    - emptyDir:
        medium: Memory
      name: istio-certs
  volumeMounts:
    - mountPath: /etc/prom-certs/
      name: istio-certs
  podAnnotations:
    traffic.sidecar.istio.io/includeInboundPorts: ""   # do not intercept any inbound ports
    traffic.sidecar.istio.io/includeOutboundIPRanges: ""  # do not intercept any outbound traffic
    proxy.istio.io/config: |  # configure an env variable `OUTPUT_CERTS` to write certificates to the given folder
      proxyMetadata:
        OUTPUT_CERTS: /etc/istio-output-certs
    sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs", "mountPath": "/etc/istio-output-certs"}]' # mount the shared volume at sidecar proxy

  targetAllocator:
    enabled: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-targetallocator
    app.kubernetes.io/instance: otel-metrics-scraping.otelcol
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otelcol-targetallocator
    app.kubernetes.io/part-of: opentelemetry
  name: otelcol-targetallocator
  namespace: otel-metrics-scraping
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-targetallocator
      app.kubernetes.io/instance: otel-metrics-scraping.otelcol
      app.kubernetes.io/managed-by: opentelemetry-operator
      app.kubernetes.io/name: otelcol-targetallocator
      app.kubernetes.io/part-of: opentelemetry
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          proxyMetadata:
            OUTPUT_CERTS: /etc/istio-output-certs
        sidecar.istio.io/userVolumeMount: '[{"name": "istio-certs", "mountPath": "/etc/istio-output-certs"}]'
        traffic.sidecar.istio.io/includeInboundPorts: ""
        traffic.sidecar.istio.io/includeOutboundIPRanges: ""
      labels:
        sidecar.istio.io/inject: "true"
        app.kubernetes.io/component: opentelemetry-targetallocator
        app.kubernetes.io/instance: otel-metrics-scraping.otelcol
        app.kubernetes.io/managed-by: opentelemetry-operator
        app.kubernetes.io/name: otelcol-targetallocator
        app.kubernetes.io/part-of: opentelemetry
        app.kubernetes.io/version: latest
    spec:
      containers:
        - env:
            - name: OTELCOL_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.103.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /livez
              port: 8080
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: ta-container
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: 8080
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /conf
              name: ta-internal
            - mountPath: /etc/prom-certs/
              name: istio-certs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccountName: otelcol-metrics-target-allocator
      shareProcessNamespace: false
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            defaultMode: 420
            items:
              - key: targetallocator.yaml
                path: targetallocator.yaml
            name: otelcol-targetallocator
          name: ta-internal
        - emptyDir:
            medium: Memory
          name: istio-certs
---
apiVersion: v1
data:
  targetallocator.yaml: |
    allocation_strategy: consistent-hashing
    collector_selector:
      matchlabels:
        app.kubernetes.io/component: opentelemetry-collector
        app.kubernetes.io/instance: otel-metrics-scraping.otelcol
        app.kubernetes.io/managed-by: opentelemetry-operator
        app.kubernetes.io/part-of: opentelemetry
      matchexpressions: []
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            - targets:
                - 0.0.0.0:8888
    filter_strategy: relabel-config
    prometheus_cr:
      enabled: true
      pod_monitor_selector:
        matchlabels: {}
        matchexpressions: []
      scrape_interval: 30s
      service_monitor_selector:
        matchlabels: {}
        matchexpressions: []
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: opentelemetry-targetallocator
    app.kubernetes.io/instance: otel-metrics-scraping.otelcol
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otelcol-targetallocator
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: otelcol-targetallocator
  namespace: otel-metrics-scraping

The way to get around this bug currently is to disable the target allocator in the OpenTelemetryCollector CRD and then deploy the target allocator yourself with the Deployment and ConfigMap posted above. I have tested this and it works just fine. Hope it helps you out.
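
One note on this workaround (not shown in the manifests above, so treat the exact names as assumptions): when the operator no longer manages the target allocator, it also no longer wires the collector's prometheus receiver to it, so the receiver has to be pointed at the standalone allocator manually. A minimal sketch, assuming a Service named otelcol-targetallocator (not shown above) exposes the allocator on port 8080 and the collector pod has a POD_NAME env var set via the downward API:

receivers:
  prometheus:
    config:
      scrape_configs: []                      # scrape configs are served by the target allocator
    target_allocator:
      endpoint: http://otelcol-targetallocator:8080  # assumed Service name/port
      interval: 30s
      collector_id: "${POD_NAME}"             # assumes POD_NAME is injected into the collector pod

With that in place the collector should pull its scrape targets from the standalone allocator much as it would from an operator-managed one.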