open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Kubernetes Attributes Processor adds wrong k8s.container.name value #34835

Open · martinohansen opened this issue 3 weeks ago

martinohansen commented 3 weeks ago

Component(s)

processor/k8sattributes

What happened?

Description

The Kubernetes Attributes Processor (k8sattributes) adds the wrong container name for pods with init containers. I read the metrics using the Prometheus receiver.

Steps to Reproduce

Set up the processor to associate by pod IP, then pod UID, and lastly connection details, and extract k8s.container.name:

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.container.name
    pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.ip
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
      - sources:
        - from: connection

Expose metrics from container foo with a pod spec like this

apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
  - name: foo
    [...]
  initContainers:
  - name: linkerd-init
    [...]

Expected Result

{
  "resource": {
    "attributes": {
      "kube.container.name": "foo"
    }
  }
}

Actual Result

{
  "resource": {
    "attributes": {
      "kube.container.name": "linkerd-init"
    }
  }
}

Collector version

v0.107.0

github-actions[bot] commented 3 weeks ago

Pinging code owners:

bacherfl commented 2 weeks ago

Hi @martinohansen! I am currently trying to reproduce this - would you mind also posting the configuration of the prometheus receiver and the other components in the metrics pipeline? What confuses me a bit is that the container attribute in the results is called kube.container.name, whereas the k8sattributes processor should set k8s.container.name - was that a typo, or is it in fact called that in the result? In that case the attribute might be set somewhere else (maybe from the labels detected by the prometheus receiver) and the container name might not be added by the k8sattributes processor at all, since it requires either k8s.container.name or container.id to already be present at the time the resource is processed (see https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/ee5d327e5fb34c88339dc99e7c30d5d96215e0de/processor/k8sattributesprocessor/README.md?plain=1#L97)
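
In case it helps with checking that, one way to see which resource attributes are actually present on the metrics is to temporarily export the pipeline to the debug exporter. This is only a sketch - the pipeline name is arbitrary and the receiver should be whichever one feeds your metrics pipeline:

exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics/inspect:
      receivers: [prometheus]   # or whichever receiver feeds the pipeline
      processors: [k8sattributes]
      exporters: [debug]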

martinohansen commented 2 weeks ago

Hi @martinohansen! I am currently trying to reproduce this - would you mind also posting the configuration of the prometheus receiver and the other components in the metrics pipeline?

Hi @bacherfl! Thanks for looking into this, I appreciate it. I will paste the full config at the end of my response.

What confuses me a bit is that the container attribute in the results is called kube.container.name, whereas the k8sattributes processor should set k8s.container.name - was that a typo, or is it in fact called that in the result?

Oops, sorry about that - it's a typo, and yet it isn't. For consistency on the backend we rename k8s to kube, and I forgot to account for that in the results above. Sorry for the confusion.

transform/rename-to-kube:
  error_mode: ignore
  metric_statements:
    - context: resource
      statements:
        - replace_all_patterns(attributes, "key", "k8s\\.(.*)", "kube.$$1")
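
(The $$1 here is just the collector config's escaping of the regex capture group $1.) So attributes set by the k8sattributes processor end up renamed, for example (illustrative values):

# before the transform            after the transform
# k8s.container.name: foo    ->   kube.container.name: foo
# k8s.namespace.name: demo   ->   kube.namespace.name: demo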

Here is the entire config:

# Collector
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:POD_IP}:4317
        max_recv_msg_size_mib: 64
      http:
        endpoint: ${env:POD_IP}:4318
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.container.name
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.replicaset.name
        - k8s.node.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.statefulset.name
      labels:
        - tag_name: k8s.pod.label.app
          key: app
          from: pod
        - tag_name: k8s.pod.label.component
          key: component
          from: pod
        - tag_name: k8s.pod.label.zone
          key: zone
          from: pod
    pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.ip
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
      - sources:
        - from: connection
  transform/add-workload-label:
    metric_statements:
      - context: datapoint
        statements:
        - set(attributes["kube_workload_name"], resource.attributes["k8s.deployment.name"])
        - set(attributes["kube_workload_name"], resource.attributes["k8s.statefulset.name"])
        - set(attributes["kube_workload_type"], "deployment") where resource.attributes["k8s.deployment.name"] != nil
        - set(attributes["kube_workload_type"], "statefulset") where resource.attributes["k8s.statefulset.name"] != nil
  transform/rename-to-kube:
    error_mode: ignore
    metric_statements:
      - context: resource
        statements:
          - replace_all_patterns(attributes, "key", "k8s\\.(.*)", "kube.$$1")
exporters:
  otlphttp/pipeline-metrics:
    endpoint: ${env:OTLP_PIPELINE_METRICS_ENDPOINT}
    headers:
      Authorization: ${env:OTLP_PIPELINE_METRICS_TOKEN}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors:
      - k8sattributes
      - transform/add-workload-label
      - transform/rename-to-kube
      exporters: [otlphttp/pipeline-metrics]

# Agent
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:POD_IP}:4317
      http:
        endpoint: ${env:POD_IP}:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: k8s
          tls_config:
            insecure_skip_verify: true
          scrape_interval: 15s
          kubernetes_sd_configs:
            - role: pod
              selectors:
                - role: pod
                  field: spec.nodeName=${env:NODE_NAME}
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
            - action: replace
              source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
              target_label: __scheme__
              regex: (https?)
            - action: replace
              source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              target_label: __metrics_path__
              regex: (.+)
            - action: replace
              source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              target_label: __address__
            # Allow overriding the scrape timeout and interval from pod
            # annotation.
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_timeout]
              regex: '(.+)'
              target_label: __scrape_timeout__
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_interval]
              regex: '(.+)'
              target_label: __scrape_interval__
exporters:
  otlp:
    endpoint: "otel-collector.otel.svc.cluster.local:4317"
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
processors:
  batch:
  k8sattributes:
    passthrough: true
service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch, k8sattributes]
      exporters: [otlp]

P.S. I removed some batching and memory limit config for simplicity since they are unrelated.

bacherfl commented 2 weeks ago

Thank you for the config, @martinohansen! I will try to reproduce the issue and get back to you once I have more insight into what could be causing this.

bacherfl commented 2 weeks ago

I did some tests now and discovered that using kubernetes_sd_configs in the prometheus receiver creates a prometheus scrape target for each port of each container in the pod, including the init containers. The following relabel config then takes care of keeping only the target based on the prometheus port set in the pod's annotations:

            - action: replace
              source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              target_label: __address__
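
To illustrate with a made-up pod IP (not from the actual setup), the rewriting works like this:

# joined source labels: 10.244.1.7:5775;14269
# regex captures:       $1 = 10.244.1.7, $2 = 14269
# resulting target:     __address__ = 10.244.1.7:14269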

In the example I was testing with, I have a jaeger container with several ports exposed, so with that relabel config only <pod-ip>:14269 ended up as a target, while the other ports were omitted:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    linkerd.io/inject: disabled
    prometheus.io/port: "14269"
    prometheus.io/scrape: "true"
  labels:
    app: jaeger2
    app.kubernetes.io/component: all-in-one
    app.kubernetes.io/instance: jaeger2
    app.kubernetes.io/name: jaeger2
    app.kubernetes.io/part-of: jaeger2
  name: jaeger2
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: jaeger2
      app.kubernetes.io/component: all-in-one
      app.kubernetes.io/instance: jaeger2
      app.kubernetes.io/name: jaeger2
      app.kubernetes.io/part-of: jaeger2
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        linkerd.io/inject: disabled
        prometheus.io/port: "14269"
        prometheus.io/scrape: "true"
        sidecar.istio.io/inject: "false"
      creationTimestamp: null
      labels:
        app: jaeger2
        app.kubernetes.io/component: all-in-one
        app.kubernetes.io/instance: jaeger2
        app.kubernetes.io/name: jaeger2
        app.kubernetes.io/part-of: jaeger2
    spec:
      initContainers:
        - name: init-myservice
          image: busybox
          command: [ 'sh', '-c', "echo 'init'" ]
      containers:
        - args:
            - --sampling.strategies-file=/etc/jaeger/sampling/sampling.json
          env:
            - name: SPAN_STORAGE_TYPE
              value: memory
            - name: METRICS_STORAGE_TYPE
            - name: COLLECTOR_ZIPKIN_HOST_PORT
              value: :9411
            - name: JAEGER_DISABLED
              value: "false"
            - name: COLLECTOR_OTLP_ENABLED
              value: "true"
          image: jaegertracing/all-in-one:1.53.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /
              port: 14269
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 1
          name: jaeger
          ports:
            - containerPort: 5775
              name: zk-compact-trft
              protocol: UDP
            - containerPort: 5778
              name: config-rest
              protocol: TCP
            - containerPort: 6831
              name: jg-compact-trft
              protocol: UDP
            - containerPort: 6832
              name: jg-binary-trft
              protocol: UDP
            - containerPort: 9411
              name: zipkin
              protocol: TCP
            - containerPort: 14267
              name: c-tchan-trft
              protocol: TCP
            - containerPort: 14268
              name: c-binary-trft
              protocol: TCP
            - containerPort: 16685
              name: grpc-query
              protocol: TCP
            - containerPort: 16686
              name: query
              protocol: TCP
            - containerPort: 14269
              name: admin-http
              protocol: TCP
            - containerPort: 14250
              name: grpc
              protocol: TCP
            - containerPort: 4317
              name: grpc-otlp
              protocol: TCP
            - containerPort: 4318
              name: http-otlp
              protocol: TCP

However, init containers mostly do not have a port defined, so this rule does not catch them, and a separate target with the same endpoint is created. The same endpoint is therefore effectively scraped twice on each scrape, yielding the same set of metrics but with different attribute sets - one including the name of the init container, the other the correct container name. The OTel resource has the same name in both cases, which might explain why the container name ends up being set incorrectly in the end. What I found as a potential workaround is an additional relabel config that excludes the targets created for init containers - the prometheus library sets this label internally:

            - source_labels: [ __meta_kubernetes_pod_container_init ]
              regex: "false"
              action: keep
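
For completeness, a sketch of how this could look in the agent's prometheus receiver from your config above - only the first rule is new, the existing relabel rules stay unchanged:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: k8s
          kubernetes_sd_configs:
            - role: pod
              selectors:
                - role: pod
                  field: spec.nodeName=${env:NODE_NAME}
          relabel_configs:
            # Drop the duplicate targets that kubernetes_sd creates for init
            # containers, so each annotated pod endpoint is scraped only once.
            - source_labels: [__meta_kubernetes_pod_container_init]
              regex: "false"
              action: keep
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
            # ... remaining relabel rules as before ...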

Would that be an option for you, @martinohansen?