open-telemetry / opentelemetry-helm-charts

OpenTelemetry Helm Charts
https://opentelemetry.io
Apache License 2.0

hostmetrics does not associate any k8s attributes (missing k8s.node.name, etc.) #1306

Closed dfsdevops closed 3 months ago

dfsdevops commented 3 months ago

After adding the hostmetrics receiver, I don't see any attributes that associate the new metrics with a particular node (verified by inspecting the events in Honeycomb). Is there a configuration that associates the node name with hostmetrics data? My use case here is to cover some stats that kubeletstats doesn't, such as CPU load.

My values file is as follows:

mode: daemonset
fullnameOverride: otel-collector-agent

image:
  repository: "otel/opentelemetry-collector-k8s"

# Required to use the kubeletstats cpu/memory utilization metrics
clusterRole:
  create: true
  rules:
    - apiGroups:
        - ""
      resources:
        - nodes/proxy
      verbs:
        - get

extraEnvs:
  - name: HONEYCOMB_API_KEY
    valueFrom:
      secretKeyRef:
        name: honeycomb
        key: api-key

presets:
  hostMetrics:
    enabled: true
  # enables the k8sattributesprocessor and adds it to the traces, metrics, and logs pipelines
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
    extractAllPodAnnotations: true
  # enables the kubeletstatsreceiver and adds it to the metrics pipelines
  kubeletMetrics:
    enabled: true
  logsCollection:
    enabled: true

config:
  receivers:
    jaeger: null
    zipkin: null
    hostmetrics:
      collection_interval: 30s
      scrapers:
        load:
          cpu_average: true # divide by number of cores, better for generalized figures to alert on
    kubeletstats:
      collection_interval: 30s
      metric_groups:
        - pod
        - node
        - volume
      extra_metadata_labels:
        - k8s.volume.type
      metrics:
        k8s.node.uptime:
          enabled: true
        k8s.pod.uptime:
          enabled: true
        k8s.pod.cpu_limit_utilization:
          enabled: true
        k8s.pod.cpu_request_utilization:
          enabled: true
        k8s.pod.memory_limit_utilization:
          enabled: true
        k8s.pod.memory_request_utilization:
          enabled: true
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector # self metrics
            scrape_interval: 300s
            static_configs:
              - targets:
                  - ${env:MY_POD_IP}:8888
  exporters:
    debug:
      verbosity: detailed
      sampling_initial: 5
      sampling_thereafter: 200
    otlp:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
    otlp/k8s-metrics:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
        "x-honeycomb-dataset": "k8s-metrics"
    otlp/k8s-logs:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
        "x-honeycomb-dataset": "k8s-logs"
  processors:
    filter/ottl:
      error_mode: ignore
      metrics:
        datapoint:
          - 'resource.attributes["k8s.volume.type"] != nil and resource.attributes["k8s.volume.type"] != "persistentVolumeClaim"'
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]
      metrics:
        exporters:
        - otlp/k8s-metrics
        processors:
        - memory_limiter
        - batch
        - filter/ottl
      logs:
        exporters: [otlp/k8s-logs]

ports:
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
dosubot[bot] commented 3 months ago

Hey there, @dfsdevops! I'm here to assist you with any questions or concerns you have. Let's work together to resolve this issue.

To ensure that Kubernetes attributes such as k8s.node.name are associated with the metrics collected by the hostmetrics receiver, you need to make sure that the k8sattributes processor is correctly configured and included in the metrics pipeline.

Here are a few things to check and adjust in your configuration:

  1. Ensure the k8sattributes processor is configured correctly: Make sure the k8sattributes processor is enabled and configured to extract the necessary attributes.

  2. Include the k8sattributes processor in the metrics pipeline: Ensure that the k8sattributes processor is included in the metrics pipeline so that it can process the metrics collected by the hostmetrics receiver.

Here is an updated version of your configuration with these adjustments:

mode: daemonset
fullnameOverride: otel-collector-agent

image:
  repository: "otel/opentelemetry-collector-k8s"

# Required to use the kubeletstats cpu/memory utilization metrics
clusterRole:
  create: true
  rules:
    - apiGroups:
        - ""
      resources:
        - nodes/proxy
      verbs:
        - get

extraEnvs:
  - name: HONEYCOMB_API_KEY
    valueFrom:
      secretKeyRef:
        name: honeycomb
        key: api-key

presets:
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
    extractAllPodAnnotations: true
  kubeletMetrics:
    enabled: true
  logsCollection:
    enabled: true

config:
  receivers:
    jaeger: null
    zipkin: null
    hostmetrics:
      collection_interval: 30s
      scrapers:
        load:
          cpu_average: true
    kubeletstats:
      collection_interval: 30s
      metric_groups:
        - pod
        - node
        - volume
      extra_metadata_labels:
        - k8s.volume.type
      metrics:
        k8s.node.uptime:
          enabled: true
        k8s.pod.uptime:
          enabled: true
        k8s.pod.cpu_limit_utilization:
          enabled: true
        k8s.pod.cpu_request_utilization:
          enabled: true
        k8s.pod.memory_limit_utilization:
          enabled: true
        k8s.pod.memory_request_utilization:
          enabled: true
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 300s
            static_configs:
              - targets:
                  - ${env:MY_POD_IP}:8888
  exporters:
    debug:
      verbosity: detailed
      sampling_initial: 5
      sampling_thereafter: 200
    otlp:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
    otlp/k8s-metrics:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
        "x-honeycomb-dataset": "k8s-metrics"
    otlp/k8s-logs:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
        "x-honeycomb-dataset": "k8s-logs"
  processors:
    filter/ottl:
      error_mode: ignore
      metrics:
        datapoint:
          - 'resource.attributes["k8s.volume.type"] != nil and resource.attributes["k8s.volume.type"] != "persistentVolumeClaim"'
    k8sattributes:
      passthrough: false
      extract:
        metadata:
          - k8s.node.name
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]
      metrics:
        receivers: [hostmetrics, kubeletstats]
        processors: [memory_limiter, batch, filter/ottl, k8sattributes]
        exporters: [otlp/k8s-metrics]
      logs:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/k8s-logs]

ports:
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false

This configuration ensures that the k8sattributes processor is included in the metrics pipeline and is set to extract the k8s.node.name attribute. This should help in associating the node name with the metrics collected by the hostmetrics receiver.

dfsdevops commented 3 months ago

@dosu that's a good thought, but the chart already generates a configuration that includes the k8sattributes processor in the metrics pipeline:

# Source: opentelemetry-collector/templates/configmap-agent.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-agent-agent
  namespace: otel
  labels:
    helm.sh/chart: opentelemetry-collector-0.102.0
    app.kubernetes.io/name: opentelemetry-collector
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "0.107.0"
    app.kubernetes.io/managed-by: Helm

data:
  relay: |
    exporters:
      debug:
        sampling_initial: 5
        sampling_thereafter: 200
        verbosity: detailed
      otlp:
        endpoint: api.honeycomb.io:443
        headers:
          x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
      otlp/k8s-logs:
        endpoint: api.honeycomb.io:443
        headers:
          x-honeycomb-dataset: k8s-logs
          x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
      otlp/k8s-metrics:
        endpoint: api.honeycomb.io:443
        headers:
          x-honeycomb-dataset: k8s-metrics
          x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
    extensions:
      health_check:
        endpoint: ${env:MY_POD_IP}:13133
    processors:
      batch: {}
      filter/ottl:
        error_mode: ignore
        metrics:
          datapoint:
          - resource.attributes["k8s.volume.type"] != nil and resource.attributes["k8s.volume.type"]
            != "persistentVolumeClaim"
      k8sattributes:
        extract:
          annotations:
          - from: pod
            key_regex: (.*)
            tag_name: $$1
          labels:
          - from: pod
            key_regex: (.*)
            tag_name: $$1
          metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    receivers:
      filelog:
        exclude:
        - /var/log/pods/otel_otel-collector-agent*_*/opentelemetry-collector/*.log
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: container-parser
          max_log_size: 102400
          type: container
        retry_on_failure:
          enabled: true
        start_at: end
      hostmetrics:
        collection_interval: 30s
        root_path: /hostfs
        scrapers:
          cpu: null
          disk: null
          filesystem:
            exclude_fs_types:
              fs_types:
              - autofs
              - binfmt_misc
              - bpf
              - cgroup2
              - configfs
              - debugfs
              - devpts
              - devtmpfs
              - fusectl
              - hugetlbfs
              - iso9660
              - mqueue
              - nsfs
              - overlay
              - proc
              - procfs
              - pstore
              - rpc_pipefs
              - securityfs
              - selinuxfs
              - squashfs
              - sysfs
              - tracefs
              match_type: strict
            exclude_mount_points:
              match_type: regexp
              mount_points:
              - /dev/*
              - /proc/*
              - /sys/*
              - /run/k3s/containerd/*
              - /var/lib/docker/*
              - /var/lib/kubelet/*
              - /snap/*
          load:
            cpu_average: true
          memory: null
          network: null
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 30s
        endpoint: ${env:K8S_NODE_NAME}:10250
        extra_metadata_labels:
        - k8s.volume.type
        metric_groups:
        - pod
        - node
        - volume
        metrics:
          k8s.node.uptime:
            enabled: true
          k8s.pod.cpu_limit_utilization:
            enabled: true
          k8s.pod.cpu_request_utilization:
            enabled: true
          k8s.pod.memory_limit_utilization:
            enabled: true
          k8s.pod.memory_request_utilization:
            enabled: true
          k8s.pod.uptime:
            enabled: true
      otlp:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:4317
          http:
            endpoint: ${env:MY_POD_IP}:4318
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 300s
            static_configs:
            - targets:
              - ${env:MY_POD_IP}:8888
    service:
      extensions:
      - health_check
      pipelines:
        logs:
          exporters:
          - otlp/k8s-logs
          processors:
          - k8sattributes
          - memory_limiter
          - batch
          receivers:
          - otlp
          - filelog
        metrics:
          exporters:
          - otlp/k8s-metrics
          processors:
          - memory_limiter
          - batch
          - filter/ottl
          - k8sattributes
          receivers:
          - otlp
          - prometheus
          - hostmetrics
          - kubeletstats
        traces:
          exporters:
          - otlp
          processors:
          - k8sattributes
          - memory_limiter
          - batch
          receivers:
          - otlp
      telemetry:
        metrics:
          address: ${env:MY_POD_IP}:8888
dosubot[bot] commented 3 months ago

To ensure that Kubernetes attributes such as k8s.node.name are associated with the metrics collected by the hostmetrics receiver, you need to verify that the environment variables are correctly set in your deployment configuration. Specifically, the K8S_NODE_NAME environment variable should be set to provide the necessary context for the k8sattributes processor.

Here is an example of how to set the environment variables in your deployment configuration:

extraEnvs:
  - name: HONEYCOMB_API_KEY
    valueFrom:
      secretKeyRef:
        name: honeycomb
        key: api-key
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: K8S_NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: K8S_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "k8s.cluster.name=demo"

Ensure that these environment variables are included in your deployment configuration to provide the necessary context for the k8sattributes processor. This should help in associating the node name and other Kubernetes attributes with the metrics collected by the hostmetrics receiver [1][2][3][4][5].

TylerHelmuth commented 3 months ago

@dfsdevops the k8sattributes processor works by associating incoming telemetry, specifically via its resource attributes or the request's connection IP, with the Kubernetes pod metadata it keeps in memory.

In the case of hostmetrics telemetry there isn't anything in the data to associate it with the Kubernetes metadata. It has been a while since I ran the hostmetrics receiver on k8s (kubeletstats has always covered my needs), but I don't believe it is able to get per-pod metrics; all the metrics it collects are for the entire node.
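
To illustrate the mechanism (this snippet is lifted from the pod_association rules in the rendered config you posted, not a change to make): the processor only attaches metadata when one of these sources matches, and hostmetrics resources carry none of them.

processors:
  k8sattributes:
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip    # hostmetrics resources have no k8s.pod.ip attribute
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid   # ...and no k8s.pod.uid either
      - sources:
          - from: connection    # scraped in-process, so there is no sender IP to look up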

Is there a specific attribute you'd like to set for this data? Maybe the resourcedetection processor would better suit your needs if you're trying to add attributes like k8s.node.name.
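
A minimal sketch of that direction (untested here; it assumes the K8S_NODE_NAME variable the chart already injects for the daemonset, and exactly where the processor slots into the pipeline is up to you):

extraEnvs:
  # K8S_NODE_NAME is already set by the chart via the downward API
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "k8s.node.name=$(K8S_NODE_NAME)"

config:
  processors:
    resourcedetection:
      detectors: [env]   # the env detector reads OTEL_RESOURCE_ATTRIBUTES
  service:
    pipelines:
      metrics:
        processors: [memory_limiter, resourcedetection, batch, filter/ottl, k8sattributes]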

dfsdevops commented 3 months ago

@TylerHelmuth thank you for clarifying; this could be a misunderstanding on my part. I thought it was able to associate at least a bare minimum of metadata, the way kubeletstats does on the daemonset instances. k8s.node.name is the most important attribute for me: nearly everything else in my metrics has it, so being able to cross-reference would be useful, and right now there is nothing in a hostmetrics event that distinguishes one node from another. I'm surprised there isn't at least something like a hostname attribute baked into the hostmetrics preset. I will try experimenting with resourcedetection or some other way to manually associate the node name with the data (maybe via the env var?).

Here's an example of what one of the metrics currently looks like as an event in Honeycomb: only four columns, with no way for me to tie it back to a specific node. Ideally everything in this metric would share a common attribute I could use to cross-reference and merge as much as possible into a single Honeycomb event, since high cardinality is encouraged by the platform. I know that's more of a Honeycomb-specific thing, but I thought it might help provide context. (Screenshot of the Honeycomb event, 2024-08-15.)

dfsdevops commented 3 months ago

I found a configuration I am happy with.

Explanation: First, I removed the hostMetrics preset because I don't want everything it sends over and it wasn't clear how to disable individual scrapers through the preset; for now I'm only enabling the load scraper.

I created a separate pipeline that adds the k8s.node.name attribute via an env var source in a resourcedetection processor, then used the forward connector to send it into the main metrics pipeline so it's otherwise unified with those processors/exporters.

mode: daemonset
fullnameOverride: otel-collector-agent

image:
  repository: "otel/opentelemetry-collector-k8s"

# Required to use the kubeletstats cpu/memory utilization metrics
clusterRole:
  create: true
  rules:
    - apiGroups:
        - ""
      resources:
        - nodes/proxy
      verbs:
        - get

extraEnvs:
  - name: HONEYCOMB_API_KEY
    valueFrom:
      secretKeyRef:
        name: honeycomb
        key: api-key
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "k8s.node.name=$(K8S_NODE_NAME)"

extraVolumes:
  - name: hostfs
    hostPath:
      path: /

extraVolumeMounts:
  - name: hostfs
    mountPath: /hostfs
    readOnly: true
    mountPropagation: HostToContainer

presets:
  # enables the k8sattributesprocessor and adds it to the traces, metrics, and logs pipelines
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
    extractAllPodAnnotations: true
  # enables the kubeletstatsreceiver and adds it to the metrics pipelines
  kubeletMetrics:
    enabled: true
  logsCollection:
    enabled: true

config:
  connectors:
    forward:
  receivers:
    jaeger: null
    zipkin: null
    hostmetrics:
      collection_interval: 30s
      root_path: /hostfs
      scrapers:
        load:
          cpu_average: true # divide by number of cores, better for generalized figures to alert on
    kubeletstats:
      collection_interval: 30s
      metric_groups:
        - pod
        - node
        - volume
      extra_metadata_labels:
        - k8s.volume.type
      metrics:
        k8s.node.uptime:
          enabled: true
        k8s.pod.uptime:
          enabled: true
        k8s.pod.cpu_limit_utilization:
          enabled: true
        k8s.pod.cpu_request_utilization:
          enabled: true
        k8s.pod.memory_limit_utilization:
          enabled: true
        k8s.pod.memory_request_utilization:
          enabled: true
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector # self metrics
            scrape_interval: 300s
            static_configs:
              - targets:
                  - ${env:MY_POD_IP}:8888
  exporters:
    debug:
      verbosity: detailed
      sampling_initial: 5
      sampling_thereafter: 200
    otlp:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
    otlp/k8s-metrics:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
        "x-honeycomb-dataset": "k8s-metrics"
    otlp/k8s-logs:
      endpoint: "api.honeycomb.io:443"
      headers:
        "x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"
        "x-honeycomb-dataset": "k8s-logs"
  processors:
    resourcedetection:
      detectors:
        - env
    filter/ottl:
      error_mode: ignore
      metrics:
        datapoint:
          - 'resource.attributes["k8s.volume.type"] != nil and resource.attributes["k8s.volume.type"] != "persistentVolumeClaim"'
  service:
    # telemetry:
    #   logs:
    #     level: debug
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]
      metrics/hostmetrics:
        receivers: [hostmetrics]
        processors:
          - resourcedetection
        exporters: [forward]
      metrics:
        receivers:
          - otlp
          - prometheus
          - forward
          - kubeletstats
        exporters:
          - otlp/k8s-metrics
          # uncomment the following line to enable debug logging for metrics
          # - debug
        processors:
          - memory_limiter
          - batch
          - filter/ottl
          - k8sattributes
      logs:
        exporters: [otlp/k8s-logs]

ports:
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
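
For reference, one way to double-check how the chart wires these pipelines together is to render it locally (this assumes the chart repo is added under the usual open-telemetry alias and the values above are saved as values.yaml):

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm template otel-collector-agent open-telemetry/opentelemetry-collector -f values.yaml

The generated ConfigMap should show the metrics/hostmetrics pipeline exporting to the forward connector, which the main metrics pipeline then consumes as a receiver.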