open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Collector not working when k8sattributes in use #35879

Open clintonb opened 1 month ago

clintonb commented 1 month ago

Component(s)

processor/k8sattributes

What happened?

Description

I am trying to add k8sattributes to a gateway collector, but with the processor enabled the collector and its health check stop functioning. The collector appears to start, but refuses connections on the receiver ports. The health check endpoint returns a 503 with {"status":"Server not available","upSince":"0001-01-01T00:00:00Z","uptime":""}.

Steps to Reproduce

  1. Build an image with the configuration below.
  2. Run it.

Expected Result

  1. Collector responds to health checks at curl -v http://localhost:13133.
  2. Received traces include k8s attributes.

Actual Result

The collector never becomes healthy, and does not accept any signals.

Collector version

0.111.0

Environment information

Environment

OS: macOS 15.0.1 (Docker) and GKE Autopilot

OpenTelemetry Collector configuration

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
        include_metadata: true

processors:
  k8sattributes:
    auth_type: "serviceAccount"
    extract:
      metadata:
#        - k8s.namespace.name
#        - k8s.deployment.name
#        - k8s.statefulset.name
#        - k8s.daemonset.name
#        - k8s.cronjob.name
#        - k8s.job.name
#        - k8s.node.name
        - k8s.pod.name
#        - k8s.pod.uid
#        - k8s.pod.start_time
    passthrough: false
#    pod_association:
#      - sources:
#          - from: resource_attribute
#            name: k8s.pod.ip
#      - sources:
#          - from: resource_attribute
#            name: k8s.pod.uid
#      - sources:
#          - from: connection

exporters:
  # NOTE: Add this to the list of pipeline exporters to see the collector's debug logs
  debug:
    verbosity: detailed
service:
  extensions: [ health_check ]
  # https://opentelemetry.io/docs/collector/configuration/#telemetry
  telemetry:
    # This controls log verbosity of the collector itself.
    logs:
      encoding: json
      level: "debug"
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes ]
      exporters: [ debug ]

Log output

otel-collector-1  | {"level":"info","ts":1729289848.7740626,"caller":"service@v0.111.0/service.go:136","msg":"Setting up own telemetry..."}
otel-collector-1  | {"level":"info","ts":1729289848.7742362,"caller":"telemetry/metrics.go:70","msg":"Serving metrics","address":"localhost:8888","metrics level":"Normal"}
otel-collector-1  | {"level":"info","ts":1729289848.7743702,"caller":"builders/builders.go:26","msg":"Development component. May change in the future.","kind":"exporter","data_type":"traces","name":"debug"}
otel-collector-1  | {"level":"debug","ts":1729289848.774658,"caller":"builders/builders.go:24","msg":"Beta component. May change in the future.","kind":"processor","name":"k8sattributes","pipeline":"traces"}
otel-collector-1  | {"level":"debug","ts":1729289848.774745,"caller":"builders/builders.go:24","msg":"Stable component.","kind":"receiver","name":"otlp","data_type":"traces"}
otel-collector-1  | {"level":"debug","ts":1729289848.7748547,"caller":"builders/extension.go:48","msg":"Beta component. May change in the future.","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729289848.7753859,"caller":"service@v0.111.0/service.go:208","msg":"Starting otelcol-contrib...","Version":"0.111.0","NumCPU":16}
otel-collector-1  | {"level":"info","ts":1729289848.775412,"caller":"extensions/extensions.go:39","msg":"Starting extensions..."}
otel-collector-1  | {"level":"info","ts":1729289848.775442,"caller":"extensions/extensions.go:42","msg":"Extension is starting...","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729289848.7754776,"caller":"healthcheckextension@v0.111.0/healthcheckextension.go:33","msg":"Starting health_check extension","kind":"extension","name":"health_check","config":{"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
otel-collector-1  | {"level":"warn","ts":1729289848.7757652,"caller":"internal@v0.111.0/warning.go:40","msg":"Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.","kind":"extension","name":"health_check","documentation":"https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
otel-collector-1  | {"level":"info","ts":1729289848.7758508,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"health_check"}


Additional context

I run `curl -v http://localhost:13133` to check if the collector is healthy.

github-actions[bot] commented 1 month ago

Pinging code owners:

vkamlesh commented 1 month ago

Can you provide more details on how you configured the OTel Collector? Also, why did you comment out the pod_association block in the configuration? I believe that without pod_association, the k8sattributes processor will not function correctly.

clintonb commented 1 month ago

@vkamlesh I've posted the smallest configuration that reproduces the issue. The logs are everything I have; k8sattributes doesn't seem to log anything, even at the debug level.

The collector is still broken even if I restore all of the commented-out configuration, both when I run locally and on Kubernetes.

I don't think anything in my Dockerfile should affect this processor, but here it is for completeness:

# Adapted from:
#  - https://www.honeycomb.io/blog/rescue-struggling-pods-from-scratch
#  - https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/cmd/otelcontribcol/Dockerfile
FROM otel/opentelemetry-collector-contrib:0.111.0 AS binary
FROM alpine:latest

ARG USER_UID=10001
USER ${USER_UID}

COPY --from=binary /otelcol-contrib /

EXPOSE 4317 4318 55680 55679

COPY config.yaml /etc/otelcol/config.yaml

ENV LOG_LEVEL=info

ARG COMMIT_SHA=""
ENV COMMIT_SHA=${COMMIT_SHA}

# Remove the entrypoint so we can execute other commands for hooks and other purposes.
ENTRYPOINT []
CMD ["/otelcol-contrib", "--config", "/etc/otelcol/config.yaml"]

vkamlesh commented 1 month ago

For k8sattributes, I think you need to uncomment the pod_association section. For example:

      k8sattributes/logs: #Extracting Kubernetes attributes from resource metadata.
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.deployment.name
            - k8s.statefulset.name
            - k8s.daemonset.name
            - k8s.cronjob.name
            - k8s.job.name
            - k8s.node.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.pod.start_time
            - k8s.cluster.uid
            - k8s.container.name
            - container.image.name
            - container.image.tag
            - k8s.cluster.uid
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: resource_attribute
                name: container.id
          - sources:
              - from: connection

clintonb commented 1 month ago

@vkamlesh I tried that and it doesn't work.

config.yaml

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
        include_metadata: true

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.cluster.uid
        - k8s.container.name
        - container.image.name
        - container.image.tag
        - k8s.cluster.uid
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: resource_attribute
            name: container.id
      - sources:
          - from: connection

exporters:
  # NOTE: Add this to the list of pipeline exporters to see the collector's debug logs
  debug:
    verbosity: detailed
service:
  extensions: [ health_check ]
  # https://opentelemetry.io/docs/collector/configuration/#telemetry
  telemetry:
    # This controls log verbosity of the collector itself.
    logs:
      encoding: json
      level: "debug"
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes ]
      exporters: [ debug ]

collector logs

otel-collector-1  | {"level":"info","ts":1729696350.0262184,"caller":"service@v0.111.0/service.go:136","msg":"Setting up own telemetry..."}
otel-collector-1  | {"level":"info","ts":1729696350.0269117,"caller":"telemetry/metrics.go:70","msg":"Serving metrics","address":"localhost:8888","metrics level":"Normal"}
otel-collector-1  | {"level":"info","ts":1729696350.0271108,"caller":"builders/builders.go:26","msg":"Development component. May change in the future.","kind":"exporter","data_type":"traces","name":"debug"}
otel-collector-1  | {"level":"debug","ts":1729696350.0287833,"caller":"builders/builders.go:24","msg":"Beta component. May change in the future.","kind":"processor","name":"k8sattributes","pipeline":"traces"}
otel-collector-1  | {"level":"debug","ts":1729696350.0288186,"caller":"builders/builders.go:24","msg":"Stable component.","kind":"receiver","name":"otlp","data_type":"traces"}
otel-collector-1  | {"level":"debug","ts":1729696350.028841,"caller":"builders/extension.go:48","msg":"Beta component. May change in the future.","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729696350.029432,"caller":"service@v0.111.0/service.go:208","msg":"Starting otelcol-contrib...","Version":"0.111.0","NumCPU":16}
otel-collector-1  | {"level":"info","ts":1729696350.0294414,"caller":"extensions/extensions.go:39","msg":"Starting extensions..."}
otel-collector-1  | {"level":"info","ts":1729696350.0296128,"caller":"extensions/extensions.go:42","msg":"Extension is starting...","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729696350.029626,"caller":"healthcheckextension@v0.111.0/healthcheckextension.go:33","msg":"Starting health_check extension","kind":"extension","name":"health_check","config":{"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
otel-collector-1  | {"level":"warn","ts":1729696350.031119,"caller":"internal@v0.111.0/warning.go:40","msg":"Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.","kind":"extension","name":"health_check","documentation":"https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
otel-collector-1  | {"level":"info","ts":1729696350.0315917,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"health_check"}

Health check response

curl -v http://localhost:13133
* Host localhost:13133 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:13133...
* Connected to localhost (::1) port 13133
> GET / HTTP/1.1
> Host: localhost:13133
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 503 Service Unavailable
< Content-Type: application/json
< Date: Wed, 23 Oct 2024 15:14:10 GMT
< Content-Length: 78
<
* Connection #0 to host localhost left intact
{"status":"Server not available","upSince":"0001-01-01T00:00:00Z","uptime":""}%
ChrsMark commented 1 month ago

I don't think this is an issue with the processor/k8sattributes component. It looks like an issue with how the otlp receiver and/or the health_check extension are configured, and I would speculate the problem lies specifically in the endpoint settings.

If you are running the Collector on K8s, I would advise taking a look at https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector and either using the Helm Chart directly or checking how these components are configured there by default.

bacherfl commented 2 weeks ago

I tried to reproduce this and got the same behaviour when there is an error during the initialization of the kube client: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c06be6df25ba21050827752f6e88b5054015a1a4/processor/k8sattributesprocessor/processor.go#L71

In this case the error is passed to the componentstatus.ReportStatus() function but does not end up in the logs, which makes this scenario hard to troubleshoot. Therefore I think logging the error with the processor's logger, in addition to passing it on to ReportStatus(), would make sense, as it would make it easier to spot errors during the kube client initialization.
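
For illustration, roughly something like the following. This is only a minimal sketch, not the actual processor code: the startWithClient helper and the initClient callback are made up for the example, and the componentstatus calls mirror what the linked Start() code already does.

package k8sattributesprocessor

import (
    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/component/componentstatus"
    "go.uber.org/zap"
)

// startWithClient is a hypothetical stand-in for the processor's Start path;
// initClient represents whatever builds the kube client today.
func startWithClient(host component.Host, logger *zap.Logger, initClient func() error) {
    if err := initClient(); err != nil {
        // Proposed addition: surface the initialization error in the collector logs ...
        logger.Error("failed to initialize kube client", zap.Error(err))
        // ... in addition to reporting it via the component status API,
        // which the processor already does.
        componentstatus.ReportStatus(host, componentstatus.NewFatalErrorEvent(err))
    }
}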

ChrsMark commented 2 weeks ago

I tried to reproduce this and got the same behaviour when there is an error during the initialization of the kube client:

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c06be6df25ba21050827752f6e88b5054015a1a4/processor/k8sattributesprocessor/processor.go#L71

In this case the error is passed to the componentstatus.ReportStatus() function but does not end up in the logs, which makes this scenario hard to troubleshoot. Therefore I think logging the error with the processor's logger, in addition to passing it on to ReportStatus(), would make sense, as it would make it easier to spot errors during the kube client initialization.

That'd make sense!