Open clintonb opened 1 month ago
Pinging code owners:
processor/k8sattributes: @dmitryax @fatsheep9146 @TylerHelmuth
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Can you provide more details on how you configured the OTel Collector? Also, why did you comment out the pod_association block in the configuration? I believe that without pod_association, the k8sattributes processor will not function correctly.
@vkamlesh I've posted the smallest configuration I have that replicates the issue. The logs are everything I have; k8sattributes doesn't seem to log anything, even at the debug level.
The collector is broken even if I restore all of the commented-out code. This happens both when I run locally and on Kubernetes.
I don't think anything in my Dockerfile should affect this processor, but here it is for completeness:
# Adapted from:
# - https://www.honeycomb.io/blog/rescue-struggling-pods-from-scratch
# - https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/cmd/otelcontribcol/Dockerfile
FROM otel/opentelemetry-collector-contrib:0.111.0 AS binary
FROM alpine:latest
ARG USER_UID=10001
USER ${USER_UID}
COPY --from=binary /otelcol-contrib /
EXPOSE 4317 4318 55680 55679
COPY config.yaml /etc/otelcol/config.yaml
ENV LOG_LEVEL=info
ARG COMMIT_SHA=""
ENV COMMIT_SHA=${COMMIT_SHA}
# Remove the entrypoint so we can execute other commands for hooks and other purposes.
ENTRYPOINT []
CMD ["/otelcol-contrib", "--config", "/etc/otelcol/config.yaml"]
For k8sattributes, I think you need to un-comment the pod_association section. For example:
k8sattributes/logs: # Extracting Kubernetes attributes from resource metadata.
  extract:
    metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
      - k8s.cluster.uid
      - k8s.container.name
      - container.image.name
      - container.image.tag
      - k8s.cluster.uid
  filter:
    node_from_env_var: K8S_NODE_NAME
  passthrough: false
  pod_association:
    - sources:
        - from: resource_attribute
          name: k8s.pod.ip
    - sources:
        - from: resource_attribute
          name: k8s.pod.uid
    - sources:
        - from: resource_attribute
          name: container.id
    - sources:
        - from: connection
@vkamlesh I tried that and it doesn't work.
config.yaml
extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
        include_metadata: true

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.cluster.uid
        - k8s.container.name
        - container.image.name
        - container.image.tag
        - k8s.cluster.uid
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: resource_attribute
            name: container.id
      - sources:
          - from: connection

exporters:
  # NOTE: Add this to the list of pipeline exporters to see the collector's debug logs
  debug:
    verbosity: detailed

service:
  extensions: [ health_check ]
  # https://opentelemetry.io/docs/collector/configuration/#telemetry
  telemetry:
    # This controls log verbosity of the collector itself.
    logs:
      encoding: json
      level: "debug"
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes ]
      exporters: [ debug ]
collector logs
otel-collector-1 | {"level":"info","ts":1729696350.0262184,"caller":"service@v0.111.0/service.go:136","msg":"Setting up own telemetry..."}
otel-collector-1 | {"level":"info","ts":1729696350.0269117,"caller":"telemetry/metrics.go:70","msg":"Serving metrics","address":"localhost:8888","metrics level":"Normal"}
otel-collector-1 | {"level":"info","ts":1729696350.0271108,"caller":"builders/builders.go:26","msg":"Development component. May change in the future.","kind":"exporter","data_type":"traces","name":"debug"}
otel-collector-1 | {"level":"debug","ts":1729696350.0287833,"caller":"builders/builders.go:24","msg":"Beta component. May change in the future.","kind":"processor","name":"k8sattributes","pipeline":"traces"}
otel-collector-1 | {"level":"debug","ts":1729696350.0288186,"caller":"builders/builders.go:24","msg":"Stable component.","kind":"receiver","name":"otlp","data_type":"traces"}
otel-collector-1 | {"level":"debug","ts":1729696350.028841,"caller":"builders/extension.go:48","msg":"Beta component. May change in the future.","kind":"extension","name":"health_check"}
otel-collector-1 | {"level":"info","ts":1729696350.029432,"caller":"service@v0.111.0/service.go:208","msg":"Starting otelcol-contrib...","Version":"0.111.0","NumCPU":16}
otel-collector-1 | {"level":"info","ts":1729696350.0294414,"caller":"extensions/extensions.go:39","msg":"Starting extensions..."}
otel-collector-1 | {"level":"info","ts":1729696350.0296128,"caller":"extensions/extensions.go:42","msg":"Extension is starting...","kind":"extension","name":"health_check"}
otel-collector-1 | {"level":"info","ts":1729696350.029626,"caller":"healthcheckextension@v0.111.0/healthcheckextension.go:33","msg":"Starting health_check extension","kind":"extension","name":"health_check","config":{"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
otel-collector-1 | {"level":"warn","ts":1729696350.031119,"caller":"internal@v0.111.0/warning.go:40","msg":"Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.","kind":"extension","name":"health_check","documentation":"https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
otel-collector-1 | {"level":"info","ts":1729696350.0315917,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"health_check"}
Health check response
curl -v http://localhost:13133
* Host localhost:13133 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
* Trying [::1]:13133...
* Connected to localhost (::1) port 13133
> GET / HTTP/1.1
> Host: localhost:13133
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 503 Service Unavailable
< Content-Type: application/json
< Date: Wed, 23 Oct 2024 15:14:10 GMT
< Content-Length: 78
<
* Connection #0 to host localhost left intact
{"status":"Server not available","upSince":"0001-01-01T00:00:00Z","uptime":""}%
I don't think that's an issue with the processor/k8sattributes component. It looks like an issue with the otlp receiver and/or the health_check extension. I would speculate the issue lies in how these are configured, specifically the endpoint part.
If you are running the Collector on K8s, I would advise taking a look at https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector and either using the Helm chart directly or checking how these components are configured there by default.
I tried to reproduce this and got the same behaviour when there is an error during the initialization of the kube client: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c06be6df25ba21050827752f6e88b5054015a1a4/processor/k8sattributesprocessor/processor.go#L71
In this case the error is passed to the componentstatus.ReportStatus() function but never ends up in the logs, which makes this scenario hard to troubleshoot. Therefore I think logging the error with the processor's logger, in addition to passing it on to ReportStatus(), would make sense, as it would make it easier to spot errors during the kube client initialization.
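For illustration, a minimal sketch of what that could look like in the processor's Start method (a sketch only, assuming the usual imports of context, component, componentstatus, and zap; field names such as kp.options and kp.logger are assumptions about the processor's internals, not the actual source):

func (kp *kubernetesprocessor) Start(_ context.Context, host component.Host) error {
    for _, opt := range kp.options {
        if err := opt(kp); err != nil {
            // Proposed addition: surface the error in the collector's own logs,
            // since reporting the status event alone does not write anything there.
            kp.logger.Error("failed to initialize k8sattributes processor", zap.Error(err))
            componentstatus.ReportStatus(host, componentstatus.NewFatalErrorEvent(err))
            return nil
        }
    }
    // ... start the kube client and informers as before ...
    return nil
}

With a change like this, a misconfigured or unreachable API server would show up as an error log line at startup instead of only a silent "Server not available" health check.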
That'd make sense!
Component(s)
processor/k8sattributes
What happened?
Description
I am trying to add k8sattributes to a gateway collector, but the collector and health check are not functioning. The collector appears to start, but refuses connections on the receiving ports. The health check endpoint returns a 503 with {"status":"Server not available","upSince":"0001-01-01T00:00:00Z","uptime":""}.
Steps to Reproduce
Expected Result
curl -v http://localhost:13133
Actual Result
The collector never becomes healthy, and does not accept any signals.
Collector version
0.111.0
Environment information
Environment
OS: macOS 15.0.1 (Docker), and GKE Autopilot
OpenTelemetry Collector configuration
Log output