open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector

Go service auto instrumentation creates empty/all zero span_id and trace_id resulting in OTel collector error #1858

Closed dhanvsagar closed 1 year ago

dhanvsagar commented 1 year ago

Describe the bug

Hi, I have a demo app (https://github.com/GoogleCloudPlatform/microservices-demo) deployed in a k8s cluster. The services are in Java, NodeJS, .NET, and Go.

I have the OpenTelemetry Operator installed in my cluster with featureGates: "operator.autoinstrumentation.go" to enable support for Go auto-instrumentation.
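For reference, a minimal sketch of where that feature gate goes when the operator is installed through its Helm chart (the manager.featureGates value below is an assumption about the chart layout, not taken from my actual manifests):

# values.yaml excerpt for the opentelemetry-operator Helm chart (assumed layout)
manager:
  # enables the experimental Go auto-instrumentation support in the operator
  featureGates: "operator.autoinstrumentation.go"

If the operator is deployed from plain manifests instead, the equivalent should be a --feature-gates=operator.autoinstrumentation.go argument on the manager container.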

The app YAML files contain the following annotations:

annotations:
  instrumentation.opentelemetry.io/inject-go: "true"
  instrumentation.opentelemetry.io/otel-go-auto-target-exe: "/src/server"
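For completeness, a sketch of where these annotations sit in one of the service Deployments; the names and image below are hypothetical placeholders, not the actual demo manifests:

# hypothetical Deployment excerpt showing the annotation placement on the pod template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkoutservice        # hypothetical service name
spec:
  selector:
    matchLabels:
      app: checkoutservice
  template:
    metadata:
      labels:
        app: checkoutservice
      annotations:
        # ask the operator to inject the Go auto-instrumentation sidecar
        instrumentation.opentelemetry.io/inject-go: "true"
        # path of the target Go binary inside the application container
        instrumentation.opentelemetry.io/otel-go-auto-target-exe: "/src/server"
    spec:
      containers:
        - name: server
          image: example/checkoutservice:latest   # hypothetical image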

My configured trace flow is as follows:

[App + Auto Instrumentation] ---> OTel collector deployment --> Tempo --> Grafana

The collector manifest is as follows:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: demo
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s
    exporters:
      logging:
      otlp:
        endpoint: 'demo-tempo:4317'
        tls:
          insecure: true

    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [otlp]
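(Side note, probably unrelated to the bug: the memory_limiter and batch processors above are defined but never referenced, since the traces pipeline uses processors: []. If they are meant to run, the pipeline has to list them; a minimal sketch of that wiring:)

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]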

Instrumentation.yaml

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: demo-instrumentation
spec:
  exporter:
    endpoint: http://demo-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  python:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector:4318
  dotnet:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector:4318
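There is no equivalent override for the Go services above; a sketch of one, under the assumption that this operator version's Instrumentation CRD already exposes a go section and that the Go agent exports over OTLP/gRPC (hence port 4317):

  go:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector:4317   # assumption: gRPC endpoint for the Go agent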

Environment

opentelemetry-operator-0.31.0
App Version 0.78.0

To Reproduce

Steps to reproduce the behavior:

  1. Create the OTel instrumentation and collector
  2. Deploy the microservice
  3. Check for errors

Expected behavior

Traces should flow correctly to the collector and on to Tempo.

Actual behavior

I can see that the auto-instrumentation sidecar gets created, but traces do not make it to Tempo (see the update and error logs below).

Update

Looking further into this, I found that when I access the service, the span_id and trace_id are generated as all zeros/empty.

{"level":"info","ts":1687285165.9649785,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9650521,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.966659,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.966715,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9694097,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9694622,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9704826,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9705439,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.971475,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9715357,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9724376,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9724922,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9733493,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9734104,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9742744,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.9743166,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9757655,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1687285165.97583,"caller":"opentelemetry/controller.go:74","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1687285165.9776042,"caller":"grpc/probe.go:226","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}

The OTel collector is unable to export the traces to Tempo and drops the spans with the following error:

{"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = InvalidArgument desc = trace ids must be 128 bit", "dropped_items": 6}

Error log from the OTel Collector

2023-06-20T13:30:25.751Z    error    exporterhelper/queued_retry.go:391    Exporting failed. The error is not retryable. Dropping data.    {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = InvalidArgument desc = trace ids must be 128 bit", "dropped_items": 6}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/queued_retry.go:391
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/traces.go:126
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/queued_retry.go:195
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/internal/bounded_memory_queue.go:47
2023-06-20T13:30:35.753Z    error    exporterhelper/queued_retry.go:391    Exporting failed. The error is not retryable. Dropping data.    {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "Permanent error: rpc error: code = InvalidArgument desc = trace ids must be 128 bit", "dropped_items": 6}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/queued_retry.go:391
go.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/traces.go:126
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/queued_retry.go:195
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
    go.opentelemetry.io/collector/exporter@v0.78.2/exporterhelper/internal/bounded_memory_queue.go:47

Error log from Tempo

level=error ts=2023-06-20T13:34:45.788034834Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = InvalidArgument desc = trace ids must be 128 bit"
level=error ts=2023-06-20T13:34:55.789838072Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = InvalidArgument desc = trace ids must be 128 bit"
level=error ts=2023-06-20T13:35:05.791289645Z caller=rate_limited_logger.go:27 msg="pusher failed to consume trace data" err="rpc error: code = InvalidArgument desc = trace ids must be 128 bit"
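One thing that might help narrow down where the IDs become zero: the logging exporter is already defined in the collector config above but is not part of the traces pipeline, so the collector never prints the spans it receives. A minimal sketch that wires it in with detailed verbosity, so the received trace/span IDs show up in the collector logs:

    exporters:
      logging:
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging, otlp]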

Please note that this error occurs only when I enable auto-instrumentation for the Go services; I don't see this error for the NodeJS, Python, and Java services.

Could you please help resolve this issue?

TylerHelmuth commented 1 year ago

@dhanvsagar thanks for reporting this! I've seen this happen before, and I believe it is an issue with the Go auto-instrumentation agent and HTTP. I see you've opened an issue for this in the Go Instrumentation repository as well, so I will close this one for now. Please ping me if something changes and the operator becomes the suspect.