open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

TransformProcessor doesn't set TraceID and SpanID within LogContext #32190

Open AlessandroFazio opened 2 months ago

AlessandroFazio commented 2 months ago

Component(s)

processor/transform

What happened?

Description

Hello, I have set up a pipeline with the transform processor to extract the TraceID and SpanID from the traceparent HTTP header and set trace_id and span_id on the log context.

     processors:
      transform/envoyal:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["trace-id"], Substring(attributes["traceparent"], 3, 16))
              - set(attributes["parent-id"], Substring(attributes["traceparent"], 20, 8))
              - set(trace_id.string, attributes["trace-id"])
              - set(span_id.string, attributes["parent-id"])
              # - delete_key(attributes, "traceparent")
              # - delete_key(attributes, "trace-id")
              # - delete_key(attributes, "trace-id")

The key here is:

  - set(trace_id.string, attributes["trace-id"])
  - set(span_id.string, attributes["parent-id"])

Steps to Reproduce

Expected Result

This is the payload I expect:

LogRecord #0
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-04-05 14:38:49.070989 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str([2024-04-05T14:38:49.070Z] \"GET /api/v1/call-me HTTP/1.1\" 200 - via_upstream - \"-\" 0 55 2 2 \"-\" \"Java/1.8.0_402\" \"a64cd4db-7988-931a-a0d3-2ebde5a7735c\" \"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080\" \"10.244.0.17:8080\" outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local 10.244.0.16:37036 10.96.67.20:8080 10.244.0.16:59646 - default
)
Attributes:
     -> x_forwarded_for: Str(-)
     -> authority: Str(inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080)
     -> bytes_received: Str(0)
     -> bytes_sent: Str(55)
     -> downstream_local_address: Str(10.96.67.20:8080)
     -> downstream_remote_address: Str(10.244.0.16:59646)
     -> duration: Str(2)
     -> dyn_meta: Str(-)
     -> method: Str(GET)
     -> path: Str(/api/v1/call-me)
     -> user_agent: Str(Java/1.8.0_402)
     -> protocol: Str(HTTP/1.1)
     -> requested_server_name: Str(-)
     -> response_code: Str(200)
     -> response_flags: Str(-)
     -> route_name: Str(default)
     -> services: Str([{\"host\":\"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local\",\"namespace\":\"esol-ap12345-otel-poc1-dev-namespace\",\"name\":\"inventory-service\"}])
     -> start_time: Str(2024-04-05T14:38:49.070Z)
     -> traceparent: Str(00-1e2c3bff3f92d79cf8d1140f6af32f69-c784b919dc6de35e-01)
     -> upstream_cluster: Str(outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local)
     -> upstream_host: Str(10.244.0.17:8080)
     -> upstream_local_address: Str(10.244.0.16:37036)
     -> upstream_service_time: Str(2)
     -> upstream_transport_failure_reason: Str(-)
     -> trace-id: Str(1e2c3bff3f92d79c)
     -> parent-id: Str(8d1140f6)
Trace ID: Str(1e2c3bff3f92d79c)
Span ID: Str(8d1140f6)
Flags: 0

Actual Result

This is the payload I actually receive:

LogRecord #0
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-04-05 14:38:49.070989 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str([2024-04-05T14:38:49.070Z] \"GET /api/v1/call-me HTTP/1.1\" 200 - via_upstream - \"-\" 0 55 2 2 \"-\" \"Java/1.8.0_402\" \"a64cd4db-7988-931a-a0d3-2ebde5a7735c\" \"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080\" \"10.244.0.17:8080\" outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local 10.244.0.16:37036 10.96.67.20:8080 10.244.0.16:59646 - default
)
Attributes:
     -> x_forwarded_for: Str(-)
     -> authority: Str(inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080)
     -> bytes_received: Str(0)
     -> bytes_sent: Str(55)
     -> downstream_local_address: Str(10.96.67.20:8080)
     -> downstream_remote_address: Str(10.244.0.16:59646)
     -> duration: Str(2)
     -> dyn_meta: Str(-)
     -> method: Str(GET)
     -> path: Str(/api/v1/call-me)
     -> user_agent: Str(Java/1.8.0_402)
     -> protocol: Str(HTTP/1.1)
     -> requested_server_name: Str(-)
     -> response_code: Str(200)
     -> response_flags: Str(-)
     -> route_name: Str(default)
     -> services: Str([{\"host\":\"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local\",\"namespace\":\"esol-ap12345-otel-poc1-dev-namespace\",\"name\":\"inventory-service\"}])
     -> start_time: Str(2024-04-05T14:38:49.070Z)
     -> traceparent: Str(00-1e2c3bff3f92d79cf8d1140f6af32f69-c784b919dc6de35e-01)
     -> upstream_cluster: Str(outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local)
     -> upstream_host: Str(10.244.0.17:8080)
     -> upstream_local_address: Str(10.244.0.16:37036)
     -> upstream_service_time: Str(2)
     -> upstream_transport_failure_reason: Str(-)
     -> trace-id: Str(1e2c3bff3f92d79c)
     -> parent-id: Str(8d1140f6)
Trace ID:
Span ID:
Flags: 0

Hope you can help me with this. Thanks in advance for your time.

Collector version

0.97.0

Environment information

OS: Ubuntu 22.04

OpenTelemetry Collector configuration

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['0.0.0.0:8888']
                  labels:
                    apm_code: 'ap00001'
                    project: 'otel-poc'
        target_allocator:
          endpoint: http://otel-targetallocator
          interval: 30s
          collector_id: "$${POD_NAME}"

    processors:
      batch/traces:
        timeout: 500ms
        send_batch_size: 100
      batch/logs:
        timeout: 500ms
        send_batch_size: 100
      batch/metrics:
        timeout: 500ms
        send_batch_size: 100
      groupbyattrs/istiometrics:
        keys:
          - apm_code
          - project
      groupbyattrs/envoyal:
        keys:
          - apm_code
          - project
      transform/envoyal:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["trace-id"], Substring(attributes["traceparent"], 3, 16))
              - set(attributes["parent-id"], Substring(attributes["traceparent"], 20, 8))
              - set(trace_id.string, attributes["trace-id"])
              - set(span_id.string, attributes["parent-id"])
              # - delete_key(attributes, "traceparent")
              # - delete_key(attributes, "trace-id")
              # - delete_key(attributes, "trace-id")

    exporters:
      logging:
        loglevel: debug
      otlphttp:
        traces_endpoint: "url"
        metrics_endpoint: "url"
        logs_endpoint: "url"
        auth:
          authenticator: sigv4auth
        tls:
          insecure: true
        compression: none
        retry_on_failure:
          initial_interval: 10s
          max_interval: 30s
          max_elapsed_time: 1200s # (in secs) set to 0 for infinite retry
        sending_queue:
          storage: file_storage/otc
          queue_size: 5000
          num_consumers: 10
        timeout: 10s

    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
        path: "/health"
      zpages:
        endpoint: ":55679"
      sigv4auth:
        assume_role:
          arn: "role_arn"
          sts_region: "eu-central-1"
        region: "eu-central-1"
        service: "osis"
      file_storage/otc:
        directory: /var/lib/otelcol/buffers
        compaction:
          on_start: true
          on_rebound: true
          directory: /tmp/otel
          max_transaction_size: 65_536
          rebound_needed_threshold_mib: 100
          rebound_trigger_threshold_mib: 10

    service:
      extensions: [health_check, zpages, sigv4auth, file_storage/otc]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch/traces]
          exporters: [otlphttp]
        logs:
          receivers: [otlp]
          processors: [groupbyattrs/envoyal,transform/envoyal] # batch/logs
          exporters: [otlphttp] # logging
        metrics:
          receivers: [otlp,prometheus]
          processors: [batch/metrics,groupbyattrs/istiometrics]
          exporters: [otlphttp]
      telemetry:
        logs:
          level: "DEBUG"
          development: true
          encoding: "json"

Log output

{"level":"error","ts":1712330461.6687455,"caller":"logs/processor.go:54","msg":"failed processing logs","kind":"processor","name":"transform/envoyal","pipeline":"logs","error":"failed to execute statement: set(trace_id.string, attributes[\"trace-id\"]), trace ids must be 32 hex characters","stacktrace":"github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor/internal/logs.(Processor).ProcessLogs\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor@v0.97.0/internal/logs/processor.go:54\ngo.opentelemetry.io/collector/processor/processorhelper.NewLogsProcessor.func1\n\tgo.opentelemetry.io/collector/processor@v0.97.0/processorhelper/logs.go:48\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/consumer@v0.97.0/logs.go:25\ngo.opentelemetry.io/collector/processor/processorhelper.NewLogsProcessor.func1\n\tgo.opentelemetry.io/collector/processor@v0.97.0/processorhelper/logs.go:56\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/consumer@v0.97.0/logs.go:25\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/consumer@v0.97.0/logs.go:25\ngo.opentelemetry.io/collector/internal/fanoutconsumer.(logsConsumer).ConsumeLogs\n\tgo.opentelemetry.io/collector@v0.97.0/internal/fanoutconsumer/logs.go:62\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/logs.(Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/otlpreceiver@v0.97.0/internal/logs/otlp.go:41\ngo.opentelemetry.io/collector/pdata/plog/plogotlp.rawLogsServer.Export\n\tgo.opentelemetry.io/collector/pdata@v1.4.0/plog/plogotlp/grpc.go:88\ngo.opentelemetry.io/collector/pdata/internal/data/protogen/collector/logs/v1._LogsService_Export_Handler.func1\n\tgo.opentelemetry.io/collector/pdata@v1.4.0/internal/data/protogen/collector/logs/v1/logs_service.pb.go:311\ngo.opentelemetry.io/collector/config/configgrpc.(ServerConfig).toServerOption.enhanceWithClientInformation.func9\n\tgo.opentelemetry.io/collector/config/configgrpc@v0.97.0/configgrpc.go:398\ngo.opentelemetry.io/collector/pdata/internal/data/protogen/collector/logs/v1._LogsService_Export_Handler\n\tgo.opentelemetry.io/collector/pdata@v1.4.0/internal/data/protogen/collector/logs/v1/logs_service.pb.go:313\ngoogle.golang.org/grpc.(Server).processUnaryRPC\n\tgoogle.golang.org/grpc@v1.62.1/server.go:1386\ngoogle.golang.org/grpc.(Server).handleStream\n\tgoogle.golang.org/grpc@v1.62.1/server.go:1797\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/grpc@v1.62.1/server.go:1027"}

Additional context

github-actions[bot] commented 2 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

AlessandroFazio commented 2 months ago

After looking in more detail at the W3C tracing protocol, I figured out that the version part of the traceparent header value is just one byte, so assuming another byte is taken by the '-', the substring function should start reading from the byte at index 2 until 16 bytes are read. I tested with 2 as the starting index instead of 3, but this resulted in the same error.

My guess is that the traceparent header itself, when set as an attribute by the Envoy access log service in the telemetry payload, is represented with the wrong encoding, and so the error propagates downstream in data processing. I didn't find anything about hex encoding conversion in the util functions in OTTL.
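
The error message above points at length rather than encoding: the setter wants the full 32 hex characters of the trace ID, while Substring(..., 3, 16) only yields half of it. A minimal Go sketch (illustrative only, not collector code) showing the difference:

    package main

    import (
    	"encoding/hex"
    	"fmt"
    )

    func main() {
    	// The 16-character substring decodes to only 8 bytes;
    	// a trace ID needs 16 bytes, i.e. 32 hex characters.
    	short := "1e2c3bff3f92d79c"                // what Substring(traceparent, 3, 16) yields
    	full := "1e2c3bff3f92d79cf8d1140f6af32f69" // the full trace-id field of the header

    	for _, s := range []string{short, full} {
    		b, err := hex.DecodeString(s)
    		fmt.Printf("%s -> %d bytes (err=%v)\n", s, len(b), err)
    	}
    }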

TylerHelmuth commented 2 months ago

We have https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31929 to track a Hex converter, but I believe all you'll need to do is adjust your Substring command:

processors:
      transform/envoyal:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["trace-id"], Substring(attributes["traceparent"], 3, 32))
              - set(attributes["parent-id"], Substring(attributes["traceparent"], 36, 16))
              - set(trace_id.string, attributes["trace-id"])
              - set(span_id.string, attributes["parent-id"])

That should capture 1e2c3bff3f92d79cf8d1140f6af32f69 as the trace ID and c784b919dc6de35e as the span ID.
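
As a quick sanity check of those offsets against the traceparent value from the log record above (a standalone sketch, not collector code), slicing at start index 3 for 32 characters and start index 36 for 16 characters yields the full IDs:

    package main

    import "fmt"

    func main() {
    	// W3C traceparent layout: <version:2 hex>-<trace-id:32 hex>-<parent-id:16 hex>-<flags:2 hex>
    	tp := "00-1e2c3bff3f92d79cf8d1140f6af32f69-c784b919dc6de35e-01"

    	traceID := tp[3 : 3+32]    // Substring(traceparent, 3, 32)
    	parentID := tp[36 : 36+16] // Substring(traceparent, 36, 16)

    	fmt.Println("trace-id :", traceID)  // 1e2c3bff3f92d79cf8d1140f6af32f69
    	fmt.Println("parent-id:", parentID) // c784b919dc6de35e
    }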

github-actions[bot] commented 3 weeks ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.