grafana / beyla

eBPF-based autoinstrumentation of web applications and network metrics
https://grafana.com/oss/beyla-ebpf/
Apache License 2.0

TraceID does not match with TraceID inside the trace header #845

Open jpicara opened 6 months ago

jpicara commented 6 months ago

Hello! I would like to report an issue we are running into. We have deployed Beyla, Agent and Tempo in our testing environment, and from what I have observed we are not able to correlate the trace ID set in the trace headers with the trace ID reported by Beyla itself. For example, this is our configuration:

  grafana-beyla:
    enabled: true
    global:
      image:
        registry: "docker.io"
    image:
      repository: "grafana/beyla"
      pullPolicy: "IfNotPresent"
      tag: "1.5.0"
    rbac:
      create: true
    serviceAccount:
      create: true
    config:
      create: true
      name: ""
      data:
        routes:
          unmatched: heuristic
        log_level: debug
        otel_traces_export:
          endpoint: http://tracing-platform-grafana-agent.platform.svc:4318
        attributes:
          kubernetes:
            enable: true
        discovery:
          services:
            - exe_path: .*
    service:
      # -- whether to create a service for internal metrics
      enabled: true
      type: "ClusterIP"
      port: 80
      targetPort: 9090
    ## Env variables will override configmap values
    env:
      BEYLA_PRINT_TRACES: "false"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://tracing-platform-grafana-agent.platform.svc:4318"
      BEYLA_KUBE_METADATA_ENABLE: "true"

A simple curl against a specific endpoint returns these headers:

< HTTP/1.1 200 OK
< X-Trace-Token: 97ca3f513c0181da5de66c2b46daa4b9
< X-B3-TraceId: 97ca3f513c0181da5de66c2b46daa4b9

However, the same trace in Beyla is marked with a seemingly random trace ID, 147c58703a0a4df9e052bf86d5026785. Is there any way to override the trace ID with our X-Trace-Token or X-B3-TraceId? Also, are the trace ID and span ID set by Beyla random? Thanks in advance!

jpicara commented 6 months ago

Trace context propagation is indeed supported, as the debug logs show:

time=2024-05-17T10:42:09.377Z level=DEBUG msg="Linux kernel version" component=nethttp.Tracer major=5 minor=10
time=2024-05-17T10:42:09.377Z level=DEBUG msg="checking kernel lockdown mode, [none] allows us to propagate trace context" component=ebpf.ProcessTracer
time=2024-05-17T10:42:09.377Z level=DEBUG msg="can't find /sys/kernel/security/lockdown, assuming no lockdown" component=ebpf.ProcessTracer
time=2024-05-17T10:42:09.377Z level=DEBUG msg="Kernel not in lockdown mode, trace context propagation is supported." component=nethttp.Tracer
time=2024-05-17T10:42:09.377Z level=DEBUG msg="Linux kernel version" component=grpc.Tracer major=5 minor=10
time=2024-05-17T10:42:09.377Z level=DEBUG msg="checking kernel lockdown mode, [none] allows us to propagate trace context" component=ebpf.ProcessTracer
time=2024-05-17T10:42:09.377Z level=DEBUG msg="can't find /sys/kernel/security/lockdown, assuming no lockdown" component=ebpf.ProcessTracer
time=2024-05-17T10:42:09.377Z level=DEBUG msg="Kernel not in lockdown mode, trace context propagation is supported." component=grpc.Tracer

grcevski commented 6 months ago

Hi @jpicara,

There are some limitations in what Beyla can do for trace ID propagation, some of which will be resolved in the next couple of releases. Let me first ask a couple of questions and explain in more detail.

  1. What programming language is your application written in? Our current support for trace ID propagation works well for Go, and only on a single node for other languages. The details can be found here: https://grafana.com/blog/2024/03/21/opentelemetry-distributed-tracing-with-ebpf-whats-new-in-grafana-beyla-1.3/. This is the current state of things and it will improve significantly soon.
  2. We only parse and propagate the W3C standard trace ID format, which carries the header information in Traceparent: https://www.w3.org/TR/trace-context/. We don't support other or custom trace headers at the moment, and the format must match what Traceparent specifies. We could add parsing of other headers in the future, but that would be a feature request. When Beyla doesn't find the Traceparent field in the headers, it generates a new one, which is why you see random IDs.
  3. If you are able to copy what you have in your X-Trace-Token header to Traceparent and make sure the format matches what W3C says about the field, Beyla will pick it up. If your application is written in Go, this is automatic. For other languages, you need to enable the parsing of request headers with BEYLA_BPF_TRACK_REQUEST_HEADERS: https://grafana.com/docs/beyla/latest/configure/options/#ebpf-tracer. This option has to be enabled separately because it might add some overhead. The overhead is now much lower than in our original implementation, but the option requires Linux kernel 5.17. I see that you are running on 5.10, so this may not work well for you if you are not using Go for the applications.

Please let me know if you have more questions, happy to help!
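
As a rough illustration of point 3, the sketch below builds a W3C Traceparent value (`version-traceid-parentid-flags`) from an existing B3-style trace ID. It is only a sketch under stated assumptions: the function name `b3_to_traceparent` is hypothetical, the span ID in the usage line is a made-up value (the thread does not show one), and mapping a "sampled" decision to the `01` flag is an assumption about how you would want to carry it over. It is not part of Beyla.

```python
import re
import secrets


def b3_to_traceparent(trace_id, span_id=None, sampled=True):
    """Build a W3C traceparent header value from B3-style IDs (hypothetical helper)."""
    trace_id = trace_id.strip().lower()
    # B3 also allows 64-bit trace IDs; W3C requires 128 bits, so left-pad with zeros.
    if len(trace_id) == 16:
        trace_id = "0" * 16 + trace_id
    if not re.fullmatch(r"[0-9a-f]{32}", trace_id) or trace_id == "0" * 32:
        raise ValueError(f"invalid trace id: {trace_id!r}")

    # Assumption: if no span ID is available, generate a random 64-bit parent ID.
    if span_id is None:
        span_id = secrets.token_hex(8)
    span_id = span_id.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{16}", span_id) or span_id == "0" * 16:
        raise ValueError(f"invalid span id: {span_id!r}")

    flags = "01" if sampled else "00"  # 01 = sampled, per the W3C trace-flags field
    return f"00-{trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    # Trace ID taken from the X-B3-TraceId header above; the span ID is illustrative only.
    print(b3_to_traceparent("97ca3f513c0181da5de66c2b46daa4b9", "00f067aa0ba902b7"))
    # prints 00-97ca3f513c0181da5de66c2b46daa4b9-00f067aa0ba902b7-01
```

The resulting value would be sent as the `Traceparent` request header by whatever component currently sets X-Trace-Token, so that the same trace ID shows up on both sides.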

jpicara commented 6 months ago

Hey, wow, thanks for your detailed response. Much appreciated!

  1. The application we tested is written mainly in Java, so this is indeed an issue. However, it is not the only one, because we have as many Beyla pods as nodes in our environment, which in this test case is 6. So this is the second issue.
  2. Very clear explanation. So this is the reason why it creates a new trace ID that is totally different from what we have in our headers.
  3. I will test that env variable to see if I am able to make it work. However, based on point 1 and the number of nodes in our setup, we would still have an issue with trace propagation. Thanks again for your clarification, and looking forward to the exciting new releases!
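
Before testing the variable, a quick check of each node can tell whether it meets the two conditions mentioned in this thread: kernel 5.17 or newer for header tracking, and no kernel lockdown for context propagation. This is a minimal sketch, assuming the lockdown file lists its modes with the active one in square brackets (as the `[none]` in the debug log suggests) and treating a missing file as "none", like the log message does. The function names are hypothetical and this is not how Beyla itself performs the check.

```python
import os
import re

# Minimum kernel mentioned in the thread for BEYLA_BPF_TRACK_REQUEST_HEADERS.
MIN_KERNEL = (5, 17)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.10.0-28-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supports_header_tracking(release, minimum=MIN_KERNEL):
    """True if the kernel release is at least the minimum version."""
    return parse_kernel_release(release) >= minimum


def lockdown_mode(path="/sys/kernel/security/lockdown"):
    """Return the active lockdown mode, or 'none' if the file cannot be read."""
    try:
        with open(path) as handle:
            content = handle.read()
    except OSError:
        return "none"  # mirrors the log: file not found, assume no lockdown
    match = re.search(r"\[(\w+)\]", content)
    return match.group(1) if match else "none"


if __name__ == "__main__":
    release = os.uname().release
    print("kernel release:", release)
    print("header tracking supported:", supports_header_tracking(release))
    print("lockdown mode:", lockdown_mode())
```

On the 5.10 kernel reported in the logs above, `supports_header_tracking` returns False, which matches the caveat about non-Go applications.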