Closed grcevski closed 7 months ago
After some digging, this particular issue is caused by how Tempo parses client and server spans to generate the graphs. In the absence of distributed tracing, the server and client spans are not going to match, so we have to rely on the PeerService field in the traces to connect the graph.
The problem is that the server spans never get a ClientService field set in the Tempo graphs, so Tempo assigns "user" as the client, making the graph look like the above.
The only way to work around the problem is to create a fake span that is not a server span, which makes Tempo skip the "user" assignment logic.
This comes down to the lack of context propagation in some cases, so the focus should be on increasing our ability to propagate context.
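To make the pairing logic concrete, here is a minimal sketch of the two unmatched spans as Tempo's metrics-generator sees them when no trace context is propagated between the services (the service names are made up for illustration):

# Illustrative only: unmatched client and server spans, made-up service names.
client_span:                  # emitted by the calling service
  kind: client
  service.name: frontend
  peer.service: backend       # lets Tempo draw the frontend -> backend edge
server_span:                  # emitted by the called service
  kind: server
  service.name: backend       # no ClientService to pair with, so Tempo falls
                              # back to a synthetic "user" -> backend edge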
I'm testing Beyla on Kubernetes and seeing the same issue. Is there a workaround for this?
Beyla docker image: 1.5.2
OpenTelemetry Collector version: 0.99.0
Grafana Tempo Distributed: 2.4.1
Beyla config:
beyla-config.yml: |
  log_level: INFO
  print_traces: false
  attributes:
    kubernetes:
      enable: true
  routes:
    unmatched: heuristic
  prometheus_export:
    port: 8889
    path: /metrics
  internal_metrics:
    port: 8889
  ebpf:
    bpf_debug: true
  grafana:
    otlp:
      submit: ["metrics", "traces"]
  otel_traces_export:
    sampler:
      name: "parentbased_always_on"
  discovery:
    services:
      - exe_path: (apache2)|(node)
At the moment we can propagate context well between services for Go, and between services on the same node when a single Beyla instance is monitoring all of the services on that node. We'll be improving this over the next couple of months.
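In practice that means the Beyla instance on each node has to discover both sides of every call it should stitch together. A sketch based on the discovery section from the config above (the extra executable pattern is an illustrative assumption, not a recommendation):

discovery:
  services:
    # The single Beyla instance on the node must match every executable
    # involved in the call, otherwise its client and server spans cannot
    # be correlated into one trace.
    - exe_path: (apache2)|(node)|(java)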
@grcevski any updates on this?
I'm running Beyla as a DaemonSet, but I'm not able to see the downstream services at all, only the upstream services.
@brunocascio if you are using the OpenTelemetry Collector, one workaround is to set peer.service to server.address. It's hackish and not perfect, but it works.
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    transform:
      trace_statements:
        - context: span
          statements:
            - set(attributes["peer.service"], attributes["server.address"]) where attributes["peer.service"] == nil
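For the statement above to take effect, the transform processor also has to be wired into the traces pipeline. A sketch of that wiring, continuing under the same config key (the exporter name and endpoint are assumptions, adjust them to your setup):

  # continues under the same "config:" key as above
  exporters:
    otlp/tempo:                # assumed exporter name and endpoint
      endpoint: tempo-distributor.tempo.svc.cluster.local:4317
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [transform, batch]
        exporters: [otlp/tempo]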
Thanks @sergeij! I'm using OTel, but through Alloy, so I'll try this and come back soon!
Thanks!
@grcevski any updates on this?
I'm running Beyla as a DaemonSet, but I'm not able to see the downstream services at all, only the upstream services.
We are actively working on making this happen at the moment. We have some changes in main, but they are not enabled yet. More will land in the next couple of weeks.
The NGINX example with 3 downstream services shows the wrong Tempo service graph. Instead of NGINX sitting in between the user and the 3 downstream services, it is rendered as their sibling.