Closed marianobrc closed 11 months ago
Thanks for the details @marianobrc.
if there is a minimal amount of traces required to start getting metrics
No, there shouldn't be, just a couple are enough to start showing some metrics in the Monitor tab.
What span kind is your Python application emitting? If it's anything but the server kind, then that could explain why you're not able to view metrics from your application. See also: https://github.com/jaegertracing/jaeger/pull/3898.
There's an otlp_exporter_example.py script in the demo environment that should produce metrics in the Monitor tab for you (you might need to wait a few seconds after sending a trace). Maybe you can use it as an example to compare against when instrumenting your Python application.
Thanks for your prompt response @albertteoh.
My application uses spans of kind producer and consumer, not server, indeed.
I'm tracing distributed transactions in an event-driven architecture that uses Kafka as the message broker for inter-service communication. The goal is to get metrics like the total latency (duration) of the transaction trace. I can see the total duration of each individual trace related to one of these transactions in the trace details, and I was hoping to see some latency metric derived from that trace duration in the Monitor tab.
I tried changing the span kind to server and I see metrics now, but it doesn't feel like the right solution. It doesn't make sense to force a span within a producer or consumer to be of server kind just to get the metrics. Also, I see that the latency metric is calculated at the span level instead of at the trace level, so at most I can see the latency of some span within one service. I guess that makes sense, as this is for "Service Monitoring", so my use case isn't supported.
From what I gather, I think there are two problems that you're currently facing:
I agree that forcing the server span kind into your producer/consumer applications isn't the right approach.
In this case, I think it's just a limitation of the Jaeger UI at the moment, which currently hardcodes the kind to server (for expediency at the time of development). The Jaeger Query API itself supports querying for any span kind. I'm not sure, however, if it's a good idea to aggregate across all span kinds as that would "double" count the metrics if, say, a service is both a server and a producer.
Would https://github.com/jaegertracing/jaeger/pull/3898#issuecomment-1305157851 address your need to view metrics from producer/consumer spans (i.e. a dropdown in Jaeger UI's Monitor tab to select the span kind, and perhaps cache that selection per service)?
SPM was indeed designed to provide service-level monitoring, and so is completely oblivious of the concept of traces, which has the added benefit of simplifying the design.
If I understand your requirement correctly, you'd like to view the latency of handling a single transaction in an event-driven architecture.
I don't have any experience in this space and, in SPM, I would usually suggest going to the "root" service to find this information. However, my naïve understanding is that this may not be possible for event-driven architectures, as the producer would simply return after sending the message payload to Kafka, and the request is asynchronously consumed, so the "root" span doesn't encompass the entire handling of the transaction. What you'd essentially want is a way to aggregate the latencies across all spans in a trace.
If the above is correct, I agree that SPM would not support the use case of measuring the latency of a single transaction. However, I'm definitely open to ideas/suggestions!
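To make the idea of aggregating latencies across all spans in a trace concrete, here is a minimal sketch in Python. It is not part of SPM or of any Jaeger API; it only assumes that each span is available as a mapping with `startTime` and `duration` fields in microseconds, as in the JSON that Jaeger's trace view works with. The function name `trace_duration_us` and the sample values are hypothetical.

```python
from typing import Iterable, Mapping


def trace_duration_us(spans: Iterable[Mapping[str, int]]) -> int:
    """Return the wall-clock duration of a trace in microseconds.

    The duration is measured from the earliest span start to the latest
    span end, so it still covers the whole transaction when the "root"
    producer span finishes long before the consumer spans begin.
    """
    spans = list(spans)
    if not spans:
        raise ValueError("a trace needs at least one span")
    earliest_start = min(s["startTime"] for s in spans)
    latest_end = max(s["startTime"] + s["duration"] for s in spans)
    return latest_end - earliest_start


# Hypothetical trace: a short producer span followed later by a consumer span.
example_spans = [
    {"startTime": 1_000, "duration": 200},    # producer: sends message to Kafka
    {"startTime": 5_000, "duration": 1_500},  # consumer: handles it asynchronously
]
total_us = trace_duration_us(example_spans)
```

Taking the earliest start and the latest end, rather than summing span durations, avoids double counting overlapping spans and includes the time a message spends waiting in the broker.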
What happened?
I'm trying the experimental SPM features, using the all-in-one Docker image and docker-compose with Prometheus as described in the docs. I want to see the latency and error rate metrics for my services, but I only see empty metrics.
Steps to reproduce
I'm sending traces both with the simulator and from my own services instrumented with OpenTelemetry, and both are collected and shown properly. However, in the Monitor tab I can only see metrics for the data generated by the simulator, not for my services. I see my services in the dropdown, but when I select one it says that no data is available.
I checked in Prometheus and the metrics are there, and I can also visualize them with Grafana, but I still can't see them in the Jaeger Monitor tab. When I call the HTTP API, or inspect the requests made by the frontend, I can see that the metrics API returns an empty list for my services.
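For anyone checking the HTTP API by hand, the following sketch builds a metrics query URL so the same request can be repeated with different span kinds. It rests on assumptions that should be verified against the SPM documentation for your Jaeger version: that the endpoint has the form `/api/metrics/<metric>`, that it accepts `service`, `quantile`, and a repeatable `spanKind` query parameter, and that the query service listens on `http://localhost:16686`. The helper name `build_metrics_url` and the service name are hypothetical.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_metrics_url(
    base_url: str,
    metric: str,
    service: str,
    span_kinds: Sequence[str] = ("server",),
    quantile: Optional[float] = None,
) -> str:
    """Build a metrics query URL (endpoint and parameter names are assumed)."""
    params = [("service", service)]
    # One spanKind entry per requested kind, e.g. producer and consumer.
    params += [("spanKind", kind) for kind in span_kinds]
    if quantile is not None:
        params.append(("quantile", quantile))
    return f"{base_url}/api/metrics/{metric}?{urlencode(params)}"


# Hypothetical query for p95 latency of producer and consumer spans.
url = build_metrics_url(
    "http://localhost:16686",
    "latencies",
    "my-service",
    span_kinds=["producer", "consumer"],
    quantile=0.95,
)
```

If a request with the default `server` kind returns an empty list while the same request with the kinds your services actually emit returns data, that points to the span kind filter rather than to missing metrics in Prometheus.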
Expected behavior
I'm wondering if there is a minimal amount of traces required to start getting metrics or I'm missing something?
Relevant log output
No response
Screenshot
No response
Additional context
No response
Jaeger backend version
v1.39.0
SDK
OpenTelemetry Python SDK using OTLPSpanExporter
Pipeline
OTLPSpanExporter -> OTel Collector -> Prometheus -> Jaeger All-in-one
Storage backend
Prometheus
Operating system
Ubuntu
Deployment model
docker-compose
Deployment configs