grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Enhancing Response Speed for Queries with No Data #4258

Open JunhoeKim opened 1 month ago

JunhoeKim commented 1 month ago

Our dashboards run predetermined TraceQL queries. For example, when we send a query with the condition { span.http.status_code = 500 } over a long period such as 2 days, it takes an excessively long time if no traces meet this condition. Here are the response times we've observed:

30-minute period: 6 seconds
2-day period: 26 seconds

For traces that do exist, we achieve response times between 500ms and 700ms. How can we quickly identify the absence of data and improve the response time in such cases?

For reference, we are continuously monitoring the resources of our Tempo components, and we have observed that executing such queries fully utilizes the CPU of all 18 queriers, each with 8 vCPUs.

joe-elliott commented 2 weeks ago

Many things can contribute to long-running queries. As a first step toward understanding this one, it would help to share the query-frontend log lines that contain "search response".

https://github.com/grafana/tempo/blob/main/modules/frontend/search_handlers.go#L169

That log line contains a lot of stats, including the number of bytes, blocks, and jobs processed.
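For anyone who wants to pull those lines out of a saved log file, here is a minimal sketch. It only assumes that the relevant lines contain the literal text "search response"; it does not assume any particular log format or field names. The function name and the file path argument are hypothetical, and how you export the query-frontend logs depends on your deployment.

```python
import sys
from typing import Iterable, Iterator

# The only assumption: relevant lines contain this literal text.
NEEDLE = "search response"


def search_response_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that mention "search response", without trailing newlines."""
    for line in lines:
        if NEEDLE in line:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_search_response.py query-frontend.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for matched in search_response_lines(log_file):
            print(matched)
```

Running it against an exported query-frontend log prints only the matching lines, which can then be pasted into the issue.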