Open JunhoeKim opened 1 month ago
Many things can contribute to long-running queries. A good first step toward understanding this one is to share the query-frontend log lines that contain "search response".
https://github.com/grafana/tempo/blob/main/modules/frontend/search_handlers.go#L169
This log line contains many stats, including the number of bytes, blocks, and jobs processed.
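If it helps with collecting those lines, here is a minimal sketch for pulling them out of the query-frontend logs, assuming the logs are in logfmt (`key=value` pairs, with quoted values where needed). The helper names `parse_logfmt` and `search_response_lines` are made up for illustration. The sketch does not assume any particular stat field names; it just returns whatever pairs are on the line.

```python
import re

# One key=value pair; the value is either a double-quoted string or a bare token.
_PAIR = re.compile(r'(\w[\w.]*)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Return the key/value pairs of one logfmt line as a dict of strings."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if value.startswith('"') and value.endswith('"') and len(value) >= 2:
            # Drop the surrounding quotes and unescape embedded quotes.
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def search_response_lines(lines):
    """Yield parsed fields for every log line that mentions "search response"."""
    for line in lines:
        if "search response" in line:
            yield parse_logfmt(line)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python filter_logs.py < query-frontend.log
    for fields in search_response_lines(sys.stdin):
        print(fields)
```

Piping the query-frontend log through this prints one dict per "search response" line, which is easier to paste into the issue than the raw log.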
We manage our dashboard with predefined TraceQL queries. For example, when sending a query with the condition
{ span.http.status_code = 500 }
over a lengthy period of 2 days, we have noticed that it takes an excessively long time if no traces meet this condition. Here are the response times we've observed: for traces that do exist, we achieve response times between 500ms and 700ms. How can we quickly identify the absence of data and improve the response time in such cases?
For reference, we are continuously monitoring the resources of our Tempo components, and we have observed that executing such queries fully utilizes the CPU of all 18 queriers, each with 8 vCPUs.
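To reproduce the timing outside the dashboard, the query above can be sent straight to Tempo's search endpoint (`/api/search` with `q`, `start`, `end`, and `limit`). The following is only a sketch of building that request URL for a 2-day window. The host `tempo.example:3200` and the helper name `build_search_url` are placeholders, not values from our setup.

```python
import time
from urllib.parse import urlencode


def build_search_url(base_url, traceql, window_seconds, now=None, limit=20):
    """Build a Tempo /api/search URL covering the last `window_seconds`."""
    # start/end are Unix epoch seconds.
    end = int(now if now is not None else time.time())
    start = end - window_seconds
    params = {"q": traceql, "start": start, "end": end, "limit": limit}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host; replace with the real query-frontend address.
    url = build_search_url(
        "http://tempo.example:3200",
        "{ span.http.status_code = 500 }",
        window_seconds=2 * 24 * 3600,
    )
    print(url)
```

Timing a request to the printed URL (for example with `curl -w '%{time_total}'`) gives a number that can be compared directly against the query-frontend stats.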