Closed nerdvegas closed 3 weeks ago
Further info.
The {} query appears to be picking up cases where there are unnamed spans, e.g.:
{
  "traceID": "5c44da17ce4a17ea7ae25735637d49ad",
  "rootServiceName": "REDACTED",
  "rootTraceName": "REDACTED",
  "startTimeUnixNano": "1718761370617897594",
  "spanSet": {
    "spans": [
      {
        "spanID": "d11758357946fe59",
        "startTimeUnixNano": "1718761370617911249",
        "durationNanos": "29105"
      },
      {
        "spanID": "f5e55735c11b75cd",
        "startTimeUnixNano": "1718761370617898485",
        "durationNanos": "42229"
      },
      {
        "spanID": "7ae25735637d49ad",
        "startTimeUnixNano": "1718761370617897594",
        "durationNanos": "45445"
      }
    ],
    "matched": 3
  }
}
However, these extra unnamed spans are nowhere to be found in the otelcol debug exporter output. {} should be a superset of {name=~".+"}, right? I have limit set to the maximum (100,000).

I also can't figure out why spans are sometimes unnamed, or what the difference is between spanSet and spanSets in the response. https://grafana.com/docs/tempo/latest/api_docs/#search has no mention of either of these.
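One way to pin down the discrepancy is to diff the span IDs returned by the two queries. Below is a minimal sketch of that comparison; the response dicts are trimmed hand-made samples that follow the /api/search response shape shown above, not real query output.

```python
# Sketch: diff the span IDs returned by {} and {name=~".+"} to find
# spans that appear in one result set but not the other.

def span_ids(search_response):
    """Collect every spanID from a Tempo search response."""
    ids = set()
    for trace in search_response.get("traces", []):
        for span_set in trace.get("spanSets", []):
            for span in span_set.get("spans", []):
                ids.add(span["spanID"])
    return ids

# Trimmed sample responses (hypothetical data for illustration).
resp_all = {"traces": [{"traceID": "t1", "spanSets": [
    {"spans": [{"spanID": "a"}, {"spanID": "b"}], "matched": 2}]}]}
resp_named = {"traces": [{"traceID": "t1", "spanSets": [
    {"spans": [{"spanID": "a"}, {"spanID": "c"}], "matched": 2}]}]}

# Spans matched by {name=~".+"} but missing from {} -- the reported symptom.
extra = span_ids(resp_named) - span_ids(resp_all)
print(sorted(extra))  # → ['c']
```

If {} is really a superset, `extra` should always be empty.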
More:
After reading the following, I added start/end to make sure TraceQL is pulling from the backend in all cases:
end = (unix epoch seconds) Optional. Along with start, define a time range from which traces should be returned. Providing both start and end changes the way that Tempo searches. If the parameters aren’t provided, then Tempo searches the recent trace data stored in the ingesters. If the parameters are provided, it searches the backend as well.
However, the results are the same: {} returns a large number of unnamed spans (perhaps not unexpected), but {name=~".+"} is still not a subset (it contains spans not returned by {}), despite limit being set in both cases and the total number of traces and spans being well under 100,000.
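For reference, a search request that forces a backend search looks roughly like this. The helper below just builds the URL; the base URL and port are assumptions for a default otel-lgtm setup, and the query parameters (q, start, end, limit) follow the Tempo search API docs quoted above.

```python
# Sketch: build a Tempo /api/search request with an explicit start/end
# window (unix epoch seconds) so the query hits the backend, not only
# the ingesters. TEMPO_URL is an assumed default; adjust as needed.
from urllib.parse import urlencode

TEMPO_URL = "http://localhost:3200"  # assumed Tempo port in otel-lgtm

def search_url(q, start, end, limit=100_000):
    """Providing both start and end makes Tempo search the backend too."""
    params = {"q": q, "start": start, "end": end, "limit": limit}
    return f"{TEMPO_URL}/api/search?{urlencode(params)}"

url = search_url("{}", start=1718761200, end=1718761500)
print(url)
```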
A quick attempt internally is not reproducing this issue. Over a 5 minute period on a low volume test tenant these two queries return the exact same spans.
Can you share the spans that are returned by {name=~".+"} that are not returned by {}? I assume we are querying the exact same historical time range every time? Are the results consistent?
I also can't figure out why spans are sometimes unnamed
Tempo will not return the name unless you request it. {} | select(name) should return all spans with their names.
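A small sketch of pulling span names out of a {} | select(name) result. This assumes the selected name shows up as a "name" field on each span in the response (with a placeholder for spans where it is absent); the sample dict is hand-made for illustration.

```python
# Sketch: extract span names from a search response, assuming selected
# names appear as a "name" field on each span.

def names(search_response):
    out = []
    for trace in search_response.get("traces", []):
        for span_set in trace.get("spanSets", []):
            for span in span_set.get("spans", []):
                out.append(span.get("name", "<unnamed>"))
    return out

resp = {"traces": [{"spanSets": [{"spans": [
    {"spanID": "d11758357946fe59", "name": "GET /api"},
    {"spanID": "f5e55735c11b75cd"}]}]}]}
print(names(resp))  # → ['GET /api', '<unnamed>']
```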
spanSet and spanSets

Originally we only had spanSet, but we made an API change and currently populate both spanSet and spanSets, because older versions of Grafana still use spanSet. Ignore spanSet and only parse spanSets.
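A version-tolerant reader following that advice might look like this: prefer spanSets, and only fall back to the legacy spanSet field when spanSets is absent. The trace dict is a trimmed sample in the shape shown earlier in the thread.

```python
# Sketch: parse spanSets and treat spanSet as a legacy fallback only
# (current Tempo populates both; spanSet exists for older Grafana).

def get_span_sets(trace):
    """Prefer the modern spanSets list over the deprecated spanSet."""
    if "spanSets" in trace:
        return trace["spanSets"]
    legacy = trace.get("spanSet")
    return [legacy] if legacy else []

trace = {
    "traceID": "5c44da17ce4a17ea7ae25735637d49ad",
    "spanSet": {"spans": [{"spanID": "7ae25735637d49ad"}], "matched": 1},
    "spanSets": [{"spans": [{"spanID": "7ae25735637d49ad"}], "matched": 1}],
}
print(len(get_span_sets(trace)))  # → 1
```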
good info thanks
Are the results consistent?
Yes
Can you share the spans that are returned by {name=~".+"} that are not returned by {}?

There are a few hundred span names, but I've been looking at one specifically because it shows a large discrepancy. In the {} query I get 102 traces containing that span; in the {name=~".+"} query I get just over 500. In the latter query, rootTraceName is set to the span in question.
Would it help if I gave you a dump of the data? I have /data and /tmp/tempo bind mounted when I launch the grafana/otel-lgtm container, so I can resurrect the same Grafana session later. The size is approx 200 KB as a tar.gz. I would have to check with my employer first, though.
An info dump would be helpful. If you are recreating this with a load generator and a simple set of steps that would work too.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.
Describe the bug Using Tempo v2.4.2 via the otel-lgtm container, running the query '{}' returns significantly fewer spans than '{name=~".+"}'. The docs don't mention this; I am completely mystified. First noticed when the set of span names logged by the otel collector (after adding a 'debug' exporter) didn't match the results we were getting from the '{}' query.
To Reproduce Steps to reproduce the behavior:
Expected behavior Results should be the same.
Environment: Using Tempo v2.4.2 via otel-lgtm container.