grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

tempo search result missing span #4027

Open wolf666666 opened 2 months ago

wolf666666 commented 2 months ago

Describe the bug
When I search for traces with the TraceQL query {resource.service.name="ABC"} from Grafana, set the time range I need, and set limit and spanlimit to large numbers, the trace list result displays correctly, but one trace is missing a span: a trace that should have 6 spans only returns 5. The missing span is named "Download", and it does not appear in the trace list result. When I look up that trace by its traceID from Grafana, all 6 spans are shown.

I also tried the gRPC search function; the result only includes 5 spans and is missing the Download span as well.
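
For reference, a minimal sketch of the same search issued directly against the Tempo query frontend instead of through the Grafana datasource proxy; the host and port (3200 is Tempo's default HTTP port) and the time range are placeholders, and the parameters mirror the request shown later in this thread:

```
# Hypothetical endpoint; q is the TraceQL query, limit caps the number of traces,
# spss caps spans per spanset, start/end are Unix seconds.
curl -G "http://tempo-query-frontend:3200/api/search" \
  --data-urlencode 'q={resource.service.name="ABC"} | select(name)' \
  --data-urlencode 'limit=20' \
  --data-urlencode 'spss=30' \
  --data-urlencode 'start=1723505579' \
  --data-urlencode 'end=1723507200'
```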

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo 2.4.2 in microservices mode
  2. Search for a trace that has multiple spans

Expected behavior
The search result should include all spans.

Environment:

Additional Context

joe-elliott commented 2 months ago

Tempo only returns the spans that match the TraceQL filter, even if there are more spans in the trace. So this query only has 2 spans in the TraceQL search result set (even though there are more in the trace) because only two match the query.

[screenshot: TraceQL search result showing only the 2 matching spans of a larger trace]
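
To make that behavior concrete, here is a hedged sketch of the two read paths (host and port are placeholders): the search endpoint returns, per trace, a spanSet containing only the spans that matched the filter, while the trace-by-ID endpoint returns every span in the trace.

```
# Search: each returned trace's spanSet lists only the spans matching the TraceQL filter.
curl -G "http://tempo-query-frontend:3200/api/search" \
  --data-urlencode 'q={resource.service.name="ABC"}'

# Lookup by trace ID: returns the full trace, including spans the filter did not match.
curl "http://tempo-query-frontend:3200/api/traces/2b56d15244e3f8a5c2a5b53abc456b2c"
```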

wolf666666 commented 2 months ago

TraceQL filter

For me the TraceQL filter is {resource.service.name="ABC"}. Do you mean that one of my spans has no resource attribute service.name="ABC"? When I search the trace by traceID, the result shows that all 6 spans have the resource attribute service.name="ABC". When I search with {resource.service.name="ABC"}, one of the spans does not show up; I tested with both Grafana and the gRPC API, and neither works. So I am a little confused. Three teammates tried to find the reason in the source code, but we still have not found the root cause.

wolf666666 commented 2 months ago

Tempo only returns the spans that match the TraceQL filter, even if there are more spans in the trace. So this query only has 2 spans in the TraceQL search result set (even though there are more in the trace) because only two match the query.

[screenshot: TraceQL search result showing only the 2 matching spans of a larger trace]

@joe-elliott, thanks for your reply. In my scenario, all of the spans show the attribute service.name="ABC" when I search by traceID, so I am a little confused about that.
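
One possible check, assuming the missing span might carry service.name as a span-level attribute rather than a resource-level one: TraceQL scopes attributes, so a resource.-scoped query would skip such a span while an unscoped query would still match it. A sketch (host and port are placeholders):

```
# Resource-scoped: matches only spans whose resource has service.name="ABC".
curl -G "http://tempo-query-frontend:3200/api/search" \
  --data-urlencode 'q={resource.service.name="ABC"}'

# Unscoped (leading dot): matches service.name at either span or resource scope,
# which can reveal a span that recorded the attribute at the wrong level.
curl -G "http://tempo-query-frontend:3200/api/search" \
  --data-urlencode 'q={.service.name="ABC"}'
```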

joe-elliott commented 2 months ago

I will need more information to help you debug this issue. Perhaps screenshots of the traces/search results you are concerned about.

wolf666666 commented 2 months ago

I will need more information to help you debug this issue. Perhaps screenshots of the traces/search results you are concerned about.

Hi @joe-elliott , Here is search result captured from grafana : request: https://abc.com/api/datasources/proxy/uid/1234/api/search?q=%7Bresource.service.name%3D%22ABC%22%7D%7Cselect(name)&limit=20&spss=30&start=1723505579&end=1723507200 response: { "traces": [ { ...... }, { "traceID": "2b56d15244e3f8a5c2a5b53abc456b2c", "rootServiceName": "ABC", "rootTraceName": "ABC", "startTimeUnixNano": "1723505579249000000", "durationMs": 1540856, "spanSet": { "spans": [ { "spanID": "b213fa13600e65cd", "name": "ABC", "startTimeUnixNano": "1723505579249000000", "durationNanos": "1540856247409", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "0d515572cb8f7a38", "name": "EndTask", "startTimeUnixNano": "1723507119401122321", "durationNanos": "703912259", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "f41538dcfe9a9798", "name": "Installation", "startTimeUnixNano": "1723505582074085096", "durationNanos": "873113290701", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "b1af8faeda40e84e", "name": "Configuration", "startTimeUnixNano": "1723506455189141643", "durationNanos": "662879357896", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "f9fab371dd024bf6", "name": "Cleanup", "startTimeUnixNano": "1723507118068942286", "durationNanos": "763534413", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] } ], "matched": 5 }, "spanSets": [ { "spans": [ { "spanID": "b213fa13600e65cd", "name": "ABC", "startTimeUnixNano": "1723505579249000000", "durationNanos": "1540856247409", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "0d515572cb8f7a38", "name": "EndTask", "startTimeUnixNano": "1723507119401122321", "durationNanos": "703912259", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "f41538dcfe9a9798", "name": "Installation", "startTimeUnixNano": "1723505582074085096", "durationNanos": "873113290701", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "b1af8faeda40e84e", "name": "Configuration", "startTimeUnixNano": "1723506455189141643", "durationNanos": "662879357896", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] }, { "spanID": "f9fab371dd024bf6", "name": "Cleanup", "startTimeUnixNano": "1723507118068942286", "durationNanos": "763534413", "attributes": [ { "key": "service.name", "value": { "stringValue": "ABC" } } ] } ], "matched": 5 } ] }, { ...... } ], "metrics": { "inspectedBytes": "48603", "totalBlocks": 2, "completedJobs": 2, "totalJobs": 2, "totalBlockBytes": "115623" } }

Here is the result searched by treaceId: { "results": { "A": { "status": 200, "frames": [ { "schema": { "name": "Trace", "refId": "A", "meta": { "typeVersion": [ 0, 0 ], "preferredVisualisationType": "trace" }, "fields": [ { "name": "traceID", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "spanID", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "parentSpanID", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "operationName", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "serviceName", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "kind", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "statusCode", "type": "number", "typeInfo": { "frame": "int64" } }, { "name": "statusMessage", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "instrumentationLibraryName", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "instrumentationLibraryVersion", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "traceState", "type": "string", "typeInfo": { "frame": "string" } }, { "name": "serviceTags", "type": "other", "typeInfo": { "frame": "json.RawMessage" } }, { "name": "startTime", "type": "number", "typeInfo": { "frame": "float64" } }, { "name": "duration", "type": "number", "typeInfo": { "frame": "float64" } }, { "name": "logs", "type": "other", "typeInfo": { "frame": "json.RawMessage" } }, { "name": "references", "type": "other", "typeInfo": { "frame": "json.RawMessage" } }, { "name": "tags", "type": "other", "typeInfo": { "frame": "json.RawMessage" } } ] }, "data": { "values": [ [ "2b56d15244e3f8a5c2a5b53abc456b2c", "2b56d15244e3f8a5c2a5b53abc456b2c", "2b56d15244e3f8a5c2a5b53abc456b2c", "2b56d15244e3f8a5c2a5b53abc456b2c", "2b56d15244e3f8a5c2a5b53abc456b2c", "2b56d15244e3f8a5c2a5b53abc456b2c" ], [ "b213fa13600e65cd", "0d515572cb8f7a38", "1d1d8d55a007b4b6", "f41538dcfe9a9798", "b1af8faeda40e84e", "f9fab371dd024bf6" ], [ "0000000000000000", "b213fa13600e65cd", "b213fa13600e65cd", "b213fa13600e65cd", "b213fa13600e65cd", "b213fa13600e65cd" ], [ "ABC", "EndTask", "Download", "Installation", "Configuration", "Cleanup" ], [ "ABC", "ABC", "ABC", "ABC", "ABC", "ABC" ], [ "server", "internal", "internal", "internal", "internal", "internal" ], [ 0, 0, 0, 0, 0, 0 ], [ "", "", "", "", "", "" ], [ "ABC", "ABC", "ABC", "ABC", "ABC", "ABC" ], [ "", "", "", "", "", "" ], [ "", "", "", "", "", "" ], [ [ { "value": "ABC", "key": "service.name" } ], [ { "value": "ABC", "key": "service.name" } ], [ { "value": "ABC", "key": "service.name" } ], [ { "value": "ABC", "key": "service.name" } ], [ { "value": "ABC", "key": "service.name" } ], [ { "value": "ABC", "key": "service.name" } ] ], [ 1723505579249, 1723507119401.1223, 1723505580981.8196, 1723505582074.0852, 1723506455189.1418, 1723507118068.9424 ], [ 1540856.247409, 703.912259, 1091.652089, 873113.290701, 662879.357896, 763.534413 ], [ null, null, null, null, null, null ], [ null, null, null, null, null, null ], [ [ { "value": "uslz1123", "key": "node" }, { "value": "CZZ12_103", "key": "version" }, { "value": "238781037", "key": "tJId" }, { "value": "192.168.10.124", "key": "nodeIp" }, { "value": "SUCCESS", "key": "executionResult" }, { "value": "30567164", "key": "tAId" }, { "value": "ABC-1.213", "key": "version" }, { "value": "LIX3143", "key": "nodeType" }, { "value": "undefined", "key": "env" }, { "value": "u-worker2", "key": "hostalias" }, { "value": "unknown", "key": "datacenter" }, { "value": "undefined", "key": 
"service" } ], [ { "value": "undefined", "key": "env" }, { "value": "u-worker2", "key": "hostalias" }, { "value": "unknown", "key": "datacenter" }, { "value": "undefined", "key": "service" } ], [ { "value": "uslz1123", "key": "node" }, { "value": "CZZ12_103", "key": "version" }, { "value": "238781037", "key": "tJId" }, { "value": "192.168.10.124", "key": "nodeIp" }, { "value": "30567164", "key": "tAId" }, { "value": "LIX3143", "key": "nodeType" }, { "value": "undefined", "key": "env" }, { "value": "u-worker2", "key": "hostalias" }, { "value": "unknown", "key": "datacenter" }, { "value": "undefined", "key": "service" } ], [ { "value": "uslz1123", "key": "node" }, { "value": "CZZ12_103", "key": "version" }, { "value": "238781037", "key": "tJId" }, { "value": "192.168.10.124", "key": "nodeIp" }, { "value": "30567164", "key": "tAId" }, { "value": "LIX3143", "key": "nodeType" }, { "value": "undefined", "key": "env" }, { "value": "u-worker2", "key": "hostalias" }, { "value": "unknown", "key": "datacenter" }, { "value": "undefined", "key": "service" } ], [ { "value": "uslz1123", "key": "node" }, { "value": "CZZ12_103", "key": "version" }, { "value": "238781037", "key": "tJId" }, { "value": "192.168.10.124", "key": "nodeIp" }, { "value": "30567164", "key": "tAId" }, { "value": "LIX3143", "key": "nodeType" }, { "value": "undefined", "key": "env" }, { "value": "u-worker2", "key": "hostalias" }, { "value": "unknown", "key": "datacenter" }, { "value": "undefined", "key": "service" } ], [ { "value": "uslz1123", "key": "node" }, { "value": "CZZ12_103", "key": "version" }, { "value": "238781037", "key": "tJId" }, { "value": "192.168.10.124", "key": "nodeIp" }, { "value": "30567164", "key": "tAId" }, { "value": "LIX3143", "key": "nodeType" }, { "value": "undefined", "key": "env" }, { "value": "u-worker2", "key": "hostalias" }, { "value": "unknown", "key": "datacenter" }, { "value": "undefined", "key": "service" } ] ] ] } } ] } } }

You can see that there are 6 spans when searching by traceID, but only 5 spans when searching with TraceQL.

joe-elliott commented 2 months ago

Sorry, but this json is not particularly helpful. Can you share a screenshot like the one I did above? It will be a good starting point for me to understand what is happening.

wolf666666 commented 2 months ago

Sorry, but this json is not particularly helpful. Can you share a screenshot like the one I did above? It will be a good starting point for me to understand what is happening.

I'd like to, but my company computer is not allowed to share screenshots. To let you see the JSON results more clearly, I have uploaded them as files:

search.json

findByTraceID.json

The search.json file has 5 spans: "ABC", "EndTask", "Installation", "Configuration", "Cleanup". The findByTraceID.json file has 6 spans: "ABC", "EndTask", "Download", "Installation", "Configuration", "Cleanup".

joe-elliott commented 2 months ago

Can you run the tempo-cli query blocks command and share the results?

This will dump the trace exactly as it is in the blocks and may provide a clue as to why your search is returning fewer spans. The log lines in the query frontend that are recorded when the query is made would also be helpful. They will show me exactly what query is executed.
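
For reference, a hedged sketch of that command; the tenant ID, backend type, and flags below are assumptions that depend on the deployment (object-store settings can also be supplied via a Tempo config file):

```
# Dumps the trace exactly as stored in the backend blocks. "single-tenant" is Tempo's
# default tenant ID; replace the backend flags with your own storage settings.
tempo-cli query blocks \
  --backend=s3 --bucket=<your-bucket> --s3-endpoint=<s3-endpoint> \
  2b56d15244e3f8a5c2a5b53abc456b2c single-tenant
```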

github-actions[bot] commented 3 days ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.