KhafRuslan opened 1 week ago
@KhafRuslan thanks for the report. Do other queries succeed or is the issue related to parametric queries? Does the same issue happen for any timerange size, or specifically for large ones?
This is independent of the time range. The problem occurs only with the trace data type; queries against other data types work.
@KhafRuslan I'm not sure if querying without any tags is supported. Please try specifying the service name or span name or any other tag.
I've also tried specifying a tag, but it doesn't fix the problem. After a while the search started working inside the qryn view, but not in Grafana. The data source is added correctly. I'm also confused by the way the data is displayed within ClickHouse; I added a screenshot to the first post. The first three tables contain data in a strange encoding. Is this how it should be?
@KhafRuslan which version of Grafana are you using? We're solid with 10.x (stable), while 11.x is still considered experimental.
@KhafRuslan regarding the tag request from Grafana that you mentioned in the previous response: can you show the logs of the qryn instance?
I'm using version 10
The qryn logs:
{"level":50,"time":1720189867156,"pid":18,"hostname":"8ab58f9931b3","name":"qryn","msg":"rawRequest error: WITH index_search AS (select `trace_id` as `trace_id`,span_id as `span_id`,any(duration) as `duration`,any(timestamp_ns) as `timestamp_ns` from `qryn`.`tempo_traces_attrs_gin` as `traces_idx` where (((`date` >= '2024-07-05') and (`date` <= '2024-07-05') and (`traces_idx`.`timestamp_ns` >= 1720168266000000000) and (`traces_idx`.`timestamp_ns` < 1720189866000000000))) and (((key = 'service.name') and (val = 'serviceName')) or ((key = 'service.name') and (val = 'serviceName'))) group by `trace_id`,`span_id` having (bitAnd(groupBitOr(bitShiftLeft(toUInt64((key = 'service.name') and (val = 'serviceName')), 0)+bitShiftLeft(toUInt64((key = 'service.name') and (val = 'serviceName')), 1)) as bsCond, 1) != 0) order by `timestamp_ns` desc), index_grouped AS (select `trace_id` as `trace_id`,groupArray(100)(span_id) as `span_id` from index_search group by `trace_id` order by max(index_search.timestamp_ns) desc limit 20), trace_ids AS (select `trace_id` from index_grouped), trace_span_ids AS (select `trace_id`,`span_id` from index_grouped array join `span_id` ) select lower(hex(traces.trace_id)) as `trace_id`,arrayMap(x -> lower(hex(x)), groupArrayIf(traces.span_id, (traces.trace_id, traces.span_id) IN trace_span_ids)) as `span_id`,groupArrayIf(traces.duration_ns, (traces.trace_id, traces.span_id) IN trace_span_ids) as `duration`,groupArrayIf(traces.timestamp_ns, (traces.trace_id, traces.span_id) IN trace_span_ids) as `timestamp_ns`,min(traces.timestamp_ns) as `start_time_unix_nano`,toFloat64(max(traces.timestamp_ns + traces.duration_ns) - min(traces.timestamp_ns)) / 1000000 as `duration_ms`,argMin(traces.name, traces.timestamp_ns) as `root_service_name`,groupArrayIf(base64Encode(traces.payload), (traces.trace_id, traces.span_id) IN trace_span_ids) as `payload`,groupArrayIf(traces.payload_type, (traces.trace_id, traces.span_id) IN trace_span_ids) as `payload_type` from 
`qryn`.`tempo_traces_dist` as `traces` where ((traces.trace_id in (trace_ids))) group by `traces`.`trace_id` order by `start_time_unix_nano` desc limit 20 FORMAT JSON"}
{"level":50,"time":1720189867156,"pid":18,"hostname":"8ab58f9931b3","name":"qryn","msg":"Code: 8. DB::Exception: Cannot find column `groupArrayIf(__table26.span_id, in(tuple(__table26.trace_id, __table26.span_id), __set_4824305632648500369_4093749546618302003))` in source stream, there are only columns: [__table26.trace_id, groupArrayIf(__table26.span_id, in(tuple(__table26.trace_id, __table26.span_id), __set_17215724413377854956_89883972281876497)), groupArrayIf(__table26.duration_ns, in(tuple(__table26.trace_id, __table26.span_id), __set_17215724413377854956_89883972281876497)), groupArrayIf(__table26.timestamp_ns, in(tuple(__table26.trace_id, __table26.span_id), __set_17215724413377854956_89883972281876497)), min(__table26.timestamp_ns), max(plus(__table26.timestamp_ns, __table26.duration_ns)), argMin(__table26.name, __table26.timestamp_ns), groupArrayIf(base64Encode(__table26.payload), in(tuple(__table26.trace_id, __table26.span_id), __set_17215724413377854956_89883972281876497)), groupArrayIf(__table26.payload_type, in(tuple(__table26.trace_id, __table26.span_id), __set_17215724413377854956_89883972281876497))]. (THERE_IS_NO_COLUMN) (version 24.3.2.23 (official build))\n"}
{"level":50,"time":1720189867156,"pid":18,"hostname":"8ab58f9931b3","name":"qryn","reqId":"req-a5ff","err":"Error: Request failed with status code 400\nResponse: [400] undefined\nAxiosError: Request failed with status code 400\n at settle (/app/node_modules/axios/dist/node/axios.cjs:1966:12)\n at IncomingMessage.handleStreamEnd (/app/node_modules/axios/dist/node/axios.cjs:3065:11)\n at IncomingMessage.emit (node:events:531:35)\n at endReadableNT (node:internal/streams/readable:1696:12)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)\n at Axios.request (/app/node_modules/axios/dist/node/axios.cjs:3876:41)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async rawRequest (/app/lib/db/clickhouse.js:1386:12)\n at async processSmallResult (/app/traceql/index.js:124:20)\n at async search (/app/traceql/index.js:41:11)\n at async searchV2 (/app/lib/handlers/tempo_search.js:74:20)\n at async Object.handler (/app/lib/handlers/tempo_search.js:25:12)","msg":"Request failed with status code 400"}
{"level":50,"time":1720189867803,"pid":18,"hostname":"8ab58f9931b3","name":"qryn","msg":"rawRequest error: WITH index_search AS (select `trace_id` as `trace_id`,span_id as `span_id`,any(duration) as `duration`,any(timestamp_ns) as `timestamp_ns` from `qryn`.`tempo_traces_attrs_gin` as `traces_idx` where (((`date` >= '2024-07-05') and (`date` <= '2024-07-05') and (`traces_idx`.`timestamp_ns` >= 1720168267000000000) and (`traces_idx`.`timestamp_ns` < 1720189867000000000))) and (((key = 'service.name') and (val = 'serviceName')) or ((key = 'service.name') and (val = 'serviceName'))) group by `trace_id`,`span_id` having (bitAnd(groupBitOr(bitShiftLeft(toUInt64((key = 'service.name') and (val = 'serviceName')), 0)+bitShiftLeft(toUInt64((key = 'service.name') and (val = 'serviceName')), 1)) as bsCond, 1) != 0) order by `timestamp_ns` desc), index_grouped AS (select `trace_id` as `trace_id`,groupArray(100)(span_id) as `span_id` from index_search group by `trace_id` order by max(index_search.timestamp_ns) desc limit 20), trace_ids AS (select `trace_id` from index_grouped), trace_span_ids AS (select `trace_id`,`span_id` from index_grouped array join `span_id` ) select lower(hex(traces.trace_id)) as `trace_id`,arrayMap(x -> lower(hex(x)), groupArrayIf(traces.span_id, (traces.trace_id, traces.span_id) IN trace_span_ids)) as `span_id`,groupArrayIf(traces.duration_ns, (traces.trace_id, traces.span_id) IN trace_span_ids) as `duration`,groupArrayIf(traces.timestamp_ns, (traces.trace_id, traces.span_id) IN trace_span_ids) as `timestamp_ns`,min(traces.timestamp_ns) as `start_time_unix_nano`,toFloat64(max(traces.timestamp_ns + traces.duration_ns) - min(traces.timestamp_ns)) / 1000000 as `duration_ms`,argMin(traces.name, traces.timestamp_ns) as `root_service_name`,groupArrayIf(base64Encode(traces.payload), (traces.trace_id, traces.span_id) IN trace_span_ids) as `payload`,groupArrayIf(traces.payload_type, (traces.trace_id, traces.span_id) IN trace_span_ids) as `payload_type` from 
`qryn`.`tempo_traces_dist` as `traces` where ((traces.trace_id in (trace_ids))) group by `traces`.`trace_id` order by `start_time_unix_nano` desc limit 20 FORMAT JSON"}
{"level":50,"time":1720189867803,"pid":18,"hostname":"8ab58f9931b3","name":"qryn","msg":"Code: 8. DB::Exception: Cannot find column `groupArrayIf(__table26.span_id, in(tuple(__table26.trace_id, __table26.span_id), __set_12986401009218267427_7863460299579552761))` in source stream, there are only columns: [__table26.trace_id, groupArrayIf(__table26.span_id, in(tuple(__table26.trace_id, __table26.span_id), __set_1020460723764415165_9711705387046830144)), groupArrayIf(__table26.duration_ns, in(tuple(__table26.trace_id, __table26.span_id), __set_1020460723764415165_9711705387046830144)), groupArrayIf(__table26.timestamp_ns, in(tuple(__table26.trace_id, __table26.span_id), __set_1020460723764415165_9711705387046830144)), min(__table26.timestamp_ns), max(plus(__table26.timestamp_ns, __table26.duration_ns)), argMin(__table26.name, __table26.timestamp_ns), groupArrayIf(base64Encode(__table26.payload), in(tuple(__table26.trace_id, __table26.span_id), __set_1020460723764415165_9711705387046830144)), groupArrayIf(__table26.payload_type, in(tuple(__table26.trace_id, __table26.span_id), __set_1020460723764415165_9711705387046830144))]. (THERE_IS_NO_COLUMN) (version 24.3.2.23 (official build))\n"}
{"level":50,"time":1720189867803,"pid":18,"hostname":"8ab58f9931b3","name":"qryn","reqId":"req-a5g5","err":"Error: Request failed with status code 400\nResponse: [400] undefined\nAxiosError: Request failed with status code 400\n at settle (/app/node_modules/axios/dist/node/axios.cjs:1966:12)\n at IncomingMessage.handleStreamEnd (/app/node_modules/axios/dist/node/axios.cjs:3065:11)\n at IncomingMessage.emit (node:events:531:35)\n at endReadableNT (node:internal/streams/readable:1696:12)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)\n at Axios.request (/app/node_modules/axios/dist/node/axios.cjs:3876:41)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async rawRequest (/app/lib/db/clickhouse.js:1386:12)\n at async processSmallResult (/app/traceql/index.js:124:20)\n at async search (/app/traceql/index.js:41:11)\n at async searchV2 (/app/lib/handlers/tempo_search.js:74:20)\n at async Object.handler (/app/lib/handlers/tempo_search.js:25:12)","msg":"Request failed with status code 400"}
@KhafRuslan and which version of ClickHouse are you using?
ClickHouse server version 24.3.2.23 (official build).
@KhafRuslan Apparently v24.3.2.23 with a cluster is not supported, and the compatibility tests have not been run for a while.
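One thing worth testing (an assumption on my side, not verified against this setup): ClickHouse 24.x enables the new query analyzer by default, and THERE_IS_NO_COLUMN errors on distributed queries are a known symptom of it. You could try disabling it for a session to see if the failing query succeeds:

```shell
# Settings fragment (assumption: the new analyzer is the culprit).
# Any ClickHouse setting can be passed as a clickhouse-client option;
# replace the trivial query with the failing one from the qryn log.
clickhouse-client --allow_experimental_analyzer=0 --query "SELECT 1"
```

If that helps, the setting can also be set in the server's default profile instead of per session.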
I'll try installing a different version and report back. Can you tell me what the encoding of the first columns is related to? Is it normal?
Rolled back to an older version (23.12.6). Errors now occur on any request (both logs and traces); sometimes it returns data, sometimes not. An example of the error:
{
"level": 50,
"time": 1720426091025,
"pid": 19,
"hostname": "bb2a43ba4fe0",
"name": "qryn",
"reqId": "req-14wa",
"err": "Error: Request failed with status code 500\nResponse: [500] {\"exception\": \"Code: 10. DB::Exception: Not found column intDiv(__table1.timestamp_ns, 1000000_UInt32) in block. There are only columns: intDiv(timestamp_ns, 1000000), arrayFilter(lambda(tuple(x), notIn(tupleElement(x, 1), '__error__')), JSONExtractKeysAndValues(labels, 'String')): While executing Remote. (NOT_FOUND_COLUMN_IN_BLOCK) (version 24.3.2.23 (official build))\"}\n{\"exception\": \"Code: 10. DB::Exception: Not found column intDiv(__table1.timestamp_ns, 1000000_UInt32) in block. There are only columns: intDiv(timestamp_ns, 1000000), arrayFilter(lambda(tuple(x), notIn(tupleElement(x, 1), '__error__')), JSONExtractKeysAndValues(labels, 'String')): While executing Remote. (NOT_FOUND_COLUMN_IN_BLOCK) (version 24.3.2.23 (official build))\"}\n\nAxiosError: Request failed with status code 500\n at settle (/app/node_modules/axios/dist/node/axios.cjs:1966:12)\n at RedirectableRequest.handleResponse (/app/node_modules/axios/dist/node/axios.cjs:3013:9)\n at RedirectableRequest.emit (node:events:519:28)\n at RedirectableRequest._processResponse (/app/node_modules/follow-redirects/index.js:398:10)\n at RedirectableRequest._onNativeResponse (/app/node_modules/follow-redirects/index.js:91:12)\n at Object.onceWrapper (node:events:634:26)\n at ClientRequest.emit (node:events:519:28)\n at HTTPParser.parserOnIncomingClient (node:_http_client:698:27)\n at HTTPParser.parserOnHeadersComplete (node:_http_common:119:17)\n at Socket.socketOnData (node:_http_client:540:22)\n at Axios.request (/app/node_modules/axios/dist/node/axios.cjs:3876:41)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async queryFingerprintsScan (/app/lib/db/clickhouse.js:385:19)\n at async Object.handler (/app/lib/handlers/query_range.js:41:22)",
"msg": "Request failed with status code 500"
}
The query shown in the database logs:
WITH sel_a AS (select samples.string as string,samples.fingerprint as fingerprint,samples.timestamp_ns as timestamp_ns from qryn.samples_v3_dist as samples where ((samples.timestamp_ns between 1720339669452000000 and 1720426069452000000) and (samples.type in (0,0))) and (samples.fingerprint in (select fingerprint from (select DISTINCT fingerprint from qryn.time_series where ((JSONHas(labels, 'context') = 1) and (JSONExtractString(labels, 'context') = '[3ba2a383-494d-3b89-b015-f3a8bb9a25df]'))))) order by timestamp_ns desc limit 10) select JSONExtractKeysAndValues(time_series.labels, 'String') as labels,sel_a.* from sel_a left any join qryn.time_series_dist AS time_series on sel_a.fingerprint = time_series.fingerprint order by labels desc,timestamp_ns desc FORMAT JSONEachRow
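One way to rule qryn out is to replay this logged query directly with clickhouse-client, so Grafana and qryn are out of the loop. A diagnostic sketch (the host and the file path below are placeholders):

```shell
# Hypothetical repro: save the query from the qryn log to a file, then run it
# directly against the cluster. If it fails the same way, the problem is in
# ClickHouse itself rather than in qryn's request handling.
clickhouse-client --host clickhouse-host --database qryn \
  --queries-file /tmp/failing_query.sql
```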
After deleting the database, I updated two components, metrico-otel-collector and qryn, to versions 0.0.3 and 3.2.25 respectively. qryn created the databases from scratch (it has full access to create databases). It created the following tables:
I generated the traces with the following command: /root/go/bin/telemetrygen traces --otlp-endpoint qryn-host:4318 --otlp-http --traces 1 --otlp-insecure true --size 1 --workers 3
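The ingested traces can also be checked without Grafana by calling qryn's Tempo-compatible HTTP API directly. A sketch, assuming qryn's default HTTP port 3100 (the host and the trace id below are placeholders):

```shell
# Hypothetical check: search recent traces via the Tempo-compatible endpoint.
curl -s "http://qryn-host:3100/api/search?limit=5"
# Fetch a single trace by id (substitute an id returned by the search above).
curl -s "http://qryn-host:3100/api/traces/<trace_id>"
```

If these return traces while the Grafana panel does not, the issue is on the Grafana side; if they fail with the same 400, it reproduces the qryn/ClickHouse error above.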
The part of the code related to trace reception:
Next, I queried the generated traces via the Grafana Tempo data source and also checked the qryn view, and got the following results:
I see the following error in the qryn logs:
And this is what usually appears in the database logs:
The data in the database is stored in the following form:
Can you tell me what the error is related to?