metrico / qryn

Polyglot Observability Stack. Lightweight & Drop-in compatible with Loki, Prometheus, Tempo, Pyroscope, Opentelemetry, Datadog & more! WASM powered ⭐️ Star to Support
https://qryn.dev
GNU Affero General Public License v3.0

panic: vector cannot contain metrics with the same labelset #500

Closed: shimaore closed this issue 2 months ago

shimaore commented 2 months ago

This WASM error is present in (at least) 3.2.17 through 3.2.19 (I have not tested versions above 3.2.19).

32ae763b4183 WITH idx AS (select `fingerprint` from `prometheus`.`time_series_gin` as `time_series_gin` where ((((`key` = '__name__') and (`val` = 'node_filesystem_size_bytes'))) and (`date` >= toDate(fromUnixTimestamp(1714362720))) and (`date` <= toDate(fromUnixTimestamp(1714384320))) and (`type` in (0,0))) group by `fingerprint` having (groupBitOr(bitShiftLeft(((`key` = '__name__') and (`val` = 'node_filesystem_size_bytes'))::UInt64, 0)) = 1)), raw AS (select argMaxMerge(last) as `value`,`fingerprint`,intDiv(timestamp_ns, 15000000000) * 15000 as `timestamp_ms` from `metrics_15s` as `metrics_15s` where ((`fingerprint` in (idx)) and (`timestamp_ns` >= 1714362720000000000) and (`timestamp_ns` <= 1714384320000000000) and (`type` in (0,0))) group by `fingerprint`,`timestamp_ms` order by `fingerprint`,`timestamp_ms`), timeSeries AS (select `fingerprint`,arraySort(JSONExtractKeysAndValues(labels, 'String')) as `labels` from `prometheus`.`time_series` where ((`fingerprint` in (idx)) and (`type` in (0,0)))) select any(labels) as `stream`,arraySort(groupArray((raw.timestamp_ms, raw.value))) as `values` from raw as `raw` any left join timeSeries as time_series on `time_series`.`fingerprint` = raw.fingerprint group by `raw`.`fingerprint` order by `raw`.`fingerprint`
32ae763b4183 panic: vector cannot contain metrics with the same labelset
32ae763b4183 Empty <[Object: null prototype] {}> {
32ae763b4183   end: '1714384320',
32ae763b4183   query: 'node_filesystem_free_bytes / node_filesystem_size_bytes',
32ae763b4183   start: '1714362720',
32ae763b4183   step: '20'
32ae763b4183 }
32ae763b4183 RuntimeError: unreachable
32ae763b4183     at runtime._panic (wasm://wasm/00d893c6:wasm-function[81]:0x56c7)
32ae763b4183     at (*github.com/prometheus/prometheus/promql.evaluator).error (wasm://wasm/00d893c6:wasm-function[1175]:0x10e751)
32ae763b4183     at (*github.com/prometheus/prometheus/promql.evaluator).errorf (wasm://wasm/00d893c6:wasm-function[1176]:0x10e8a1)
32ae763b4183     at (*github.com/prometheus/prometheus/promql.evaluator).rangeEval (wasm://wasm/00d893c6:wasm-function[1155]:0x107028)
32ae763b4183     at (*github.com/prometheus/prometheus/promql.evaluator).eval (wasm://wasm/00d893c6:wasm-function[1149]:0x1004e7)
32ae763b4183     at (*github.com/prometheus/prometheus/promql.evaluator).Eval (wasm://wasm/00d893c6:wasm-function[1148]:0xfe603)
32ae763b4183     at main.pql$2 (wasm://wasm/00d893c6:wasm-function[1382]:0x158544)
32ae763b4183     at onDataLoad (wasm://wasm/00d893c6:wasm-function[1375]:0x154fb4)
32ae763b4183     at onDataLoad.command_export (wasm://wasm/00d893c6:wasm-function[1403]:0x15ac5a)
32ae763b4183     at Object.onDataLoad (/app/wasm_parts/main.js:28:49)
32ae763b4183 level=debug msg="Lookback delta is zero, setting to default value" value=5m0s
32ae763b4183 level=debug msg="Lookback delta is zero, setting to default value" value=5m0s

Happy to provide any additional data.

Dletta commented 2 months ago

@shimaore

I see the query contains 'node_filesystem_free_bytes / node_filesystem_size_bytes'. This is quite a common Prometheus gotcha; see an example here: https://github.com/prometheus/prometheus/issues/11397

What this means is that multiple instances of the node_filesystem_free_bytes metric exist with different values but the same set of labels. This is a Prometheus issue, not a qryn issue, as it is the normal behavior you would also see when running native Prometheus against the same dataset.
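To make this concrete, here is a hypothetical example (not taken from your report) of two series that would trigger the error, e.g. when two scrape jobs collect the same node exporter target:

node_filesystem_free_bytes{instance="node1:9100", device="/dev/sda1", mountpoint="/"} 100
node_filesystem_free_bytes{instance="node1:9100", device="/dev/sda1", mountpoint="/"} 120

Once the binary operation drops the metric name, both samples land on an identical labelset in the result vector, which PromQL rejects with exactly the panic in your log.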

The issue referenced above mentions a possible workaround as well; it also shows that Prometheus itself has not finished its proposed fix for this as of 2 weeks ago.
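For illustration only, one common query-side mitigation (different from the fix proposed upstream) is to collapse each operand to a unique labelset before dividing. This is a sketch, and the label list in by() is an assumption you would adapt to your own data:

# Deduplicate both operands so no two samples share a labelset, then divide.
# The (instance, device, mountpoint) grouping is a guess, not taken from the report.
  max by (instance, device, mountpoint) (node_filesystem_free_bytes)
/ max by (instance, device, mountpoint) (node_filesystem_size_bytes)

Whether max, min, or avg is the right aggregator depends on why the duplicate series exist in the first place.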

I will close this issue for now.