The issue here was that the querier was configured to point not at query APIs but at store APIs (we should probably guard against that better); this leads to the promql-engine distributing to 0 other engines, which exposes a bug where we don't guard against that!
Edit: see https://cloud-native.slack.com/archives/CK5RSSC10/p1714654267669009
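For context, a minimal Go sketch of the failure mode described above (all names here are hypothetical, not the actual Thanos code): if the distributed engine discovers zero remote query engines and indexes into that slice unguarded, it panics with exactly the `index out of range [0] with length 0` error seen in the ruler logs below:

```go
package main

import (
	"errors"
	"fmt"
)

// RemoteEngine is a stand-in for a remote Query API endpoint
// (hypothetical type, for illustration only).
type RemoteEngine struct{ addr string }

// distributeQuery sketches the bug: without a guard, engines[0]
// panics with "index out of range [0] with length 0" when the
// querier was pointed only at store APIs and therefore discovered
// no remote query engines.
func distributeQuery(engines []RemoteEngine, query string) (string, error) {
	// The missing guard would look something like this:
	if len(engines) == 0 {
		return "", errors.New("distributed mode: no remote query engines discovered")
	}
	// Without the check above, this line is the panic site.
	return fmt.Sprintf("dispatching %q to %s", query, engines[0].addr), nil
}

func main() {
	// Simulates the reported misconfiguration: zero query APIs found.
	res, err := distributeQuery(nil, `absent(up{job="kube-proxy"} == 1)`)
	fmt.Println(res, err)
}
```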
Thanos, Prometheus and Golang version used: 0.35.0
Object Storage Provider: Azure Storage Account
What happened:
After enabling `--query.mode=distributed`, my querier gets a lot of panics. Removing `--query.mode=distributed` stops all panics.

What you expected to happen:
No panics
How to reproduce it (as minimally and precisely as possible):
At the moment, I'm unable to provide a minimal reproducible environment. However, according to the ruler logs, all queries of the form `absent(up{job="kube-proxy"} == 1)` (`job` can be any value) seem to be affected. We are using stateless rulers and Thanos Receive; no sidecars.
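That setup matches the misconfiguration described in the note at the top: with only Receive endpoints, the querier sees store APIs but no query APIs. As a hedged sketch of the difference (the addresses below are hypothetical, not our actual arguments), in distributed mode the `--endpoint` targets need to expose the Query API (e.g. other queriers), not only the Store API:

```
# Misconfigured: the endpoint exposes only the Store API (e.g. Receive),
# so distributed mode discovers 0 remote query engines.
thanos query \
  --query.mode=distributed \
  --endpoint=thanos-receive.example.svc:10901

# Intended: endpoints are other queriers exposing the Query API.
thanos query \
  --query.mode=distributed \
  --endpoint=thanos-query-leaf-1.example.svc:10901 \
  --endpoint=thanos-query-leaf-2.example.svc:10901
```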
Thanos querier arguments:
Full logs to relevant components:
Ruler Logs
Just a few; they keep repeating:

```
{"caller":"rule.go:968","component":"rules","err":"rpc error: code = Internal desc = runtime error: index out of range [0] with length 0","level":"error","query":"absent(up{job=\"kube-proxy\"} == 1)","ts":"2024-05-02T13:03:09.954559928Z"}
{"caller":"rule.go:938","component":"rules","err":"read query instant response: perform POST request against http://opsstack-thanos-query.opsstack.svc.cluster.local:10902/api/v1/query: Post \"http://opsstack-thanos-query.opsstack.svc.cluster.local:10902/api/v1/query\": EOF","level":"error","query":"absent(up{job=\"apiserver\"} == 1)","ts":"2024-05-02T12:59:26.372712478Z"}
```
Querier Logs
https://gist.github.com/jkroepke/9fc58319bf819866138a8dae4f1c8d92
Anything else we need to know:
Environment: