thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Query Push down causes error when using min, max_over_time functions #6970

Open · namm2 opened this issue 10 months ago

namm2 commented 10 months ago

Thanos, Prometheus and Golang version used: Thanos v0.32.0, Prometheus v2.47.1

Object Storage Provider: GCS

What happened: We enabled the query-pushdown feature on Thanos Query.

We found that the max_over_time function doesn't work when Thanos Query pulls data from both the Thanos sidecar and the store, failing with the error message: execution: vector cannot contain metrics with the same labelset

The max_over_time function works fine when either: a) the selected time range fits within the Thanos Sidecar's (Prometheus's) storage retention, e.g. the last 3h, or b) the selected time range fits within the Thanos Store max-time, e.g. before now-3h.
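
For illustration, the shape of what works vs. what fails (the metric name below is a placeholder, not our actual query):

    # works: query range within the sidecar's (Prometheus's) retention, e.g. the last 3h
    max_over_time(some_metric[5m])   # evaluated over now-3h .. now

    # works: query range entirely within the Store's data, e.g. older than now-3h
    max_over_time(some_metric[5m])   # evaluated over now-24h .. now-3h

    # fails once the query range spans both the sidecar and the store, e.g. the last 24h:
    # execution: vector cannot contain metrics with the same labelset
    max_over_time(some_metric[5m])   # evaluated over now-24h .. now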

What you expected to happen: All *_over_time functions should work normally

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Anything else we need to know:

GiedriusS commented 10 months ago

Please update to the latest version and try again. Is it reproducible?

MichaHoffmann commented 10 months ago

It feels like a dedup issue from an older version.

svenmueller commented 10 months ago

An addition to Nam's issue description: the Thanos version used is 0.32.4.

MichaHoffmann commented 10 months ago

That should work in theory; can you share the config and maybe the blocks so we can try to reproduce it?

namm2 commented 10 months ago

@MichaHoffmann, can you tell me how I can export blocks from Prometheus & Thanos Store, please?

Thanos Query args:

      --log.format=json
      --http-address=0.0.0.0:19192
      --query.auto-downsampling
      --query.partial-response
      --query.max-concurrent=20
      --query.replica-label=thanos_ruler_replica
      --query.timeout=180s
      --query.replica-label=replica
      --endpoint=dnssrv+_grpc._tcp.thanos-store-api.prometheus.svc
      --alert.query-url=https://thanos-query.domain.com
      --enable-feature=query-pushdown
      --store.response-timeout=175s

The simplified stack diagram looks like this: (image: "Monitoring" stack diagram)

GiedriusS commented 10 months ago

I think we would need the query + labelsets returned by each vector selector to be able to debug this.

MichaHoffmann commented 10 months ago

So max_over_time drops the name label, right? I wonder if it's a case of deduplication tripping over one series with the name label (from the storage gateway) and one without (from the pushed-down sidecar); the querier then removes the name label from the storage gateway series too, and two series with the same labelset get sent to the PromQL engine. Might this be what's happening?
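
A rough sketch of that theory, with a made-up series (not taken from the actual data):

    # series returned by the store gateway (metric name still attached):
    {__name__="some_metric", namespace="default", pod="a"}
    # series returned by the pushed-down sidecar (max_over_time already dropped the name):
    {namespace="default", pod="a"}
    # if deduplication in the querier then strips __name__ from the store gateway series too,
    # both end up with the same labelset:
    {namespace="default", pod="a"}
    {namespace="default", pod="a"}
    # -> execution: vector cannot contain metrics with the same labelset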

MichaHoffmann commented 10 months ago

I'll write a quick e2e test and try to reproduce it to see if my theory holds. Can you disable query pushdown in the meantime to remedy the situation for now?
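
For reference, disabling it should just be a matter of dropping the feature flag from the Thanos Query args posted above:

    # remove this flag from the Thanos Query arguments to disable pushdown:
    --enable-feature=query-pushdown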

ahurtaud commented 5 months ago

Hello, I am facing this error with the latest version of Thanos, v0.35.0 (in both distributed and local query mode). I don't think the query-pushdown flag exists anymore? In any case, we are not using it.

I can confirm the failing query doesn't have a metric name selector and uses max_over_time:

group by (subscription) (max_over_time({prometheus="shared", tenant_id="azure-metrics-otel"}[10800s]))

Falling back to v0.34.1 makes the query work.

MichaHoffmann commented 5 months ago

> Hello, I am facing this error with the latest version of Thanos, v0.35.0 (in both distributed and local query mode). I don't think the query-pushdown flag exists anymore? In any case, we are not using it.
>
> I can confirm the failing query doesn't have a metric name selector and uses max_over_time:
>
> group by (subscription) (max_over_time({prometheus="shared", tenant_id="azure-metrics-otel"}[10800s]))
>
> Falling back to v0.34.1 makes the query work.

Are you using the Prometheus or the Thanos engine?

ahurtaud commented 5 months ago

> Hello, I am facing this error with the latest version of Thanos, v0.35.0 (in both distributed and local query mode). I don't think the query-pushdown flag exists anymore? In any case, we are not using it. I can confirm the failing query doesn't have a metric name selector and uses max_over_time:
>
> group by (subscription) (max_over_time({prometheus="shared", tenant_id="azure-metrics-otel"}[10800s]))
>
> Falling back to v0.34.1 makes the query work.

> Are you using the Prometheus or the Thanos engine?

The Thanos engine. However, I tried the same query against the Prometheus engine (switching via the Thanos UI dropdown) and it gives the same error.
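
(Side note: assuming the standard querier flag, the default engine can also be set outside the UI on the Thanos Query component, e.g. as below.)

    --query.promql-engine=prometheus   # or "thanos"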

fpetkovski commented 5 months ago

Does the issue only happen when a metric name is not specified?

MichaHoffmann commented 5 months ago

@ahurtaud Can you maybe help us understand the issue better by providing a view of your cluster deployment and maybe even the blocks (raw data) of the involved components?

ahurtaud commented 5 months ago

> Does the issue only happen when a metric name is not specified?

Actually yes. Trying to give easier-to-reproduce queries:

This one is broken:

    group by (namespace) (max_over_time({prometheus="platform", stack="ccp-ne-ogob01a", namespace=~".+"}[5m]))

While this one is working:

    group by (namespace) (max_over_time(kube_pod_info{prometheus="platform", stack="ccp-ne-ogob01a", namespace=~".+"}[5m]))

Interestingly, this one is also working:

    group by (namespace) (max_over_time({prometheus="platform", stack="ccp-ne-ogob01a", namespace=~".+",__name__="kube_pod_info"}[5m]))

But that one is failing:

    group by (namespace) (max_over_time({prometheus="platform", stack="ccp-ne-ogob01a", namespace=~".+",__name__=~"kube_pod_.*"}[5m]))

So I think the issue occurs when the query matches multiple metric names (we have kube_pod_info, kube_pod_annotations, etc.), as sketched below.
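
A rough sketch of the collision (with made-up labels; any two matched series that share all labels apart from the metric name will collide):

    # two series matched by __name__=~"kube_pod_.*" that share the remaining labels:
    kube_pod_info{namespace="default", pod="example-pod"}
    kube_pod_annotations{namespace="default", pod="example-pod"}
    # max_over_time drops the metric name, leaving two identical labelsets:
    {namespace="default", pod="example-pod"}
    {namespace="default", pod="example-pod"}
    # -> execution: vector cannot contain metrics with the same labelset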

> @ahurtaud Can you maybe help us understand the issue better by providing a view of your cluster deployment and maybe even the blocks (raw data) of the involved components?

The setup is quite generic, I think; maybe you can try to reproduce it. Roughly, we have an observer/observee clusters pattern: 1 upper-layer querier (the one we execute queries on) -> 2 stores (== 2 queriers on the observee clusters) -> each with a Prometheus + sidecar monitoring Kubernetes.

something like

thanos query  ----- thanos-query --- sidecar
              |
              ---- thanos query --- sidecar

MichaHoffmann commented 5 months ago

Can you share the last blocks from the Prometheus instances and the querier configurations? Would that be possible?

ahurtaud commented 5 months ago

That will be difficult for me before next week, unfortunately. Also, I think it will be easier to try to reproduce it with a generic stack (kube-prometheus-like) rather than uploading blocks :/

ahurtaud commented 5 months ago

I actually ran the same query against Prometheus directly (v2.51.2) and it now gives the same error: (screenshot from 2024-05-13 showing the error in the Prometheus UI)

After discussing internally with @MichaHoffmann, the change is actually more a fix to stick to the Prometheus API than a Thanos issue. I have even found the feature request in Prometheus that would solve this for Thanos as well, I guess: https://github.com/prometheus/prometheus/issues/11397

I guess we can close this issue with the above reference.

For those facing this issue, there is a workaround explained in the Prometheus issue. I would, however, advise adding an exact metric name matcher to your queries when using *_over_time aggregations, for example as shown below.
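
Illustratively, based on the queries shared earlier in this thread (broken form vs. the form with an exact __name__ matcher):

    # fails when multiple metric names match the selector:
    group by (namespace) (max_over_time({prometheus="platform", stack="ccp-ne-ogob01a", namespace=~".+", __name__=~"kube_pod_.*"}[5m]))

    # works: an exact metric name matcher, so only one metric name is selected:
    group by (namespace) (max_over_time({prometheus="platform", stack="ccp-ne-ogob01a", namespace=~".+", __name__="kube_pod_info"}[5m]))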