thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Query distributed mode does not fetch from storage gateways #7757

Open hanem100k opened 2 months ago

hanem100k commented 2 months ago

Components: Query, store gateway

Thanos, Prometheus and Golang version used: Thanos version 0.26.1

Object Storage Provider: Azure blob store

What happened: We have a set of receivers and sharded store gateways. Whenever distributed query mode is enabled, data is returned only from the receivers, not from the store gateways.

Removing --query.mode=distributed from the Querier arguments fixes the issue.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible): It is 100% reproducible when you query a time range that extends beyond the data held by your receivers (i.e. older than their retention).
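The reproduction condition can be illustrated with a small, hypothetical Go sketch. The endpoint type, the 24h receiver retention and the store gateway range are illustrative assumptions, not values from this setup. The point is that a query staying inside the receivers' range never needs a store gateway, so only longer ranges expose the bug.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified description of a Store API
// endpoint (receiver or store gateway) and the time range, in Unix
// milliseconds, that it advertises.
type endpoint struct {
	name       string
	minT, maxT int64
}

// overlapping returns the names of endpoints whose advertised range
// intersects the query range [start, end]; only these can contribute data.
func overlapping(eps []endpoint, start, end int64) []string {
	var names []string
	for _, e := range eps {
		if e.minT <= end && e.maxT >= start {
			names = append(names, e.name)
		}
	}
	return names
}

func main() {
	const hour = int64(3600 * 1000)
	now := 100 * hour
	eps := []endpoint{
		// Assumed retention: receivers keep only the most recent 24h.
		{name: "receiver", minT: now - 24*hour, maxT: now},
		// Assumed: store gateways serve uploaded blocks up to ~2h ago.
		{name: "store-gateway", minT: 0, maxT: now - 2*hour},
	}
	// Last hour: answered by the receiver alone, so the bug stays hidden.
	fmt.Println(overlapping(eps, now-hour, now))
	// Last 48h: everything older than now-24h exists only in the store gateway.
	fmt.Println(overlapping(eps, now-48*hour, now))
}
```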

Metrics with distributed mode enabled: [screenshot not preserved]

Metrics without distributed mode enabled: [screenshot not preserved]

MichaHoffmann commented 2 months ago

Note: the MinT() computation at https://github.com/thanos-io/thanos/blob/883fade9bd75fe595b6e947a33c59e27fca1abda/pkg/query/remote_engine.go#L116 is likely off for use cases where [sidecar, receiver, ...] and store gateway endpoints sit behind the same remote engine.
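To make the suspected MinT problem concrete, here is a simplified, hypothetical Go model; it is not the code in remote_engine.go. It assumes the engine groups endpoint infos by external labels (replica labels already stripped), keeps the highest MinTime within each group (treating same-labelled endpoints as replicas), and reports the lowest of those per-group values. The info type and engineMinT function are illustrative names. Under these assumptions, a receiver and a store gateway that share external labels collapse into one group, and the engine advertises the receiver's MinT.

```go
package main

import (
	"fmt"
	"math"
)

// info is a hypothetical view of one endpoint behind a remote engine:
// its external labels (replica labels removed, flattened to a string key)
// and the earliest timestamp it advertises.
type info struct {
	labels  string
	minTime int64
}

// engineMinT models the assumed computation: highest MinTime per label set,
// then the lowest of those values across label sets.
func engineMinT(infos []info) int64 {
	highest := map[string]int64{}
	for _, in := range infos {
		if m, ok := highest[in.labels]; !ok || in.minTime > m {
			highest[in.labels] = in.minTime
		}
	}
	mint := int64(math.MaxInt64)
	for _, m := range highest {
		if m < mint {
			mint = m
		}
	}
	return mint
}

func main() {
	// A receiver with recent data and a store gateway with older blocks
	// for the same tenant, i.e. identical external labels.
	infos := []info{
		{labels: `{tenant="a"}`, minTime: 1000}, // receiver
		{labels: `{tenant="a"}`, minTime: 0},    // store gateway
	}
	// Treated as replicas, the pair reports the receiver's MinT, so the
	// older range the store gateway could serve appears unavailable.
	fmt.Println(engineMinT(infos))
}
```

Taking the highest MinT within a label set is presumably meant for true replicas, where complete data is only guaranteed from the point every replica has it. In this model it also discards the store gateway's older range whenever a receiver and a store gateway merely share labels.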

fpetkovski commented 2 months ago

@hanem100k can you try attaching a separate querier to store gateways?

hanem100k commented 2 months ago

My setup is: top-level [QFE] -> [Q] -> leaf [Q] -> [receivers, stores].

There is a single Querier in the top layer and N leaf Queriers.

If I query a leaf Querier directly, it correctly returns data from both the stores and the receivers.
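Since the leaf Querier answers correctly on its own, the data is reachable and the problem plausibly lies in how the top-level Querier plans the distributed query. The Go sketch below is a hypothetical model of that step. It assumes, without checking the Thanos or promql-engine source, that the top-level planner asks each leaf only for the part of the range at or after the MinT that leaf advertises. The leaf, subQuery and plan names are illustrative. Under that assumption, a leaf that under-reports its coverage is never asked for older data, even though it could serve it.

```go
package main

import "fmt"

// leaf is a hypothetical view of a leaf Querier as seen by the top-level
// Querier in distributed mode: a name and the MinT it advertises.
type leaf struct {
	name string
	minT int64
}

// subQuery is the (hypothetical) time range pushed down to one leaf.
type subQuery struct {
	leaf       string
	start, end int64
}

// plan models the assumed pushdown rule: each leaf gets only the part of
// [start, end] at or after its advertised MinT; leaves whose MinT lies
// after end are skipped entirely.
func plan(leaves []leaf, start, end int64) []subQuery {
	var out []subQuery
	for _, l := range leaves {
		s := start
		if l.minT > s {
			s = l.minT
		}
		if s > end {
			continue
		}
		out = append(out, subQuery{leaf: l.name, start: s, end: end})
	}
	return out
}

func main() {
	leaves := []leaf{
		// Its store gateways hold data from t=0, but it advertises the
		// receivers' MinT (1000), e.g. due to a MinT miscalculation.
		{name: "leaf-1", minT: 1000},
		// The same data, correctly advertised.
		{name: "leaf-2", minT: 0},
	}
	for _, q := range plan(leaves, 0, 2000) {
		fmt.Printf("%s: [%d, %d]\n", q.leaf, q.start, q.end)
	}
}
```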

@fpetkovski, regarding "can you try attaching a separate querier to store gateways?": do you mean that I should retry distributed mode with one Querier for the store gateways only and another for the receivers only?

fpetkovski commented 2 months ago

Actually, @MichaHoffmann and I are going to fix this configuration soon, so let's wait for the next patch release.

yeya24 commented 3 weeks ago

@fpetkovski Is this issue already fixed?