thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Query Frontend: Incorrect cache keys for dynamic split interval #7831

Open lachruzam opened 1 week ago

lachruzam commented 1 week ago

Thanos, Prometheus and Golang version used: quay.io/thanos/thanos:v0.36.1

What happened: The interval used to generate cache keys differs from the interval used to split the original query.

For example, with the settings:

--query-range.min-split-interval=2h
--query-range.max-split-interval=96h
--query-range.horizontal-shards=12

a query with an 8h range and a 5m step is split using a 40m interval, while cache keys are generated using a 2h interval.
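
To make the mismatch concrete, here is a minimal Python sketch; it is not Thanos code. It assumes the 40m figure comes from dividing the 8h range across the 12 horizontal shards, and that a cache key embeds the sub-query start divided by the key interval. The helpers `split_starts` and `bucket_index` are hypothetical names, and the query is assumed to start at timestamp 0 so that it is already aligned to both intervals.

```python
from datetime import timedelta


def to_ms(interval):
    """Convert a timedelta to whole milliseconds."""
    return int(interval.total_seconds() * 1000)


def split_starts(start_ms, end_ms, interval):
    """Start timestamps (ms) of the sub-queries covering [start_ms, end_ms)."""
    return list(range(start_ms, end_ms, to_ms(interval)))


def bucket_index(start_ms, interval):
    """Bucket number an (assumed) cache key would embed for this start time."""
    return start_ms // to_ms(interval)


query_range = timedelta(hours=8)
shards = 12                              # --query-range.horizontal-shards
split_interval = query_range / shards    # assumed origin of the 40m interval
key_interval = timedelta(hours=2)        # interval reported for cache keys

starts = split_starts(0, to_ms(query_range), split_interval)
key_buckets = [bucket_index(s, key_interval) for s in starts]
split_buckets = [bucket_index(s, split_interval) for s in starts]

print(split_interval)   # interval used for splitting
print(key_buckets)      # buckets when keys use the 2h interval
print(split_buckets)    # buckets when keys use the split interval
```

Under these assumptions the 12 sub-queries fall into only 4 distinct 2h buckets, so three different sub-queries would share each bucket number, whereas bucketing by the split interval gives every sub-query its own bucket.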

What you expected to happen: Cache keys should be generated using the same interval as used for query splitting.
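
The expected behavior can be sketched as follows, again in Python and not taken from the Thanos code base. The key layout (query, step, interval, bucket) and the function names `cache_key` and `split_with_keys` are assumptions made for illustration; the point is only that the interval is chosen once and passed to both the splitter and the key generator.

```python
def cache_key(query, step_ms, start_ms, interval_ms):
    # Assumed key layout: query, step, interval, and the bucket the start falls in.
    return f"{query}:{step_ms}:{interval_ms}:{start_ms // interval_ms}"


def split_with_keys(query, step_ms, start_ms, end_ms, interval_ms):
    """Split [start_ms, end_ms) and key each piece with the interval used to split it."""
    pieces = []
    for s in range(start_ms, end_ms, interval_ms):
        e = min(s + interval_ms, end_ms)  # clamp the last piece to the query end
        pieces.append((s, e, cache_key(query, step_ms, s, interval_ms)))
    return pieces


MINUTE_MS = 60 * 1000

# 8h range, 5m step, 40m split interval, as in the example above.
pieces = split_with_keys("up", 5 * MINUTE_MS, 0, 8 * 60 * MINUTE_MS, 40 * MINUTE_MS)
keys = [key for _, _, key in pieces]

for start, end, key in pieces:
    print(start, end, key)
```

Because the same `interval_ms` drives both steps, each of the 12 sub-queries gets a distinct key in this sketch.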

How to reproduce it (as minimally and precisely as possible): Run query-frontend with the settings listed above and issue a range query with an 8h range and a 5m step.

Full logs to relevant components:

Anything else we need to know:

dosubot[bot] commented 1 week ago

Hey @lachruzam! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

The issue you're experiencing seems to stem from a mismatch between the intervals used for query splitting and cache key generation. In Thanos, the query splitting mechanism and cache key generation are designed to work with specified intervals, but they might not always align perfectly due to configuration settings.

The query splitting mechanism uses the SplitByIntervalMiddleware function, which splits requests based on the specified min-split-interval and max-split-interval settings. These settings determine how the query is divided into smaller sub-queries [1].

Cache key generation, on the other hand, is handled by the thanosCacheKeyGenerator, which takes its interval from a split interval function of type queryrange.IntervalFn [2].

To ensure consistency between the intervals used for query splitting and cache key generation, you might need to review and adjust the configuration settings. Specifically, ensure that the intervals specified for query splitting (min-split-interval and max-split-interval) align with the expectations for cache key generation. If the intervals are not aligning as expected, it could be due to how these settings are interpreted or applied in the code.

Unfortunately, the exact logic for how these intervals are determined when both min-split-interval and max-split-interval are specified isn't detailed in the available documentation, so further investigation into the code or configuration might be necessary to resolve this discrepancy.


lachruzam commented 1 week ago

PR: https://github.com/thanos-io/thanos/pull/7832