Open beorn7 opened 4 months ago
We have discussed this at the dev-summit. The consensus reached there is the following:
Starting with Prometheus 3.0, any querier implementation MUST return something that matches the selectors sent in the request, including remote-read. We are open to adding a flag to optionally reinstate the old expectations should this be required by real-world use cases.
Note that a dev-summit consensus is not a formal vote and still subject to lazy consensus among the maintainers. If anyone has objections, please add them here for discussion.
In more detail: The idea is that we can do optimizations relying on selectors returning matching results starting from v3.0.0. Should relevant use cases show up that are now broken by this expectation, we can introduce some way for storage implementations (presumably that's affecting mostly remote-read) to communicate that they are not behaving in that way and that their results cannot be subject of these optimizations.
@beorn7 I was looking back at #8053 and thinking about implementing it. Is it unnecessary now that 3.0 is coming or is it still worth adding for the time being?
@LeviHarrison : My understanding is that the mandate we have agreed upon now unlocks what has been proposed in #8053. So v3.0 makes it possible to work on it, while it was impossible previously. However, @bboreham had some plans for farther reaching optimizations. Maybe chat with him first.
To clarify: The optimization described in #8053 hasn't been implemented at all. In the discussion, the concern was brought up that this optimization, should we implement it, would break with remote-read backends that return series that do not match the selectors in the request. So it was concluded that we needed to not apply the optimization for those backends.
This issue here intends to make it part of the contract that any storage implementation MUST return series that match the selectors in the request. With that, we can just implement the optimization without worrying about any condition where we cannot apply it. It's first a matter of simplification and second guarantees that we don't under-use the optimization (e.g. by never using it as soon as remote-read is part of the game).
Further, this issue also intends to acknowledge that we can still add an optional deactivation of the optimization, should it actually be needed, without creating a breaking change. (But the assumption is that it won't really be needed, so we go down the simple path.)
Proposal
Probably very surprising for most, there is no requirement that the storage returns matching time series for a given selector. This apparently unlocks certain more or less valid use cases, mostly around remote-read. There is a lengthy discussion in #8053. However, this behavior also prevents a lot of optimizations, and there are implications for the planned migration to UTF-8 capable names. (See the relevant design doc for starters, but there are implications about rewriting the results that aren't yet reflected in the design doc.)
My proposal here is to revisit the assumption that there are valid and useful use cases for selectors returning a result that doesn't match the selector. Almost four years later, we should have a pretty good picture if that is still and actually the case. The Prometheus 3 release is a good time to act on the result:
I will also submit this to the dev summit because I think it warrants a broader discussion.
Paging @ywwg @GiedriusS @fpetkovski @charleskorn @bwplotka as this might be relevant for you.