Closed BertHartm closed 9 months ago
I tested v0.0.84, v0.0.70, and v0.0.60 as well, and this isn't new, so I'm not sure if it's somehow a misconfiguration / misunderstanding on my end, or a bug that's been around a while.
This is a bit of a complicated topic -- which I attempted to cover in the comments for the option https://github.com/jacksontj/promxy/blob/master/pkg/servergroup/config.go#L39-L56
Let me attempt to explain here in a bit more detail -- hopefully we can clarify the confusion here and get the docs updated.
The main feature within promxy that makes it performant is this concept of a `NodeReplacer`, which replaces complex queries with smaller queries we can farm out to downstream prom API endpoints and re-aggregate. A simple example of this is `sum(foo)` -- instead of querying all data for `foo` then doing the sum in promxy we can simply ask each downstream `sum(foo)` then re-sum that in promxy.
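To make the `sum(foo)` example concrete, here is a minimal Python sketch of the idea (not promxy's actual Go code). The downstream names and sample values are made up, and it assumes each downstream holds a disjoint set of series, so partial sums can simply be added.

```python
# Hypothetical values of the series `foo` at one timestamp, keyed by downstream.
# Assumption: the downstreams hold disjoint series (no duplicates to merge).
downstreams = {
    "prom-a": [1.0, 2.0, 3.0],
    "prom-b": [10.0, 20.0],
}


def sum_raw_in_proxy(servers):
    """Fetch every raw value from every downstream, then aggregate once in the proxy."""
    all_values = [value for values in servers.values() for value in values]
    return sum(all_values)


def sum_pushed_down(servers):
    """Ask each downstream for its own sum(foo), then re-sum the partial results."""
    partial_sums = [sum(values) for values in servers.values()]
    return sum(partial_sums)


# Both strategies agree, but the second moves one number per downstream
# over the network instead of every raw sample.
print(sum_raw_in_proxy(downstreams), sum_pushed_down(downstreams))
```

The pushdown works here because summation can be re-aggregated from partial results; other query shapes may need a different re-aggregation step or cannot be split this way at all.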
This may seem like an aside, but this is really the key function here. So in addition to making this performant this also means that the majority of queries are not raw data fetches -- instead they are smaller promql queries. So, promxy will actually only send a raw data fetch downstream when it is absolutely necessary -- such as a bare matrix selector (e.g. `foo[1h]`). In the case of a raw data fetch we can choose to use remote_read; but in all other cases (where we send a promql query) we cannot.
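The rule described above can be sketched as a small decision function. This is only an illustration in Python, not promxy's implementation: the function name, the `is_raw_fetch` flag, and the returned labels are all hypothetical.

```python
def downstream_call(is_raw_fetch, remote_read_enabled):
    """Pick how a piece of a query is sent downstream (hypothetical labels).

    is_raw_fetch: True when raw samples are needed, e.g. a bare matrix
        selector such as foo[1h]; False when a promql query was pushed down.
    remote_read_enabled: value of the remote_read option for the server group.
    """
    if is_raw_fetch and remote_read_enabled:
        # Only raw data fetches are eligible for remote_read.
        return "remote_read"
    # Everything else goes through the regular query API, even when
    # remote_read is enabled.
    return "query_api"


# A pushed-down query such as sum(foo) never uses remote_read:
print(downstream_call(is_raw_fetch=False, remote_read_enabled=True))
# A bare selector such as foo[1h] does, when the option is on:
print(downstream_call(is_raw_fetch=True, remote_read_enabled=True))
```

This is why setting `remote_read: true` alone does not guarantee that remote read requests show up on the wire: the query itself also has to require a raw fetch.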
So now you may ask "if it's so niche, why does this feature even exist?" -- to which I say what a great question! There are 2 reasons for this: the initial PoC implementation of promxy did all queries using raw fetches (node replacer was added soon after), and during that initial PoC I was reminded that prometheus treats NaNs a bit ... weird. Specifically it has a concept of a `StaleNaN` which indicates that a series was not scrape-able anymore -- but unfortunately the promql interface consumes those `StaleNaN`s, so using the regular query endpoint we can't determine if the series ended or if there is a gap. RemoteRead does provide the values for `StaleNaN` instead of consuming them -- so generally speaking it's a more faithful representation of the correct values -- but this edge case is very uncommon (you have to have a query that can't be nodereplaced and has some interactions between Lookback delta due to missing series). I did actually consider requiring RemoteRead (for simplicity) but some implementations don't support RemoteRead (e.g. VictoriaMetrics) and at that point the code/tests already existed -- so I have left it in.
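For anyone unfamiliar with the stale marker: to my knowledge Prometheus encodes it as a NaN with one specific bit pattern, distinct from an ordinary NaN. The Python sketch below works on the raw 64-bit patterns to show why a client that sees raw samples can tell "series ended" apart from "no data"; the helper names are hypothetical and the two constants are my recollection of the Prometheus source, so treat them as assumptions.

```python
import struct

# Assumed bit patterns (from memory of the Prometheus source):
STALE_NAN_BITS = 0x7FF0000000000002   # staleness marker
NORMAL_NAN_BITS = 0x7FF8000000000001  # ordinary NaN


def to_bits(value):
    """Return the IEEE 754 bit pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def is_nan_bits(bits):
    """A double is NaN when the exponent is all ones and the mantissa is non-zero."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and mantissa != 0


def is_stale_marker(bits):
    """Only the exact stale pattern counts; other NaNs are ordinary values."""
    return bits == STALE_NAN_BITS


def series_ended(sample_bits):
    """With raw samples, a trailing stale marker means the series ended.

    Without the marker (as when it has been consumed before we see it),
    an ended series and a gap look the same.
    """
    return bool(sample_bits) and is_stale_marker(sample_bits[-1])


raw_samples = [to_bits(1.0), to_bits(2.0), STALE_NAN_BITS]
print(series_ended(raw_samples))
```

Both constants are NaNs as far as floating point is concerned, which is why the distinction is only visible when the raw sample bits are preserved.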
So with that (probably way too much) context hopefully you can understand the feature a bit more and the complexity that comes with it. If you have any suggestions on how to better communicate this complexity in the comments/docs I am definitely more than happy to talk through those :)
Thanks, that does explain more, and I had been trying to work around the node replacer in this test. I think my largest misunderstanding was around when remote_read would take effect. My assumption was that it would happen in most (all) cases, but it seems to be a much more limited subset than I had expected. Now that I re-read the config comments I do see that it's explicitly stated, so I'm not sure how I'd go about wording it better.
Fair enough -- it is a somewhat nuanced thing. Well, if nothing else, anyone who is confused will hopefully find this issue/question and get the answer from there!
Sounds like we're all set with this so I'll go ahead and mark this closed; if you have anything else (or have more questions on this) feel free to re-open or create a new one!
I've been trying to test out the remote_read functionality, but I'm seeing that promxy is not actually making the remote read calls, and is instead using the query / query_range calls when I run queries via the promxy UI.
I have the following promxy config:
Most of that is default, but `remote_read: true` was explicitly set in my config file.
I'm running wireshark to see the actual request sent to prometheus from promxy, and it's a `POST /api/v1/query_range` instead of the expected `POST /api/v1/read`.
Additionally, the promxy metrics seem to support this with:
...
showing that it created the remote connection, but never called it and called query_range instead.
I'm building off master: