jacksontj / promxy

An aggregating proxy to enable HA prometheus

promxy returns empty results when using <aggregate>_over_time() function wrapping multiple queries with VictoriaMetrics #633

Closed Z3po closed 4 months ago

Z3po commented 6 months ago

We're using promxy to proxy our requests to either multiple Prometheus instances or multiple VictoriaMetrics instances running across multiple clusters. The Prometheus-backed promxy instance provides us with short-term data, while VictoriaMetrics provides metrics for longer time ranges.

While this works in general and both promxy instances show identical data, I noticed that it doesn't work well with <aggregate>_over_time() functions wrapping multiple queries. Against simple single-metric queries the functions do work with VictoriaMetrics as well. The whole issue is a bit fuzzy and I don't know where to find additional information to share right now...

The following query works:

max_over_time(sum(container_memory_usage_bytes{id!="/",namespace=~"metrics",cluster=~".*",container!="",container!="POD",container=~".*",pod=~".+"}) by (container,pod,cluster)[1h:5m])

The query that doesn't work:

max_over_time(max(sum(container_memory_usage_bytes{id!="/",namespace=~"metrics",cluster=~".*",container!="",container!="POD",container=~".*",pod=~".+"}) by (container,pod,cluster) / sum(kube_pod_container_resource_limits{namespace=~"metrics",cluster=~".*",pod=~".+",container=~".*",resource="memory"}) by (container,pod,cluster)) by (container,pod,cluster)[1h:5m])

This query roughly calculates the maximum percentage of RAM used versus its configured limits.

Our promxy configuration for Prometheus looks like the following:

global:
  evaluation_interval: 5s
  external_labels:
    source: promxy
promxy:
  server_groups:
    - static_configs:
        - targets:
            - cluster-1:443
      labels:
        cluster: cluster-1
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    - static_configs:
        - targets:
            - cluster-2:443
      labels:
        cluster: cluster-2
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    ...

While the promxy configuration for VictoriaMetrics looks like this:

global:
  evaluation_interval: 5s
  external_labels:
    source: promxy
promxy:
  server_groups:
    - static_configs:
        - targets:
            - cluster-1:443
      labels:
        promxyServerGroup: cluster-1
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    - static_configs:
        - targets:
            - cluster-2:443
      labels:
        promxyServerGroup: cluster-2
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    ...

They are identical apart from the fact that for Prometheus we add the cluster label in promxy, while we don't need to do that with VictoriaMetrics. For this reason we use a different meta-label there, so we're not hit by https://github.com/jacksontj/promxy/issues/260.

Issuing the following query using the VictoriaMetrics promxy instance:

 curl -v --get --data-urlencode 'query=max_over_time(max(sum(container_memory_usage_bytes{id!="/",namespace=~"metrics",cluster=~".*",container!="",container!="POD",container=~".*",pod=~".+"}) by (container,pod,cluster) / sum(kube_pod_container_resource_limits{namespace=~"metrics",cluster=~".*",pod=~".+",container=~".*",resource="memory"}) by (container,pod,cluster)) by (container,pod,cluster)[1h:5m])' --data-urlencode "start=2023-12-13T06:39:53.152Z" --data-urlencode "end=2023-12-13T07:24:58.224Z" --data-urlencode "step=30s" 'http://localhost:8082/api/v1/query_range'
*   Trying [::1]:8082...
* Connected to localhost (::1) port 8082
> GET /api/v1/query_range?query=max_over_time%28max%28sum%28container_memory_usage_bytes%7bid%21%3d%22%2f%22%2cnamespace%3d~%22metrics%22%2ccluster%3d~%22.%2a%22%2ccontainer%21%3d%22%22%2ccontainer%21%3d%22POD%22%2ccontainer%3d~%22.%2a%22%2cpod%3d~%22.%2b%22%7d%29+by+%28container%2cpod%2ccluster%29+%2f+sum%28kube_pod_container_resource_limits%7bnamespace%3d~%22metrics%22%2ccluster%3d~%22.%2a%22%2cpod%3d~%22.%2b%22%2ccontainer%3d~%22.%2a%22%2cresource%3d%22memory%22%7d%29+by+%28container%2cpod%2ccluster%29%29+by+%28container%2cpod%2ccluster%29%5b1h%3a5m%5d%29&start=2023-12-13T06%3a39%3a53.152Z&end=2023-12-13T07%3a24%3a58.224Z&step=30s HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 13 Dec 2023 09:37:05 GMT
< Content-Length: 63
< 
* Connection #0 to host localhost left intact
{"status":"success","data":{"resultType":"matrix","result":[]}}

returns no data, but also doesn't log very much:

 {"level":"debug","matchers":[{"Type":1,"Name":"id","Value":"/"},{"Type":2,"Name":"namespace","Value":"metrics"},{"Type":2,"Name":"cluster","Value":".*"},{"Type":1,"Name":"container","Value":""},{"Type":1,"Name":"container","Value":"POD"},{"Type":2,"Name":"container","Value":".*"},{"Type":2,"Name":"pod","Value":".+"},{"Type":0,"Name":"__name__","Value":"container_memory_usage_bytes"}],"msg":"Select","selectHints": {"Start":1702445693152,"End":1702452298224,"Step":30000,"Func":"sum","Grouping":["container","pod","cluster"],"By":true,"Range":0,"DisableTrimming":false},"time":"2023-12-13T09:37:05Z","took":70939137}
 {"level":"debug","matchers":[{"Type":2,"Name":"namespace","Value":"metrics"},{"Type":2,"Name":"cluster","Value":".*"},{"Type":2,"Name":"pod","Value":".+"},{"Type":2,"Name":"container","Value":".*"},{"Type":0,"Name":"resource","Value":"memory"},{"Type":0,"Name":"__name__","Value":"kube_pod_container_resource_limits"}],"msg":"Select","selectHints":{"Start":1702445693152,"End":1702452298224,"Step":30000,"Func":"sum","Grouping":["container","pod","cluster"],"By":true,"Range":0,"DisableTrimming":false},"time":"2023-12-13T09:37:05Z","took":73033160}
 {"remoteAddr":"127.0.0.1","time":"2023-12-13T09:37:05.912034604Z","method":"GET","path":"/api/v1/query_range","protocol":"HTTP/1.1","status":200,"responseBytes":63,"duration":0.074179855,"query":"query=max_over_time%28max%28sum%28container_memory_usage_bytes%7Bid%21%3D%22%2F%22%2Cnamespace%3D~%22metrics%22%2Ccluster%3D~%22.%2A%22%2Ccontainer%21%3D%22%22%2Ccontainer%21%3D%22POD%22%2Ccontainer%3D~%22.%2A%22%2Cpod%3D~%22.%2B%22%7D%29+by+%28container%2"}

While issuing the same query against the Prometheus promxy instance:

 curl -v --get --data-urlencode 'query=max_over_time(max(sum(container_memory_usage_bytes{id!="/",namespace=~"metrics",cluster=~".*",container!="",container!="POD",container=~".*",pod=~".+"}) by (container,pod,cluster) / sum(kube_pod_container_resource_limits{namespace=~"metrics",cluster=~".*",pod=~".+",container=~".*",resource="memory"}) by (container,pod,cluster)) by (container,pod,cluster)[1h:5m])' --data-urlencode "start=2023-12-13T06:39:53.152Z" --data-urlencode "end=2023-12-13T07:24:58.224Z" --data-urlencode "step=30s" 'http://localhost:8082/api/v1/query_range'
*   Trying [::1]:8082...
* Connected to localhost (::1) port 8082
> GET /api/v1/query_range?query=max_over_time%28max%28sum%28container_memory_usage_bytes%7bid%21%3d%22%2f%22%2cnamespace%3d~%22metrics%22%2ccluster%3d~%22.%2a%22%2ccontainer%21%3d%22%22%2ccontainer%21%3d%22POD%22%2ccontainer%3d~%22.%2a%22%2cpod%3d~%22.%2b%22%7d%29+by+%28container%2cpod%2ccluster%29+%2f+sum%28kube_pod_container_resource_limits%7bnamespace%3d~%22metrics%22%2ccluster%3d~%22.%2a%22%2cpod%3d~%22.%2b%22%2ccontainer%3d~%22.%2a%22%2cresource%3d%22memory%22%7d%29+by+%28container%2cpod%2ccluster%29%29+by+%28container%2cpod%2ccluster%29%5b1h%3a5m%5d%29&start=2023-12-13T06%3a39%3a53.152Z&end=2023-12-13T07%3a24%3a58.224Z&step=30s HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 13 Dec 2023 09:39:37 GMT
< Transfer-Encoding: chunked
< 
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"cluster":"cluster-1","container":"config-reloader","pod":"prometheus-k8s-0"},"values":****,{"metric":{"cluster":"cluster-1","container":"config-reloader","pod":"vmagent-cluster-76f96d7567-tvc5g"},"values":....

(I've removed most of the data because IMHO they don't provide any additional insights, do they?)

(promxy also doesn't log any insights)

{"level":"debug","matchers":[{"Type":2,"Name":"namespace","Value":"metrics"},{"Type":2,"Name":"cluster","Value":".*"},{"Type":2,"Name":"pod","Value":".+"},{"Type":2,"Name":"container","Value":".*"},{"Type":0,"Name":"resource","Value":"memory"},{"Type":0,"Name":"__name__","Value":"kube_pod_container_resource_limits"}],"msg":"Select","selectHints":{"Start":1702445693152,"End":1702452298224,"Step":30000,"Func":"sum","Grouping":["container","pod","cluster"],"By":true,"Range":0,"DisableTrimming":false},"time":"2023-12-13T09:39:37Z","took":56403085}
{"level":"debug","matchers":[{"Type":1,"Name":"id","Value":"/"},{"Type":2,"Name":"namespace","Value":"metrics"},{"Type":2,"Name":"cluster","Value":".*"},{"Type":1,"Name":"container","Value":""},{"Type":1,"Name":"container","Value":"POD"},{"Type":2,"Name":"container","Value":".*"},{"Type":2,"Name":"pod","Value":".+"},{"Type":0,"Name":"__name__","Value":"container_memory_usage_bytes"}],"msg":"Select","selectHints":{"Start":1702445693152,"End":1702452298224,"Step":30000,"Func":"sum","Grouping":["container","pod","cluster"],"By":true,"Range":0,"DisableTrimming":false},"time":"2023-12-13T09:39:37Z","took":93852363}
{"remoteAddr":"127.0.0.1","time":"2023-12-13T09:39:37.655516433Z","method":"GET","path":"/api/v1/query_range","protocol":"HTTP/1.1","status":200,"responseBytes":891578,"duration":0.137277354,"query":"step=30s\u0026query=max_over_time%28max%28sum%28container_memory_usage_bytes%7Bid%21%3D%22%2F%22%2Cnamespace%3D~%22metrics%22%2Ccluster%3D~%22.%2A%22%2Ccontainer%21%3D%22%22%2Ccontainer%21%3D%22POD%22%2Ccontainer%3D~%22.%2A%22%2Cpod%3D~%22.%2B%22%7D%29+by+%28co"}

The query does work when issued directly against VictoriaMetrics; only the aggregation layer in promxy seems not to work.

Do you need any additional information?

Thanks for taking care!!

jacksontj commented 6 months ago

First off, thanks for reporting an issue! I love seeing usage of the software and finding ways to help users get the most value out of it.

So from your issue report it sounds like it works fine in VictoriaMetrics directly and with promxy against the Prometheus nodes -- but specifically not with the combination of promxy+VictoriaMetrics.

I spent a little time attempting to reproduce the issue with a local prom+promxy setup using the following query:

max_over_time(sum(node_memory_Active_bytes) by (instance)[1h:5m])
/
sum(node_memory_MemTotal_bytes) by (instance)

but was unable to reproduce (although if the issue is with VM+promxy specifically, that is to be expected).

just doesn't return me any data but also doesn't log very much:

From checking the provided logs, the query sent downstream (to prom or VM) seems to match (issued ~2 minutes later -- but the same query). Unfortunately you'll need trace-level logs to get the response from the downstream (which will likely shed some more light on the situation). My guess here is that there is some odd label interaction causing issues.
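
For reference, here is a minimal sketch of how trace-level logging can be enabled when starting promxy (assuming the --log-level flag of your promxy build; adjust the config path for your deployment):

 # assumption: --config and --log-level flags as in a typical promxy setup
 ./promxy --config=/etc/promxy/config.yaml --log-level=trace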

(I've removed most of the data because IMHO they don't provide any additional insights, do they?)

Generally I don't need to see the values, but the queries (to see what is being requested of the downstream) and the response labels (so I can see which series were returned) are useful. The actual values are sometimes useful when there is a concern about the math on a value -- but in this case it seems to be more of an issue of not getting results at all.

Do you need any additional information?

So the easiest would be some mechanism to reproduce this locally -- short of finding a query/setup to do that, trace-level logs are the next best thing (since I can see what is happening in promxy).

As a random stab (to maybe spark some additional debugging) I wonder if the query to VM is incorrect? I ask this because the query is set up as:

 curl -v --get --data-urlencode 'query=max_over_time(max(sum(container_memory_usage_bytes{id!="/",namespace=~"metrics",cluster=~".*",container!="",container!="POD",container=~".*",pod=~".+"}) by (container,pod,cluster) / sum(kube_pod_container_resource_limits{namespace=~"metrics",cluster=~".*",pod=~".+",container=~".*",resource="memory"}) by (container,pod,cluster)) by (container,pod,cluster)[1h:5m])' --data-urlencode "start=2023-12-13T06:39:53.152Z" --data-urlencode "end=2023-12-13T07:24:58.224Z" --data-urlencode "step=30s" 'http://localhost:8082/api/v1/query_range'

which includes cluster=~".*" -- this label seems to be added at the promxy level, not an actual label attached to the downstream series. So it's possible that if this label doesn't exist in VM then it will return no series -- thereby giving an empty result. As another idea for debugging, you could see if breaking the query up (e.g. numerator separate from denominator) helps narrow it down at all, as sketched below.
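
For example, a rough sketch of splitting it up, using the same endpoint and time range as the original request (the two queries are just the numerator and denominator of the original expression):

 # numerator only
 curl --get --data-urlencode 'query=sum(container_memory_usage_bytes{id!="/",namespace=~"metrics",cluster=~".*",container!="",container!="POD",container=~".*",pod=~".+"}) by (container,pod,cluster)' --data-urlencode "start=2023-12-13T06:39:53.152Z" --data-urlencode "end=2023-12-13T07:24:58.224Z" --data-urlencode "step=30s" 'http://localhost:8082/api/v1/query_range'
 # denominator only -- if either returns nothing, also try dropping the cluster=~".*" matcher
 curl --get --data-urlencode 'query=sum(kube_pod_container_resource_limits{namespace=~"metrics",cluster=~".*",pod=~".+",container=~".*",resource="memory"}) by (container,pod,cluster)' --data-urlencode "start=2023-12-13T06:39:53.152Z" --data-urlencode "end=2023-12-13T07:24:58.224Z" --data-urlencode "step=30s" 'http://localhost:8082/api/v1/query_range'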

Hopefully something of the above was helpful, and I'm more than happy to help out a bit more with some additional log output :)

Z3po commented 5 months ago

I've solved the issue and it was a classic :facepalm:....

As you can see in the issue description, our configuration includes the setting remote_read: true because we reused our Prometheus configuration and just replaced the Prometheus endpoints with the new VictoriaMetrics endpoints. This works well for most queries, until promxy tries to use the remote read API, which is unsupported by VictoriaMetrics.

This error didn't show up anywhere (not even in the logs at trace level) because I set ignore_error: true due to the multi-cluster architecture -- I do not want to show errors if a single DC is down because it doesn't matter....

Do you have an idea how I can configure promxy to show these kinds of query errors but ignore errors where a DC is completely unavailable?

Regarding the issue itself: it is solved by changing remote_read to false.
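
For clarity, the relevant part of the VictoriaMetrics server group now looks like this (only the keys touched by the fix are shown; everything else is unchanged):

promxy:
  server_groups:
    - static_configs:
        - targets:
            - cluster-1:443
      labels:
        promxyServerGroup: cluster-1
      # VictoriaMetrics does not support the Prometheus remote read API,
      # so promxy has to go through the regular query API instead
      remote_read: false
      scheme: https
      ...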

Sorry for wasting your time! I'm going back to the corner of my room feeling ashamed....

jacksontj commented 4 months ago

Do you have an idea how I can configure promxy to show these kinds of query errors but ignore errors where a DC is completely unavailable?

Unfortunately I can't think of a great way to do so -- because from promxy's perspective I don't know how to differentiate between "this error is because the downstream doesn't support it" and "that whole endpoint is dead". Now if the client returned more specific errors we could bubble up the errors differently -- but we start getting into requiring a mapping somewhere of which errors are "real" and which aren't (which is its own complicated rabbit hole).

The only thought I have is that the ignore_error feature predates the warnings in promxy -- maybe it would be helpful to put the error in a warning instead of swallowing it completely?
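
For illustration, a query response carrying such a warning could look something like this (the warnings field is part of the standard Prometheus HTTP API response format; the message text here is purely hypothetical):

 {"status":"success","data":{"resultType":"matrix","result":[]},"warnings":["server_group cluster-1: remote read error ignored (ignore_error: true)"]}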

I do not want to show errors if a single DC is down because it doesn't matter

I'd caution against this -- although it can be true, if you have any queries that span the 2 DCs, the results are not "right" in the event of a failure (since you only have half the data). Especially if you are alerting on these I'd caution against it -- there are situations where you can ignore errors, but there are many more where you shouldn't.

Z3po commented 4 months ago

I like your suggestion of translating errors into warnings. That way it wouldn't interfere with using promxy in Grafana, but there's still a hint somewhere (and not plain nothing :) ).

Thanks for caring!