jacksontj / promxy

An aggregating proxy to enable HA prometheus
MIT License

promxy makes up datapoints by repeating the last dataset #606

Closed: Z3po closed this issue 10 months ago

Z3po commented 11 months ago

We noticed that promxy repeats the last datapoint when running simple queries like "max()" across a set of clusters.

Our setup looks like the following:

flowchart TD

  Promxy --> Cluster1
  Promxy --> Cluster2
  Promxy --> Cluster3

  subgraph Cluster1
    Prometheus1-1
    Prometheus1-2
  end

  subgraph Cluster2
    Prometheus2-1 
    Prometheus2-2
  end

  subgraph Cluster3
    Prometheus3-1
    Prometheus3-2
  end

While promxy accesses each of the clusters, it only connects to a single Prometheus instance per cluster through an HTTP proxy configuration. Each k8s cluster only contains data from its own running pods.

The configuration looks like the following:

---
global:
  evaluation_interval: 5s
  external_labels:
    source: promxy
promxy:
  server_groups:
    - static_configs:
        - targets:
            - prometheus-cluster-1:443
      labels:
        cluster: cluster1
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    - static_configs:
        - targets:
            - prometheus-cluster-2:443
      labels:
        cluster: cluster2
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    - static_configs:
        - targets:
            - prometheus-cluster3:443
      labels:
        cluster: cluster3
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
    - static_configs:
        - targets:
            - prometheus-cluster4:443
      labels:
        cluster: cluster4
      remote_read: true
      remote_read_path: /api/v1/read
      query_params:
        nocache: 1
      scheme: https
      http_client:
        dial_timeout: 4s
        tls_config:
          insecure_skip_verify: true
          cert_file: /etc/certs/client.crt
          key_file: /etc/certs/client.key
      ignore_error: true
      ...

Now, querying for data against promxy or the Prometheus instances directly (through port-forwarding):

curl -v --get --data-urlencode 'query=max(container_memory_usage_bytes{id!="/",namespace=~"XXXX",container!="",container="XXXX",container!="POD",pod=~".*XXXXX"}) by (pod, container, cluster)' --data-urlencode "start=2023-07-14T06:39:53.152Z" --data-urlencode "end=2023-07-14T07:24:58.224Z" --data-urlencode "step=30s" 'http://localhost:9090/api/v1/query_range'

returns the following data through promxy:

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "cluster": "cluster1",
          "container": "XXXX",
          "pod": "XXXX"
        },
        "values": [
          [
            1689317393.152,
            "1536000"
          ],
          [
            1689317423.152,
            "342065152"
          ],
          [
            1689317453.152,
            "1119674368"
          ],
          [
            1689317483.152,
            "1345925120"
          ],
          [
            1689317513.152,
            "1498701824"
          ],
          [
            1689317543.152,
            "2520608768"
          ],
          [
            1689317573.152,
            "2830626816"
          ],
          [
            1689317603.152,
            "2830626816"
          ],
          [
            1689317633.152,
            "2830626816"
          ],
          [
            1689317663.152,
            "2830626816"
          ],
          [
            1689317693.152,
            "2830626816"
          ],
          [
            1689317723.152,
            "2830626816"
          ],
          [
            1689317753.152,
            "2830626816"
          ],
          [
            1689317783.152,
            "2830626816"
          ],
          [
            1689317813.152,
            "2830626816"
          ],
          [
            1689317843.152,
            "2830626816"
          ],
          [
            1689317873.152,
            "2830626816"
          ]
        ]
      }
    ]
  }
}

But through prometheus1-1:

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "container": "XXXX",
          "pod": "XXXX"
        },
        "values": [
          [
            1689317393.152,
            "1536000"
          ],
          [
            1689317423.152,
            "342065152"
          ],
          [
            1689317453.152,
            "1119674368"
          ],
          [
            1689317483.152,
            "1345925120"
          ],
          [
            1689317513.152,
            "1498701824"
          ],
          [
            1689317543.152,
            "2520608768"
          ],
          [
            1689317573.152,
            "2830626816"
          ]
        ]
      }
    ]
  }
}

and through prometheus1-2:

{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "container": "XXXX",
          "pod": "XXXX"
        },
        "values": [
          [
            1689317393.152,
            "92098560"
          ],
          [
            1689317423.152,
            "659861504"
          ],
          [
            1689317453.152,
            "1243791360"
          ],
          [
            1689317483.152,
            "1345925120"
          ],
          [
            1689317513.152,
            "2733039616"
          ],
          [
            1689317543.152,
            "2545987584"
          ]
        ]
      }
    ]
  }
}

As you can see, there are additional datapoints in the promxy output which shouldn't be there. Am I somehow holding it wrong?

If I'm missing any data necessary to debug the issue, just let me know.

Thanks for your help in advance!

jacksontj commented 11 months ago

I did some initial looking and am unable to reproduce this issue locally.

From a quick look, the results are confusing to me for both promxy and prometheus. The query covers a range of ~45m with a step of 30s, which should be ~90 datapoints; promxy was returning 17 and prometheus 6 or 7. So it's a bit hard to say without the full data that promxy was seeing (either trace logs or a tcpdump). But my guess here is that the max isn't unique, such that it's putting multiple series together, but that's a really off-the-wall guess without more data.
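For anyone trying to reproduce this, a minimal sketch of how the trace logs or packet capture mentioned above could be gathered. The flag names (--config, --log-level) and paths are assumptions about the deployment rather than something reported in this issue, so check promxy --help for your build:

# Run promxy with trace logging so each upstream QueryRange call and its per-server-group
# result is written to the log (the trace lines in the next comment are this kind of output).
promxy --config=/etc/promxy/config.yaml --log-level=trace 2> promxy-trace.log

# Alternatively, capture the raw remote_read traffic towards one upstream; the connection
# is TLS in this setup, so this mostly helps with timing/size rather than payloads.
tcpdump -i any -w promxy-upstream.pcap host prometheus-cluster-1 and port 443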

Z3po commented 11 months ago

Thanks for taking a first look!

I've run the queries with trace loglevel and got the following (now for another pod, but still the same issue):

{"level":"debug","msg":"AggregateExpr max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"}) max","time":"2023-08-08T11:28:43Z"}

{"api":"QueryRange","level":"debug","msg":"http://Cluster8:9090","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster6:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster2:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster5:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster4:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster1:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster7:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}
{"api":"QueryRange","level":"debug","msg":"https://Cluster3:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z"}

{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster6:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":5202879,"value":[],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"http://Cluster8:9090","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":6089343,"value":[],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster5:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":11511931,"value":[{"metric":{"container":"build","pod":"XXXX"},"values":[[1691487358,"1765376"],[1691487388,"725573632"]]}],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster7:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":18630738,"value":[],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster2:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":24686659,"value":[],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster4:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":25874023,"value":[],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster1:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":26241856,"value":[],"warnings":null}
{"api":"QueryRange","error":null,"level":"trace","msg":"https://Cluster3:443","query":"max by(pod, container) (container_memory_usage_bytes{container!=\"\",container!=\"POD\",container=\"build\",id!=\"/\",namespace=~\"XXXXXXX\",pod=~\"XXXX\"})","r":{"Start":"2023-08-08T09:29:58Z","End":"2023-08-08T09:40:04Z","Step":30000000000},"time":"2023-08-08T11:28:43Z","took":25840574,"value":[],"warnings":null}

{"remoteAddr":"127.0.0.1","time":"2023-08-08T11:28:43.732172484Z","method":"GET","path":"/api/v1/query_range","protocol":"HTTP/1.1","status":200,"responseBytes":383,"duration":0.030032405,"query":"start=2023-08-08T09%3A29%3A58Z\u0026end=2023-08-08T09%3A40%3A04Z\u0026step=30s\u0026query=max%28container_memory_usage_bytes%7Bid%21%3D%22%2F%22%2Cnamespace%3D~%22XXXXXXX%22%2Ccontainer%21%3D%22%22%2Ccontainer%3D%22build%22%2Ccontainer%21%3D%22POD%22%2Cpod%3D~%22XXXX"}

And the result is

❯ curl -v --get --data-urlencode 'query=max(container_memory_usage_bytes{id!="/",namespace=~"XXXXXXX",container!="",container="build",container!="POD",pod=~"XXXX"}) by (pod, container)' --data-urlencode "start=2023-08-08T09:29:58Z" --data-urlencode end="2023-08-08T09:40:04Z" --data-urlencode "step=30s" http://localhost:8082/api/v1/query_range
*   Trying 127.0.0.1:8082...
* Connected to localhost (127.0.0.1) port 8082 (#0)
> GET /api/v1/query_range?query=max%28container_memory_usage_bytes%7bid%21%3d%22%2f%22%2cnamespace%3d~%22XXXXXXX%22%2ccontainer%21%3d%22%22%2ccontainer%3d%22build%22%2ccontainer%21%3d%22POD%22%2cpod%3d~%22XXXX%22%7d%29+by+%28pod%2c+container%29&start=2023-08-08T09%3a29%3a58Z&end=2023-08-08T09%3a40%3a04Z&step=30s HTTP/1.1
> Host: localhost:8082
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 08 Aug 2023 11:28:43 GMT
< Content-Length: 383
< 
* Connection #0 to host localhost left intact
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"container":"build","pod":"XXXX"},"values":[[1691487358,"1765376"],[1691487388,"725573632"],[1691487418,"725573632"],[1691487448,"725573632"],[1691487478,"725573632"],[1691487508,"725573632"],[1691487538,"725573632"],[1691487568,"725573632"],[1691487598,"725573632"]]}]}}

Looks to me like it's not receiving data points from any other cluster on that query....
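As a quick way to confirm that, one could re-run the same query but keep the cluster label in the grouping (exactly as in the very first query in this issue), so each server group's contribution shows up as its own series. This is only a suggested check, not something that was run here:

curl -s --get \
  --data-urlencode 'query=max(container_memory_usage_bytes{id!="/",namespace=~"XXXXXXX",container!="",container="build",container!="POD",pod=~"XXXX"}) by (pod, container, cluster)' \
  --data-urlencode "start=2023-08-08T09:29:58Z" \
  --data-urlencode "end=2023-08-08T09:40:04Z" \
  --data-urlencode "step=30s" \
  http://localhost:8082/api/v1/query_range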

And BTW we're running quay.io/jacksontj/promxy:v0.0.81.

Z3po commented 11 months ago

Sorry for the comment-update mess. I decided it's better to run the curl command again and fetch the necessary information from the very same execution, so the logs definitely match the query.

jacksontj commented 10 months ago

Thanks for the report here! With all this data it is trivial to reproduce, which really speeds things up. Looking at it, this seems to be just another case of lookback-delta mischief. The query lookback delta, as described in the upstream docs:

The maximum lookback duration for retrieving metrics during expression evaluations and federation. Use with server mode only.

Promxy does the same; the difference is that the dataset it applies the lookback to has (usually) already had this done by the downstream Prometheus. In this case the default lookback delta is 5m, so the promql engine inside promxy treated the last point it saw as still valid and kept repeating it for the steps within that 5m window. For other selectors (e.g. VectorSelector) this has already been fixed, but it seems the aggregation side still had this issue. I have opened up #609 to resolve it.
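To make the lookback arithmetic concrete with the timestamps from the trace above, here is a rough sketch; the --query.lookback-delta flag is an assumption that promxy exposes the same knob as Prometheus, so verify it against your build's --help before relying on it:

# Last real sample returned by Cluster5:  1691487388 (09:36:28Z)
# Default lookback delta:                 5m (300s)
# A query step still "sees" that sample until 1691487388 + 300 = 1691487688 (09:41:28Z).
# The requested range ends at 09:40:04Z, so every remaining 30s step up to 09:39:58Z
# repeats "725573632", which is exactly the tail of repeated values in the promxy response above.

# Purely as an illustration of the knob (not the fix from #609), the window could be shrunk:
promxy --config=/etc/promxy/config.yaml --query.lookback-delta=1m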