thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Thanos receive distributor HPA not working - Traffic is only going to 1 distributor #7577

yadavnikhil closed this issue 2 months ago

yadavnikhil commented 2 months ago

Thanos, Prometheus and Golang version used: Thanos v0.35.1, Prometheus v2.50.1

Object Storage Provider: AWS S3 Object Storage

What happened: We are running Thanos Receive in distributor mode (routing and ingesting split into separate components) with remote write from Prometheus, and we have enabled HPA for both. We use the thanos-receive-controller to update the hashring dynamically. HPA works fine for the ingestion StatefulSet pods, but the distributor pods do not scale correctly: resource usage is concentrated in a single pod, and http_requests_total is only generated for that active pod. When load increases, that one distributor pod starts crashing with OOM and Prometheus reports context deadline exceeded errors. The kubectl top pods output for the three distributor replicas shows the imbalance:

monitoring-thanos-receive-distributor-844954b5f5-9sd99            1m           19Mi
monitoring-thanos-receive-distributor-844954b5f5-lsd9l            4083m        9617Mi
monitoring-thanos-receive-distributor-844954b5f5-s84m8            1m           20Mi
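
For reference, a minimal sketch of the kind of HPA we have on the distributor Deployment; the target name, replica bounds and CPU threshold below are illustrative, not copied from our manifests:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: monitoring-thanos-receive-distributor   # illustrative name
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: monitoring-thanos-receive-distributor  # illustrative target
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70                   # illustrative threshold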

Our receiver (ingestor) config:

    - receive
    - --log.level=debug
    - --log.format=json
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902
    - --remote-write.address=0.0.0.0:19291
    - --objstore.config=$(OBJSTORE_CONFIG)
    - --tsdb.path=/var/thanos/receive
    - --label=replica="$(NAME)"
    - --label=receive="true"
    - --receive.local-endpoint=$(NAME).thanos-receive-headless.$(NAMESPACE).svc.cluster.local:10901
    - --grpc-server-tls-cert=/certs/tls.crt
    - --grpc-server-tls-key=/certs/tls.key
    - --grpc-server-tls-client-ca=/certs/ca.crt
    - --receive.hashrings-algorithm=ketama
    - --tsdb.retention=1d
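
The $(NAME) and $(NAMESPACE) values above are expanded from container environment variables; in our setup these are assumed to come from the downward API on the receive StatefulSet, roughly:

env:
- name: NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name        # pod name, e.g. monitoring-thanos-receive-0
- name: NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace   # e.g. monitoring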

Receiver distributor config:

    - receive
    - --log.level=debug
    - --log.format=json
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902
    - --remote-write.address=0.0.0.0:19291
    - --label=replica="$(NAME)"
    - --label=receive="true"
    - --receive.hashrings-file=/var/lib/thanos-receive/hashrings.json
    - --receive.replication-factor=2
    - --receive.grpc-compression=snappy
    - --receive-forward-timeout=30s
    - --grpc-server-tls-cert=/certs/tls.crt
    - --grpc-server-tls-key=/certs/tls.key
    - --grpc-server-tls-client-ca=/certs/ca.crt
    - --remote-write.server-tls-cert=/certs/tls.crt
    - --remote-write.server-tls-key=/certs/tls.key
    - --remote-write.server-tls-client-ca=/certs/ca.crt
    - --remote-write.client-tls-ca=/client/ca.crt
    - --remote-write.client-tls-cert=/client/tls.crt
    - --remote-write.client-tls-key=/client/tls.key
    - --remote-write.client-tls-secure
    - --receive.hashrings-algorithm=ketama

Hashring with 10 receive replicas, generated by the thanos-receive-controller while auto-scaling:

hashrings.json: |
  [
    {
      "hashring": "default",
      "endpoints": [
        {"address": "monitoring-thanos-receive-0.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-1.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-2.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-3.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-4.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-5.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-6.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-7.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-8.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""},
        {"address": "monitoring-thanos-receive-9.monitoring-thanos-receive-headless.monitoring.svc.cluster.local:10901", "az": ""}
      ]
    }
  ]
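
This file reaches the distributor pods via the ConfigMap maintained by the thanos-receive-controller, mounted at the path passed to --receive.hashrings-file. A minimal sketch of that wiring on the distributor pod template (ConfigMap and container names are illustrative):

# fragment of the distributor Deployment pod spec
volumes:
- name: hashring-config
  configMap:
    name: monitoring-thanos-receive-generated   # illustrative; written by thanos-receive-controller
containers:
- name: receive-distributor                     # illustrative container name
  volumeMounts:
  - name: hashring-config
    mountPath: /var/lib/thanos-receive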

Prometheus Operator RemoteWrite config:

  remoteWrite:
  - queueConfig:
      batchSendDeadline: 10s
      capacity: 20000
      maxBackoff: 10s
      maxSamplesPerSend: 4000
      maxShards: 500
      minBackoff: 100ms
      minShards: 10
      retryOnRateLimit: true
    tlsConfig:
      ca:
        secret:
          key: ca.crt
          name: monitoring-thanos-receive-client
      cert:
        secret:
          key: tls.crt
          name: monitoring-thanos-receive-client
      keySecret:
        key: tls.key
        name: monitoring-thanos-receive-client
    url: https://monitoring-thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive
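
For completeness, this block sits under spec.remoteWrite of the Prometheus custom resource managed by the Prometheus Operator; a minimal sketch (the resource name and replica count are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: monitoring-prometheus     # illustrative
  namespace: monitoring
spec:
  replicas: 2                     # illustrative
  remoteWrite:
  - url: https://monitoring-thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive
    # queueConfig and tlsConfig exactly as in the fragment above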

Thanos receive distributor service definition (sessionAffinity: None):

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: receive
    app.kubernetes.io/version: 0.34.0
    prometheus-operator/monitor: "true"
  name: monitoring-thanos-receive
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: grpc
  - name: remote
    port: 19291
    protocol: TCP
    targetPort: remote-write
  selector:
    app.kubernetes.io/component: receive-distributor
    app.kubernetes.io/instance: monitoring-thanos
    app.kubernetes.io/name: monitoring-thanos
  sessionAffinity: None
  type: ClusterIP
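
By contrast, the ingesting receivers are addressed individually through the headless service referenced in --receive.local-endpoint and in hashrings.json; roughly, assuming matching labels and port names on the receive StatefulSet (selector values below are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: monitoring-thanos-receive-headless
  namespace: monitoring
spec:
  clusterIP: None                  # headless: each receive pod gets a stable DNS record
  ports:
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: grpc
  - name: remote
    port: 19291
    protocol: TCP
    targetPort: remote-write
  selector:
    app.kubernetes.io/component: receive        # illustrative selector
    app.kubernetes.io/instance: monitoring-thanos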

Only one pod shows usage in kubectl top pods.

Is anything wrong in the configuration on the Thanos or Prometheus side?

What you expected to happen: Traffic should be distributed across all Thanos distributor pods, and they should auto-scale when resource usage crosses the HPA threshold.

How to reproduce it (as minimally and precisely as possible): Deploy the receiver in distributor mode and monitor resource usage and request counts for the receive distributor pods, for example with the command sketched below.
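
Using the namespace and labels from the distributor Service selector above (assuming the pods carry the same labels):

kubectl top pods -n monitoring -l app.kubernetes.io/component=receive-distributor,app.kubernetes.io/instance=monitoring-thanos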

Full logs to relevant components:

Anything else we need to know:

Environment:

yadavnikhil commented 2 months ago

Looks like this is coming from the Prometheus side.