pgpool / pgpool2_exporter

Prometheus exporter for Pgpool-II.
MIT License

Pgpool2-exporter and Kubernetes #19

Open · opened by smatvienko-tb 2 years ago

smatvienko-tb commented 2 years ago

Hi everyone! Check out this Kubernetes example for pgpool2-exporter; it works just fine for my ThingsBoard cluster. If you installed high-availability PostgreSQL with Pgpool using the Bitnami Helm chart, you probably already have the secrets deployed. Please replace the secret names and keys with your own. Special thanks to @pengbo0328 for the pgpool2-exporter tool.
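
If you are not sure what your chart named the secret or its keys, it helps to list them first so the secretKeyRef entries in the manifest match (postgres-secret is just the name this example uses; a Bitnami release creates its own):

# Show the key names stored in the secret
kubectl describe secret postgres-secret

# Or list all secrets in the namespace to find the right one
kubectl get secrets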

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pgpool2-exporter
spec:
  serviceName: pgpool2-exporter
  replicas: 1
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: pgpool2-exporter
  template:
    metadata:
      annotations:
        prometheus.io/path: '/metrics'
        prometheus.io/port: '9719'
        prometheus.io/scrape: 'true'
      labels:
        app: pgpool2-exporter
    spec:
      containers:
        - name: pgpool2-exporter
          imagePullPolicy: Always
          image: pgpool/pgpool2_exporter:latest
          resources:
            requests:
              cpu: 200m
              memory: 200Mi
            limits:
              cpu: 200m
              memory: 200Mi
          ports:
            - containerPort: 9719
              name: metrics
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POSTGRES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: postgresql-username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: postgresql-password
            - name: POSTGRES_DATABASE
              value: "thingsboard"
            - name: PGPOOL_SERVICE
              value: "postgresql-ha-pgpool"
            - name: PGPOOL_SERVICE_PORT
              value: "5432"
            - name: SSLMODE
              value: "disable"
      restartPolicy: Always
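
The StatefulSet above sets serviceName: pgpool2-exporter, which normally points at a headless Service. A minimal sketch of one, with names copied from the manifest (the Service itself is my addition, not part of the original example):

apiVersion: v1
kind: Service
metadata:
  name: pgpool2-exporter
spec:
  clusterIP: None        # headless Service backing the StatefulSet's serviceName
  selector:
    app: pgpool2-exporter
  ports:
    - name: metrics
      port: 9719
      targetPort: metrics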

You can probably reduce the resources further, because the CPU and memory consumption is tiny. [screenshot: exporter CPU and memory usage]
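
For example, something in this range should still leave headroom for a single exporter pod (the numbers are a guess on my part; measure your own usage first):

          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi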

mar-rih commented 1 year ago

How does the pgpool exporter work when there are multiple pgpool replicas in operation?

ziouf commented 1 year ago

Is it a best practice to have multiple Pgpool replicas in operation?

Pgpool HA works in active/passive mode with a watchdog, but k8s Deployments are active/active...

pengbo0328 commented 1 year ago

@ziouf On k8s, watchdog and health check are not required, because k8s can handle HA of the Pgpool pods and PostgreSQL pods.

ziouf commented 1 year ago

@ziouf On k8s, watchdog and health check are not required, because k8s can handle HA of the Pgpool pods and PostgreSQL pods.

Yes, but I mean that k8s HA is active/active, so queries are balanced (round-robin by default) across the replicas. I'm not sure pgpool can handle multiple replicas at the same time without problems: I tried to deploy pgpool with 2 replicas and encountered many crashes. I assume that pgpool uses Postgres tables to store internal data, and that there is a problem with table locking when multiple replicas run at the same time.

Since I've reduced it to a single replica, everything works fine, but there is no HA.

pengbo0328 commented 1 year ago

@ziouf

I tried to deploy pgpool with 2 replicas and encountered many crashes. I assume that pgpool uses postgres tables to store internal data, and that there is a problem with table locking when multiple replicas are running at the same time.

Pgpool doesn't use PostgreSQL tables to store internal data. If possible, could you share the logs?

ziouf commented 1 year ago

I had this problem some months ago and solved it by reducing the replicas to 1. I will try to reproduce it.

ziouf commented 1 year ago

I reproduced the issue.

There are no logs before the container restarts, but the events say the probes are timing out, which triggers the container restart.

I guess it is the nc call that hangs until the timeout, since it's the liveness and readiness probes that are failing. But I don't know why, and it only appears with 2 or more replicas after some running time.
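
The timeouts show up in the pod events rather than the container logs; something like this surfaces them (the pod name is illustrative):

# Probe timeouts appear as Warning events such as "Liveness probe failed"
kubectl get events --sort-by=.lastTimestamp | grep -i probe

# Or inspect a single pod's recent events
kubectl describe pod pgpool-<replicaset-hash>-<pod-id>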

In my setup, pgpool2_exporter runs as a sidecar container in the pgpool replicas, and I use a ServiceMonitor to scrape it.
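
For reference, a ServiceMonitor of roughly this shape would match the exporter sidecar's http-metrics port from the Deployment below (the Service it selects, and that Service's labels, are assumptions on top of my manifests; prometheus-operator CRDs are required):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pgpool-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: pgpool   # labels of the Service fronting the pods
  endpoints:
    - port: http-metrics               # named Service port of the exporter
      interval: 30s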

This is the probe script I use:

#!/usr/bin/env bash
# -*- coding: utf-8 -*-

PGHOST="${PGHOST:-127.0.0.1}"
PGPORT="${PGPORT:-5432}"

function psql_check {
    export PGCONNECT_TIMEOUT="${PGCONNECT_TIMEOUT:-5}"
    export PGSSLMODE="${PGSSLMODE:-prefer}"
    export PGPASSWORD="${POSTGRES_PASSWORD}"
    export PGUSER="${POSTGRES_USERNAME}"

    psql -h "${PGHOST}" -p "${PGPORT}" -d postgres -tA -c 'SELECT 1'
}

function telnet_check {
    echo "exit" | telnet "${PGHOST}" "${PGPORT}" >/dev/null 2>&1
}

function nc_check {
    nc -z "${PGHOST}" "${PGPORT}"
}

case "${1:-startup}" in
    # Will stop serving traffic if it fails
    readiness)
        nc_check
    ;;
    # Will restart the pod if it fails
    liveness)
        nc_check
    ;;
    # Wait until the service is up
    startup)
        psql_check
    ;;
    # All other cases
    *)
        # Do nothing
    ;;
esac

And the Deployment I use:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgpool
  labels:
    app.kubernetes.io/name: pgpool
  annotations:
    reloader.stakater.com/auto: "true"

spec:
  revisionHistoryLimit: 0

  replicas: 2

  selector:
    matchLabels:
      app.kubernetes.io/name: pgpool

  template:
    metadata:
      labels:
        app.kubernetes.io/name: pgpool

    spec:
      serviceAccountName: pgpool

      restartPolicy: Always

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: pgpool

      containers:
        - name: pgpool
          image: docker.io/pgpool/pgpool:4.4.3

          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi

          command: [/bin/bash, /scripts/start.sh]

          ports:
            - containerPort: 5432
              name: tcp-pgpool

          startupProbe:
            exec: 
              command: [/bin/bash, /scripts/probe.sh, startup]
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 10
            successThreshold: 1

# Will restart the pod if failing
          livenessProbe:
            exec: 
              command: [/bin/bash, /scripts/probe.sh, liveness]
            timeoutSeconds: 2
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2

# Will stop serving traffic if failing
          readinessProbe:
            exec: 
              command: [/bin/bash, /scripts/probe.sh, readiness]
            timeoutSeconds: 2
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2

          envFrom:
            - prefix: PGPOOL_PARAMS_
              configMapRef:
                name: pgpool-env
                optional: false
          env:
            # Use pool_passwd file generated from externalsecret
            - name: PGPOOL_ENABLE_POOL_PASSWD
              value: "false"
            - name: PGPOOL_PASSWORD_ENCRYPTION_METHOD
              value: "password"
            - name: PGPOOL_SKIP_PASSWORD_ENCRYPTION
              value: "true"
            # Credentials
            - name: POSTGRES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: username
                  optional: false
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: password
                  optional: false

          volumeMounts:
            - name: pgpool-scripts
              mountPath: /scripts
              readOnly: true
            - name: pgpool-config
              mountPath: /config/pool_hba.conf
              subPath: pool_hba.conf
              readOnly: true
            - name: pgpool-pgusers
              mountPath: /opt/pgpool-II/etc/pool_passwd
              subPath: pool_passwd
              readOnly: true

        # Prometheus exporter
        - name: pgpool-exporter
          image: docker.io/pgpool/pgpool2_exporter:1.2.1

          resources:
            requests:
              cpu: 25m
              memory: 16Mi
            limits:
              cpu: 350m
              memory: 32Mi

          command: [/bin/sh, /scripts/exporter.sh]

          ports:
            - containerPort: 9719
              name: http-metrics

          env:
            # Credentials
            - name: POSTGRES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: username
                  optional: false
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: password
                  optional: false

          volumeMounts:
            - name: pgpool-scripts
              mountPath: /scripts
              readOnly: true

      volumes:
        - name: pgpool-scripts
          configMap:
            name: pgpool-scripts
        - name: pgpool-config
          configMap:
            name: pgpool-config
        - name: pgpool-pgusers
          secret:
            secretName: pgpool-pgusers

And finally, the startup script for pgpool2_exporter:

#!/usr/bin/env bash
# -*- coding: utf-8 -*-

PGHOST="${PGHOST:-127.0.0.1}"
PGPORT="${PGPORT:-5432}"

while ! nc -z "${PGHOST}" "${PGPORT}"; do
    echo "Waiting until pgpool is ready..."
    sleep 1
done

export DATA_SOURCE_USER="${POSTGRES_USERNAME}"
export DATA_SOURCE_PASS="${POSTGRES_PASSWORD}"
export DATA_SOURCE_URI="${PGHOST}:${PGPORT}/postgres?sslmode=require"

exec /bin/pgpool2_exporter

ziouf commented 1 year ago

It seems to be more stable when I use the following command as the liveness/readiness probe:

nc -z -w 0 "${PGHOST}" "${PGPORT}"

My tests have shown that nc hangs from time to time, and using the -w 0 flag prevents this behavior. With this flag, we ask nc to return immediately. I use nc to test whether the service is up and running without consuming a database connection, unlike the psql command.
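
Folded back into the probe script above, the change is just the nc_check function (a sketch; note that -w takes seconds and its exact behavior with 0 can differ between nc variants, so verify on your image):

function nc_check {
    # -z: probe the port without sending data
    # -w 0: return immediately instead of hanging on a stalled connection
    nc -z -w 0 "${PGHOST}" "${PGPORT}"
}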

It looks like the faulty code was on my side.