smatvienko-tb opened this issue 2 years ago
How does the pgpool exporter work when there are multiple pgpool replicas in operation?
Is it a best practice to have multiple pgpool replicas in operation?
pgpool HA works in active/passive mode with a watchdog, but k8s Deployments are active/active ...
@ziouf On k8s, the watchdog and health check are not required, because k8s can handle HA of the pgpool pods and PostgreSQL pods.
Yes, but I mean that k8s HA is active/active, so queries are balanced (round robin by default) across the replicas. And I'm not sure that pgpool can handle multiple replicas at the same time without problems. I tried to deploy pgpool with 2 replicas and encountered many crashes. I assume that pgpool uses Postgres tables to store internal data, and that there is a problem with table locking when multiple replicas are running at the same time.
Since I've reduced it to a single replica, everything works fine, but there is no HA.
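For reference, scaling back down is a one-liner (assuming the Deployment is named pgpool, as in the manifest later in this thread):

kubectl scale deployment/pgpool --replicas=1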
@ziouf
I tried to deploy pgpool with 2 replicas and encountered many crashes. I assume that pgpool uses postgres tables to store internal data, and that there is a problem with table locking when multiple replicas are running at the same time.
Pgpool doesn't use PostgreSQL tables to store internal data. If possible, could you share the logs?
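(Side note: pgpool exposes its internal state through virtual SHOW commands rather than regular tables; pgpool2_exporter reads those. A quick manual check through any pgpool endpoint, with the host left as a placeholder:)

# Inspect backend status via pgpool's virtual SHOW command
psql -h <pgpool-host> -p 5432 -d postgres -c 'SHOW POOL_NODES;'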
I had this problem some months ago and solved it by reducing the replicas to 1. I will try to reproduce it.
I reproduced the issue.
There are no logs before the container restart, but the events say that the probes are timing out, which triggers the container restart.
I guess it is the nc call that hangs until the timeout, because it's the liveness and readiness probes that are failing. But I don't know why, and it appears only with 2 or more replicas after some running time.
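The probe timeouts show up as events (pod name is illustrative):

# Recent events, sorted by time; failed probes appear with reason "Unhealthy"
kubectl get events --sort-by=.lastTimestamp | grep -i unhealthy
# Or look at the Events section of a specific pod
kubectl describe pod <pgpool-pod-name>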
In my setup I have pgpool2_exporter as a sidecar container in the pgpool pods, and I use a ServiceMonitor to scrape it.
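The ServiceMonitor looks roughly like this (a sketch; the labels and scrape interval are assumptions, and it targets the exporter's http-metrics port defined in the Deployment below):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pgpool
  labels:
    app.kubernetes.io/name: pgpool
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: pgpool
  endpoints:
    # Scrape the pgpool2_exporter sidecar on its named port
    - port: http-metrics
      interval: 30s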
This is the probe script I use:
#!/usr/bin/env bash
# -*- coding: utf-8 -*-

PGHOST="${PGHOST:-127.0.0.1}"
PGPORT="${PGPORT:-5432}"

function psql_check {
  export PGCONNECT_TIMEOUT="${PGCONNECT_TIMEOUT:-5}"
  export PGSSLMODE="${PGSSLMODE:-prefer}"
  export PGPASSWORD="${POSTGRES_PASSWORD}"
  export PGUSER="${POSTGRES_USERNAME}"
  psql -h "${PGHOST}" -p "${PGPORT}" -d postgres -tA -c 'SELECT 1'
}

function telnet_check {
  echo "exit" | telnet "${PGHOST}" "${PGPORT}" >/dev/null 2>&1
}

function nc_check {
  nc -z "${PGHOST}" "${PGPORT}"
}

case ${1:-startup} in
  # Will stop serving traffic if it fails
  readiness)
    nc_check
    ;;
  # Will restart the pod if it fails
  liveness)
    nc_check
    ;;
  # Wait until the service is up
  startup)
    psql_check
    ;;
  # All other cases
  *)
    # Do nothing
    ;;
esac
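You can run it by hand to verify each mode (exit code 0 means the check passed):

bash /scripts/probe.sh readiness && echo healthy || echo unhealthy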
And this is the Deployment I use:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgpool
  labels:
    app.kubernetes.io/name: pgpool
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  revisionHistoryLimit: 0
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: pgpool
  template:
    metadata:
      labels:
        app.kubernetes.io/name: pgpool
    spec:
      serviceAccountName: pgpool
      restartPolicy: Always
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: pgpool
      containers:
        - name: pgpool
          image: docker.io/pgpool/pgpool:4.4.3
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          command: [/bin/bash, /scripts/start.sh]
          ports:
            - containerPort: 5432
              name: tcp-pgpool
          startupProbe:
            exec:
              command: [/bin/bash, /scripts/probe.sh, startup]
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 10
            successThreshold: 1
          # Will restart the pod if failing
          livenessProbe:
            exec:
              command: [/bin/bash, /scripts/probe.sh, liveness]
            timeoutSeconds: 2
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          # Will stop serving traffic if failing
          readinessProbe:
            exec:
              command: [/bin/bash, /scripts/probe.sh, readiness]
            timeoutSeconds: 2
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2
          envFrom:
            - prefix: PGPOOL_PARAMS_
              configMapRef:
                name: pgpool-env
                optional: false
          env:
            # Use the pool_passwd file generated from an external secret
            - name: PGPOOL_ENABLE_POOL_PASSWD
              value: "false"
            - name: PGPOOL_PASSWORD_ENCRYPTION_METHOD
              value: "password"
            - name: PGPOOL_SKIP_PASSWORD_ENCRYPTION
              value: "true"
            # Credentials
            - name: POSTGRES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: username
                  optional: false
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: password
                  optional: false
          volumeMounts:
            - name: pgpool-scripts
              mountPath: /scripts
              readOnly: true
            - name: pgpool-config
              mountPath: /config/pool_hba.conf
              subPath: pool_hba.conf
              readOnly: true
            - name: pgpool-pgusers
              mountPath: /opt/pgpool-II/etc/pool_passwd
              subPath: pool_passwd
              readOnly: true
        # Prometheus exporter
        - name: pgpool-exporter
          image: docker.io/pgpool/pgpool2_exporter:1.2.1
          resources:
            requests:
              cpu: 25m
              memory: 16Mi
            limits:
              cpu: 350m
              memory: 32Mi
          command: [/bin/sh, /scripts/exporter.sh]
          ports:
            - containerPort: 9719
              name: http-metrics
          env:
            # Credentials
            - name: POSTGRES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: username
                  optional: false
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pgpool-pguser
                  key: password
                  optional: false
          volumeMounts:
            - name: pgpool-scripts
              mountPath: /scripts
              readOnly: true
      volumes:
        - name: pgpool-scripts
          configMap:
            name: pgpool-scripts
        - name: pgpool-config
          configMap:
            name: pgpool-config
        - name: pgpool-pgusers
          secret:
            secretName: pgpool-pgusers
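The ServiceMonitor above needs a Service exposing the named ports; mine is roughly this (a sketch, with names and labels assumed to match the Deployment):

apiVersion: v1
kind: Service
metadata:
  name: pgpool
  labels:
    app.kubernetes.io/name: pgpool
spec:
  selector:
    app.kubernetes.io/name: pgpool
  ports:
    # Client connections go to pgpool itself
    - name: tcp-pgpool
      port: 5432
      targetPort: tcp-pgpool
    # Prometheus scrapes the exporter sidecar
    - name: http-metrics
      port: 9719
      targetPort: http-metrics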
And finally, the startup script for pgpool2_exporter:
#!/usr/bin/env bash
# -*- coding: utf-8 -*-

PGHOST="${PGHOST:-127.0.0.1}"
PGPORT="${PGPORT:-5432}"

# Block until pgpool accepts TCP connections
while ! nc -z "${PGHOST}" "${PGPORT}"; do
  echo "Waiting until pgpool is ready..."
  sleep 1
done

export DATA_SOURCE_USER="${POSTGRES_USERNAME}"
export DATA_SOURCE_PASS="${POSTGRES_PASSWORD}"
export DATA_SOURCE_URI="${PGHOST}:${PGPORT}/postgres?sslmode=require"

exec /bin/pgpool2_exporter
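Once the sidecar is up, the metrics endpoint can be checked from inside the pod (9719 is the exporter's default port, and its metric names are prefixed with pgpool2):

curl -s http://127.0.0.1:9719/metrics | grep '^pgpool2'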
It seems to be more stable when I use the following code as the liveness/readiness probe:
nc -z -w 0 ${PGHOST} ${PGPORT}
My tests have shown that nc hangs from time to time, and using the -w 0 flag prevents this behavior. With this flag, we ask nc to return immediately.
I use nc to test whether the service is up and running without consuming a database connection, unlike the psql command.
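So the fixed probe function becomes (a sketch; per the behavior observed above, -w 0 tells nc not to wait on a hung connection):

function nc_check {
  # -z: port scan only, send no data; -w 0: return immediately instead of hanging
  nc -z -w 0 "${PGHOST}" "${PGPORT}"
}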
It looks like the faulty code was on my side.
Hi everyone! Check out the Kubernetes example for pgpool2-exporter. It works just fine for my ThingsBoard cluster. If you installed high-availability PostgreSQL with pgpool using the Bitnami Helm chart, you probably already have the secrets deployed. Please replace the secrets with your path. Special thanks to @pengbo0328 for the pgpool2-exporter tool. You can probably reduce the resources, because CPU and memory consumption is tiny.
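If you went the Bitnami route, the generated secret can be inspected like this (a sketch; the secret name depends on your release name and chart, so adjust it):

# Find the chart-generated secret and decode its password key
kubectl get secrets | grep -i postgres
kubectl get secret <release-name>-postgresql-ha-postgresql \
  -o jsonpath='{.data.password}' | base64 -d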