Describe the issue

After installing Kestra in independent services mode (not standalone), the logs show messages like:

com.zaxxer.hikari.pool.PoolBase HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@48cc92a0 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
After this, the number of idle connections in Postgres grows, and once it reaches 97 the pods can no longer connect to the database:

FATAL: remaining connection slots are reserved for roles with the SUPERUSER attribute
Then the Postgres pods are restarted, the Kestra services are restarted, and the cycle repeats roughly once an hour. The pods are restarted by their probes:

Liveness probe failed: Get "http://10.233.101.109:8081/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Readiness probe failed: Get "http://10.233.101.109:8081/health": read tcp 10.0.10.6:35394->10.233.101.109:8081: read: connection reset by peer
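The ceiling of 97 is not arbitrary: with PostgreSQL's defaults, max_connections = 100 and superuser_reserved_connections = 3, so non-superuser roles such as the kestra database user get exactly 97 slots before the FATAL error above appears. A minimal sketch of that arithmetic (defaults assumed, not read from this cluster):

```python
# PostgreSQL defaults (assumed; confirm with SHOW max_connections and
# SHOW superuser_reserved_connections on the actual cluster):
max_connections = 100      # default upper bound on server connections
superuser_reserved = 3     # default superuser_reserved_connections

# Slots usable by non-superuser roles before the FATAL error fires:
available = max_connections - superuser_reserved
print(available)  # → 97, matching the observed idle-connection ceiling
```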
For installation I use:
sudo helm install kestra kestra/kestra -f values.yaml -n xxx
What did I do?

I tried setting the Postgres variables POSTGRESQL_IDLE_IN_TRANSACTION_SESSION_TIMEOUT (5 min) and POSTGRESQL_TCP_KEEPALIVES_IDLE (60). This keeps connections from accumulating, but the error still occurs and the pods (all except the databases) are restarted.
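The interplay between these server-side timeouts and the connection pool explains the validation failure: Hikari's documented default maxLifetime is 30 minutes, so any server- or network-side timeout shorter than that closes sockets while Hikari still considers them live, and the pool only notices at the next validation ("This connection has been closed."). A small sketch of that invariant (the 30-minute figure is Hikari's default; the 5-minute figure is the server-side timeout tried above):

```python
# If the server closes idle connections sooner than Hikari retires them,
# the pool keeps handing out dead sockets until validation fails.
hikari_max_lifetime_s = 30 * 60   # Hikari default maxLifetime: 30 minutes
server_idle_timeout_s = 5 * 60    # the 5-minute server-side timeout tried above

def pool_races_server(max_lifetime_s: int, idle_timeout_s: int) -> bool:
    """True when the server wins the race and kills connections first."""
    return idle_timeout_s < max_lifetime_s

print(pool_races_server(hikari_max_lifetime_s, server_idle_timeout_s))  # → True
```

This is why shortening the server-side timeouts alone cannot remove the error: it only shifts which side closes the connection first.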
I also added arguments for Hikari, but it made no difference:
I am attaching the pod logs and the idle-connections graph from postgres_exporter. I have also attached pg_stat_activity output captured at a moment when there were 94 idle connections in the database.
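For context on the Hikari arguments mentioned above: the usual remedy for the "Possibly consider using a shorter maxLifetime value" warning is to make the pool retire connections before anything on the server or network side can close them. Below is a hedged sketch of where such settings live in the Kestra Helm chart's values.yaml. This is not the configuration from this report (that is in the attached values.yaml.txt), and the key names (the `configuration` passthrough, the Micronaut/Hikari property names, the service hostname) are assumptions to verify against the chart version in use:

```yaml
configuration:
  datasources:
    postgres:
      url: jdbc:postgresql://kestra-postgresql:5432/kestra   # hypothetical service name
      driverClassName: org.postgresql.Driver
      username: kestra
      password: changeme
      # Illustrative pool tuning, not the values from this report:
      maximum-pool-size: 8    # bound each service's share of the 97 slots
      minimum-idle: 1         # let the pool shrink when a service is quiet
      idle-timeout: 120000    # 2 min: release idle connections early
      max-lifetime: 240000    # 4 min: below the 5 min server-side timeout
```

With five or so Kestra services each capped at a small pool, the total demand stays well under the 97 non-superuser slots; raising max_connections on the PostgreSQL side attacks the same limit from the other direction.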
Environment

Operating System (OS/Docker/Kubernetes): 1.23.10 and 1.26.15
HW: I am using a 3-node k8s cluster:
Master: 4 CPU, 8 GB RAM, 100 GB disk
2x workers: 8 CPU, 24 GB RAM, 100 GB disk
Only Kestra is deployed in the cluster, no resource limits are set, and 20 GB of disk space is allocated for Postgres and MinIO.
Attachments: values.yaml.txt, pg_stat_activity.txt, postgresql_logs.txt, worker-docker-dind_logs.txt, workers_log.txt, webserver_logs.txt, scheduler_logs.txt, minio_logs.txt, executor_logs.txt