Closed neelasha-09 closed 3 years ago
List of processes in the container might help to figure it out.
Please run ps auxwf and copy the output here.
Please find the output requested.
postgres@postgres-operator-cluster-1-0:~$ ps auxwf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 1390 0.3 0.0 4640 840 pts/0 Ss 07:30 0:00 /bin/sh -c TERM="xterm" /bin/sh
postgres 1397 0.0 0.0 4640 840 pts/0 S 07:30 0:00 \_ /bin/sh
postgres 1399 0.0 0.0 22044 4164 pts/0 S 07:30 0:00 \_ bash
postgres 1417 0.0 0.0 37812 3336 pts/0 R+ 07:30 0:00 \_ ps auxwf
postgres 1 0.0 0.0 4396 824 ? Ss 07:12 0:00 /usr/bin/dumb-init -c --rewrite 1:0 -- /bin/sh /launch.sh
postgres 10 0.0 0.0 4640 1752 ? S 07:12 0:00 /bin/sh /launch.sh
postgres 32 0.0 0.0 4564 740 ? S 07:12 0:00 \_ /usr/bin/runsvdir -P /etc/service
postgres 33 0.1 0.0 4412 1288 ? Ss 07:12 0:01 \_ runsv cron
postgres 34 0.0 0.0 4412 800 ? Ss 07:12 0:00 \_ runsv pgqd
postgres 38 0.0 0.0 108012 8136 ? S 07:12 0:00 | \_ /usr/bin/pgqd /home/postgres/pgq_ticker.ini
postgres 35 0.0 0.0 4412 856 ? Ss 07:12 0:00 \_ runsv patroni
postgres 37 0.2 0.1 620216 38664 ? Sl 07:12 0:02 \_ /usr/bin/python3 /usr/local/bin/patroni /home/postgres/postgres.yml
postgres 73 0.0 0.0 320592 30636 ? S 07:13 0:00 /usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data --config-file=/home/postgres/pgdata/pgroot/data/p
postgres 75 0.0 0.0 200256 4676 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: logger
postgres 78 0.3 0.0 422180 25376 ? Ssl 07:13 0:04 \_ postgres: postgres-operator-cluster-1: bg_mon
postgres 83 0.0 0.0 320688 15888 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: checkpointer
postgres 84 0.0 0.0 320576 6748 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: background writer
postgres 85 0.1 0.0 202712 5468 ? Ss 07:13 0:01 \_ postgres: postgres-operator-cluster-1: stats collector
postgres 87 0.0 0.0 321792 16980 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: postgres postgres [local] idle
postgres 103 0.0 0.0 320576 9016 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: walwriter
postgres 104 0.0 0.0 321264 8680 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: autovacuum launcher
postgres 105 0.0 0.0 202456 4720 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: archiver last was 000000270000000000000028.partial
postgres 106 0.0 0.0 321632 14500 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: pg_cron launcher
postgres 107 0.0 0.0 321116 8700 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: TimescaleDB Background Worker Launcher
postgres 108 0.0 0.0 321116 7196 ? Ss 07:13 0:00 \_ postgres: postgres-operator-cluster-1: logical replication launcher
It fails to start /usr/sbin/cron, therefore backups are effectively broken.
What is the solution?
It seems that OCP starts the container with nosuid, so cron can't be started as root, and unfortunately it doesn't work as a non-root user. The solution would be to find an alternative to cron that doesn't require root to work.
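To illustrate what a cron alternative that doesn't need root could look like, here is a minimal sketch in Python (the container already runs python3 for Patroni). It is not what the image ships; the function names `next_run` and `run_daily` and the schedule are assumptions made up for this example, and the command to run is left to the caller.

```python
import datetime as dt
import subprocess
import time
from typing import List


def next_run(now: dt.datetime, hour: int, minute: int) -> dt.datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += dt.timedelta(days=1)
    return candidate


def run_daily(command: List[str], hour: int, minute: int) -> None:
    """Run `command` once a day at hour:minute as the current (non-root) user.

    Hypothetical helper: loops forever, so it is meant to be started as its
    own supervised service rather than called from other code.
    """
    while True:
        target = next_run(dt.datetime.now(), hour, minute)
        delay = (target - dt.datetime.now()).total_seconds()
        time.sleep(max(0.0, delay))
        # check=False: a failed run does not stop the loop; it is retried
        # at the next scheduled time.
        result = subprocess.run(command, check=False)
        print(f"{command!r} exited with code {result.returncode}", flush=True)
```

Calling something like `run_daily(["echo", "backup placeholder"], hour=1, minute=0)` from a small script under runsv would then replace the cron entry; the echo command is only a placeholder for the real backup command. Since nothing here switches users, no setuid or seteuid call is involved.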
What is the impact of missing these backups, if HA and minor version upgrades are working fine?
Would running in privileged mode have an effect? (We would prefer not to run in that mode, but would it make a difference?)
What is the impact of missing these backups?
Well, I don't really know how to answer such questions... RAID is not a backup. HA is not a backup. Running replicas are not replacing the backup. Backup stored on the same machine is not a backup. Backup stored in the same DC is not a good backup. The backup that was never tested is a Schrödinger backup. And so on...
It seems to be the AllowPrivilegeEscalation parameter in OPR 1.6 that is causing the problem.
We also found that certificate rotation would be affected by cron not running.
We agree that, in the long run, we should ideally move away from the cron dependency.
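For anyone who wants to confirm this from inside the pod, the following is a rough sketch. It relies on two Linux facts: when privilege escalation is disallowed in Kubernetes, the container process gets the no_new_privs flag, which shows up as the NoNewPrivs field in /proc/self/status (kernel 4.10 and later), and nosuid mounts are listed in /proc/mounts. The helper names are invented for this example.

```python
from typing import List


def no_new_privs(status_text: str) -> bool:
    """Return True if /proc/<pid>/status content reports NoNewPrivs: 1."""
    for line in status_text.splitlines():
        if line.startswith("NoNewPrivs:"):
            return line.split(":", 1)[1].strip() == "1"
    return False  # field absent, e.g. on kernels older than 4.10


def nosuid_mounts(mounts_text: str) -> List[str]:
    """Return the mount points in /proc/mounts content that carry nosuid."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 4 and "nosuid" in fields[3].split(","):
            found.append(fields[1])
    return found


def read_text(path: str) -> str:
    """Read a file, returning an empty string if it is not available."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


print("NoNewPrivs set:", no_new_privs(read_text("/proc/self/status")))
print("nosuid mounts:", nosuid_mounts(read_text("/proc/mounts")))
```

If NoNewPrivs is reported as set, setuid binaries cannot gain root when executed, which would be consistent with cron failing to start in this setup.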
Hi Team,
We are using the new PG version 1.6. Below are the images used for OPR and the cluster in the OCP environment.
We see the cluster is up and running fine, but there are still errors in the cluster logs: seteuid: Operation not permitted. Logs attached: Logs.txt
Permissions inside cluster:
Could you please support?