zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Postgres Cluster DB restarting approximately every 30 minutes #1401

Closed gruppferi closed 3 years ago

gruppferi commented 3 years ago

I am using the Postgres Operator Helm chart 1.6.1 with two Postgres DB clusters, installed brand new. Roughly every 30 minutes, the pods of the Postgres clusters get restarted.

configPostgresPodResources:
  default_cpu_limit: "4"
  default_cpu_request: 100m
  default_memory_limit: 4Gi
  default_memory_request: 100Mi

resources:
  limits:
    cpu: "500m"
    memory: "500Mi"
  requests:
    cpu: "100m"
    memory: "250Mi"

crd:
  create: false

* And yes, the Postgres DB clusters are the only pods started; no other services are running in the cluster.
Unfortunately, I couldn't find where the problem is, as there is no error log pointing to the issue.
PS: I previously used Postgres Operator 1.3.1. Since migration was not easy, I dumped the data, completely removed the operator and DB cluster, and installed a new operator and DB. I haven't restored the data yet, so the clusters are brand new, without any data.

Update:
This is what happens before the pod restarts:

/run/service/patroni: finished with code=0 signal=0
stopping /run/service/patroni
timeout: finish: .: (pid 194) 1749s, want down
ok: down: patroni: 1s, normally up
ok: down: /etc/service/patroni: 1s, normally up
ok: down: /etc/service/pgqd: 0s, normally up

gruppferi commented 3 years ago

Side info: I looked over issue #927, but I am not running any sidecars.

lbogdan commented 3 years ago

Having the same issue after upgrading from 1.6.0 to 1.6.1. The relevant log messages seem to be these:

time="2021-03-12T09:54:32Z" level=info msg="SYNC event has been queued" cluster-name=namespace/cluster-name pkg=controller worker=0
time="2021-03-12T09:54:32Z" level=info msg="there are 1 clusters running" pkg=controller
time="2021-03-12T09:54:32Z" level=info msg="syncing of the cluster started" cluster-name=namespace/cluster-name pkg=controller worker=0
[...]
time="2021-03-12T09:54:33Z" level=debug msg="set statefulset's rolling update annotation to false: caller/reason from cache" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="set statefulset's rolling update annotation to true: caller/reason statefulset changes" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=info msg="statefulset namespace/cluster-name is not in the desired state and needs to be updated" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-          terminationMessagePath: /dev/termination-log," cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-          terminationMessagePolicy: File," cluster-name=namespace/namespace-db pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      restartPolicy: Always," cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      dnsPolicy: ClusterFirst," cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      serviceAccount: postgres-pod," cluster-name=namespace/namespace-db pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      schedulerName: default-scheduler," cluster-name=namespace/namespace-db pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      kind: PersistentVolumeClaim," cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      apiVersion: v1," cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      status: {" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-        phase: Pending" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-      }" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="+      status: {}" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-  }," cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-  revisionHistoryLimit: 10" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="+  }" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="metadata.annotation are different" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="-  zalando-postgres-operator-rolling-update-required: false" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="+  zalando-postgres-operator-rolling-update-required: true" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=info msg="reason: new statefulset containers's postgres (index 0) security context does not match the current one" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="updating statefulset" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="patching statefulset annotations" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="patching statefulset annotations" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="calling Patroni API on a pod namespace/cluster-name-0 to set the following Postgres options: map[max_connections:300]" cluster-name=namespace/cluster-namepkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="making PATCH http request: http://10.56.7.44:8008/config" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=debug msg="performing rolling update" cluster-name=namespace/cluster-name pkg=cluster
time="2021-03-12T09:54:33Z" level=info msg="there are 2 pods in the cluster to recreate" cluster-name=namespace/cluster-name pkg=cluster
[...]
lbogdan commented 3 years ago

I think I figured out the issue, and it seems to have been introduced by https://github.com/zalando/postgres-operator/pull/1380.

With no capabilities set, currently the securityContext of the postgres container in my StatefulSet is

         securityContext:                                                                                             
           allowPrivilegeEscalation: false                                                                            
           capabilities: {}                                                                                           
           privileged: false                                                                                          
           readOnlyRootFilesystem: false

so I guess capabilities defaults to {}. Now, with #1380, generateCapabilities() was changed to return nil when no capabilities are set, which then makes the check

        newCheck("new statefulset %s's %s (index %d) security context does not match the current one",
            func(a, b v1.Container) bool { return !reflect.DeepEqual(a.SecurityContext, b.SecurityContext) }),

fail, because capabilities is {} in the cluster, and nil in the definition generated by the operator.
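
To make the mismatch concrete, here is a minimal standalone sketch (not the operator's code; it only assumes the k8s.io/api/core/v1 types are available) showing that reflect.DeepEqual treats an empty Capabilities struct and a nil one as different:

    package main

    import (
        "fmt"
        "reflect"

        v1 "k8s.io/api/core/v1"
    )

    func main() {
        // securityContext as it exists on the running StatefulSet: capabilities is {}
        inCluster := &v1.SecurityContext{Capabilities: &v1.Capabilities{}}

        // securityContext as generated by the operator after #1380: capabilities is nil
        generated := &v1.SecurityContext{Capabilities: nil}

        // DeepEqual sees a non-nil (but empty) pointer on one side and a nil pointer
        // on the other, so it reports false and the "security context does not match"
        // check triggers on every sync, causing a rolling update.
        fmt.Println(reflect.DeepEqual(inCluster, generated)) // false
    }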

lbogdan commented 3 years ago

I think this is a critical issue, as everyone starting with or upgrading to 1.6.1 will end up with all database cluster nodes being restarted every ~30m.

To confirm my previous assumption, I set additional_pod_capabilities: "SYS_NICE", and now everything is back to normal.

So current workarounds for this issue are:

* set additional_pod_capabilities: "SYS_NICE" in the operator configuration (see the sketch below), or
* downgrade the operator to v1.6.0.
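
For the first workaround, a minimal sketch of the relevant section of the chart's values-crd.yaml (the same keys maxisme shows further down in this thread):

    configKubernetes:
      additional_pod_capabilities:
        - "SYS_NICE"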

jamorales85 commented 3 years ago

Hello, I'm testing with OpenShift and I get this error:

    create Pod test-db-0 in StatefulSet tedial-astdb failed error: pods "test-db-0" is forbidden: unable to validate against any security context constraint: [capabilities.add: Invalid value: "SYS_NICE": capability may not be added]

Any idea?

FxKu commented 3 years ago

@jamorales85 in this case SYS_NICE is not allowed in your infrastructure. In our case it's added to a PodSecurityPolicy that we use. You could go back to v1.6.0 or use this image: v1.6.1-2-gca968ca1, which contains the fix for empty capabilities.
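
For readers hitting the same admission error: on plain Kubernetes, allowing the capability via a PodSecurityPolicy might look roughly like the sketch below (the policy name and the permissive rules are illustrative, not Zalando's actual policy; on OpenShift the equivalent change would be in a SecurityContextConstraints object):

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: postgres-pod          # illustrative name
    spec:
      allowedCapabilities:
        - SYS_NICE                # allows pods to add the capability the operator requests
      runAsUser:
        rule: RunAsAny
      seLinux:
        rule: RunAsAny
      supplementalGroups:
        rule: RunAsAny
      fsGroup:
        rule: RunAsAny
      volumes:
        - '*'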

lbogdan commented 3 years ago

Duplicate of #1377, and #1380 actually fixes this (sorry, I was under the impression that #1380 got into 1.6.1).

@gruppferi I guess you can close this.

@FxKu It would be nice to link to the duplicated issue when adding the duplicate label.

sagor999 commented 3 years ago

Just got hit with this issue as well. I am a bit surprised that @FxKu did not roll out 1.6.2 with the fix, as 1.6.1 is pretty much broken unless you apply the workaround of adding SYS_NICE, which might not work in all systems/environments. The first-time user experience is also not great if someone rolls out the Postgres operator and it keeps restarting their DB every 30 minutes. @FxKu, any reason for not releasing 1.6.2 with the fix for this issue?

ghost commented 3 years ago

I'm running into this bug too!

My Kubernetes provider forces me, via PodSecurityPolicy, to drop capabilities:

  requiredDropCapabilities:
  - MKNOD

which automatically generates this securityContext in the pod:

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - MKNOD
      privileged: false
      readOnlyRootFilesystem: false

Please pay attention to the other field (drop) as well, or ignore it completely.
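
For what it's worth, a hypothetical sketch of how a comparison could ignore PSP-injected drop entries (the helper names are made up; this is not the operator's actual code):

    package example

    import (
        "reflect"

        v1 "k8s.io/api/core/v1"
    )

    // securityContextsMatch compares two security contexts while ignoring the
    // Capabilities.Drop list, which admission controllers (e.g. a PodSecurityPolicy
    // with requiredDropCapabilities) may inject into the running pod spec.
    func securityContextsMatch(a, b *v1.SecurityContext) bool {
        return reflect.DeepEqual(stripDrops(a), stripDrops(b))
    }

    // stripDrops returns a copy of the security context with the Drop list cleared
    // and an empty Capabilities struct normalized to nil, so nil and {} compare equal.
    func stripDrops(sc *v1.SecurityContext) *v1.SecurityContext {
        if sc == nil {
            return nil
        }
        out := sc.DeepCopy()
        if out.Capabilities != nil {
            out.Capabilities.Drop = nil
            if len(out.Capabilities.Add) == 0 {
                out.Capabilities = nil
            }
        }
        return out
    }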

gruppferi commented 3 years ago

@FxKu is it known when we can expect a new release that includes the fix for this one as well?

Kampe commented 3 years ago

Seeing the exact same issue in our clusters.

maxisme commented 3 years ago

After downgrading to registry.opensource.zalan.do/acid/postgres-operator:v1.6.0 and setting this in my values-crd.yaml:

image:
  tag: v1.6.0
configKubernetes:
  additional_pod_capabilities:
    - "SYS_NICE"

I am still getting:

could not sync cluster: could not sync statefulsets: could not recreate pods: could not recreate replica pod "default/acid-foo-1": pod label wait timeout

My deploy script is:

git clone https://github.com/zalando/postgres-operator.git
helm upgrade pg ./postgres-operator/charts/postgres-operator -f values-crd.yaml --install --wait

FxKu commented 3 years ago

We have finally released the 1.6.2 bugfix release. It took a bit too long; sorry for the inconvenience. Closing this issue now.

MatthiasLohr commented 10 months ago

Sorry for resurrecting a dead thread, but I'm experiencing something similar in v1.10.1. Any ideas?