zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Database master pod does not start after the node running it fails #683

Closed ssimk0 closed 4 years ago

ssimk0 commented 4 years ago

Hi, I have a problem with the master pod after the node it runs on fails. I configured the operator to restart more quickly when the failed node is the one hosting the operator container, but that still does not help with the database pod. Operator config:

  api_port: "8080"
  aws_region: eu-central-1
  cluster_domain: cluster.local
  cluster_history_entries: "1000"
  cluster_labels: application:spilo
  cluster_name_label: version
  db_hosted_zone: db.example.com
  debug_logging: "true"
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.5-p9
  enable_master_load_balancer: "true"
  enable_replica_load_balancer: "false"
  enable_teams_api: "false"
  master_dns_name_format: '{cluster}.{team}.staging.{hostedzone}'
  pdb_name_format: postgres-{cluster}-pdb
  pod_deletion_wait_timeout: 2m
  pod_label_wait_timeout: 1m
  pod_management_policy: ordered_ready
  pod_role_label: spilo-role
  pod_service_account_name: zalando-postgres-operator
  pod_terminate_grace_period: 1m
  master_pod_move_timeout: 3m
  ready_wait_interval: 3s
  ready_wait_timeout: 30s
  repair_period: 5m
  replica_dns_name_format: '{cluster}-repl.{team}.staging.{hostedzone}'
  replication_username: standby
  resource_check_interval: 3s
  resource_check_timeout: 1m
  resync_period: 5m
  ring_log_lines: "100"
  secret_name_template: '{username}.{cluster}.credentials'
  spilo_privileged: "false"
  super_username: postgres
  watched_namespace: '*'
  workers: "4"
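
For comparison with the operator log further down: the timeout values in the logged "config:" line appear to be Go time.Duration values printed as nanoseconds, so they can be checked against the duration strings above. A minimal Python sketch, assuming the snake_case keys correspond to the CamelCase names in the log as paired below (that pairing and the helper names duration_to_ns and find_mismatches are illustrative, not part of the operator):

```python
import re

# Nanoseconds per unit, for the common suffixes used in Go duration strings.
_UNIT_NS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3600 * 1_000_000_000,
}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def duration_to_ns(text):
    """Convert a duration string such as "2m" or "1m30s" to nanoseconds."""
    total, pos = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unparsed characters between tokens
            break
        total += round(float(match.group(1)) * _UNIT_NS[match.group(2)])
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total


def find_mismatches(configured, logged):
    """Return the keys whose configured duration differs from the logged one."""
    return sorted(
        key for key, value in configured.items()
        if duration_to_ns(value) != logged[key]
    )


# Values copied from the config above.
configured = {
    "ready_wait_timeout": "30s",
    "resource_check_timeout": "1m",
    "pod_label_wait_timeout": "1m",
    "pod_deletion_wait_timeout": "2m",
    "pod_terminate_grace_period": "1m",
    "master_pod_move_timeout": "3m",
}
# Values copied from the "config:" line in the operator log
# (ReadyWaitTimeout, ResourceCheckTimeout, ... in the same order).
logged = {
    "ready_wait_timeout": 30000000000,
    "resource_check_timeout": 600000000000,
    "pod_label_wait_timeout": 600000000000,
    "pod_deletion_wait_timeout": 600000000000,
    "pod_terminate_grace_period": 300000000000,
    "master_pod_move_timeout": 1200000000000,
}

for key in find_mismatches(configured, logged):
    print(key, configured[key], "vs", logged[key], "ns")
```

With the values quoted in this issue, five of the six timeouts differ (for example pod_terminate_grace_period is 1m here but 300000000000 ns, i.e. 5m, in the log), which may mean the operator was not running with exactly this config.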

postgresql manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: teamId-pg96
spec:
  enableMasterLoadBalancer: true
  enableReplicaLoadBalancer: false
  teamId: "teamId"
  volume:
    size: 15Gi
  numberOfInstances: 2
  metrics:
    enabled: false
  users:
    user:
    - createdb

    # role for application foo
    foo_user: []
  patroni:
    pg_hba:
    - local   all             all                                   trust
    - hostssl all             +zalandos    127.0.0.1/32       pam
    - host    all             all                127.0.0.1/32       md5
    - hostssl all             +zalandos    ::1/128            pam
    - host    all             all                ::1/128            md5
    - hostssl replication     standby all                md5
    - hostnossl all           all                all                md5
    - hostssl all             +zalandos    all                pam
    - hostssl all             all                all                md5
  #databases: name->owner
  databases:
    main: user

  postgresql:
    version: "9.6"

Pod status 20 minutes after the failure (pod list, followed by the description of the master pod):

NAME                                 READY   STATUS        RESTARTS   AGE
postgres-operator-76f5c5cb58-7g42w   1/1     Terminating   0          67m
postgres-operator-76f5c5cb58-ws47s   1/1     Running       0          26m
user-pg96-0                         1/1     Terminating   0          48m
Name:                      user-pg96-0
Namespace:                 db
Priority:                  0
Node:                      node1/10.4.18.118
Start Time:                Thu, 10 Oct 2019 10:24:50 +0200
Labels:                    application=spilo
                           controller-revision-hash=user-pg96-bf85f74b
                           spilo-role=master
                           statefulset.kubernetes.io/pod-name=tupa1-pg96-0
                           team=tupa1
                           version=user-pg96
Annotations:               kubernetes.io/psp: 00-pharos-privileged
                           status:
                             {"conn_url":"postgres://10.46.0.9:5432/postgres","api_url":"http://10.46.0.9:8008/patroni","state":"running","role":"master","version":"1....
Status:                    Terminating (lasts 22m)
Termination Grace Period:  300s
IP:                        10.46.0.9
IPs:                       <none>
Controlled By:             StatefulSet/user-pg96
Containers:
  postgres:
    Container ID:   docker://4b6b8af82c69b591e0480ff2c186ba5c748dd28962d79cb85c5d9d0f0da5f23d
    Image:          registry.opensource.zalan.do/acid/spilo-11:1.5-p9
    Image ID:       docker-pullable://registry.opensource.zalan.do/acid/spilo-11@sha256:dd39c9581a56e44f80ae55fc86b7a3eb9735e5c799e056fa6cc77b63e84df307
    Ports:          8008/TCP, 5432/TCP, 8080/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 10 Oct 2019 10:27:11 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     3
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
      SCOPE:                      user-pg96
      PGROOT:                     /home/postgres/pgdata/pgroot
      POD_IP:                      (v1:status.podIP)
      POD_NAMESPACE:              db (v1:metadata.namespace)
      PGUSER_SUPERUSER:           postgres
      KUBERNETES_SCOPE_LABEL:     version
      KUBERNETES_ROLE_LABEL:      spilo-role
      KUBERNETES_LABELS:          application=spilo
      PGPASSWORD_SUPERUSER:       <set to the key 'password' in secret 'postgres.user-pg96.credentials'>  Optional: false
      PGUSER_STANDBY:             standby
      PGPASSWORD_STANDBY:         <set to the key 'password' in secret 'standby.user-pg96.credentials'>  Optional: false
      PAM_OAUTH2:                 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
      HUMAN_ROLE:                 zalandos
      SPILO_CONFIGURATION:        {"postgresql":{"bin_dir":"/usr/lib/postgresql/9.6/bin","pg_hba":["local   all             all                                   trust","hostssl all             +zalandos    127.0.0.1/32       pam","host    all             all                127.0.0.1/32       md5","hostssl all             +zalandos    ::1/128            pam","host    all             all                ::1/128            md5","hostssl replication     standby all                md5","hostnossl all           all                all                md5","hostssl all             +zalandos    all                pam","hostssl all             all                all                md5"]},"bootstrap":{"initdb":[{"auth-host":"md5"},{"auth-local":"trust"}],"users":{"zalandos":{"password":"","options":["CREATEDB","NOLOGIN"]}},"dcs":{}}}
      DCS_ENABLE_KUBERNETES_API:  true
    Mounts:
      /dev/shm from dshm (rw)
      /home/postgres/pgdata from pgdata (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from zalando-postgres-operator-token-n26xw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  pgdata:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgdata-user-pg96-0
    ReadOnly:   false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  zalando-postgres-operator-token-n26xw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  zalando-postgres-operator-token-n26xw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  49m   default-scheduler  Successfully assigned db/user-pg96-0 to node1
  Normal  Pulling    49m   kubelet, node1     Pulling image "registry.opensource.zalan.do/acid/spilo-11:1.5-p9"
  Normal  Pulled     46m   kubelet, node1     Successfully pulled image "registry.opensource.zalan.do/acid/spilo-11:1.5-p9"
  Normal  Created    46m   kubelet, node1     Created container postgres
  Normal  Started    46m   kubelet, node1     Started container postgres

operator-log:

2019/10/10 08:47:14 Fully qualified configmap name: db/postgres-operator
2019/10/10 08:47:14 Spilo operator v1.2.0
time="2019-10-10T08:47:16Z" level=warning msg="in the operator config map, the pod service account name zalando-postgres-operator does not match the name operator given in the account definition; using the former for consistency" pkg=controller
time="2019-10-10T08:47:16Z" level=info msg="Parse role bindings" pkg=controller
time="2019-10-10T08:47:16Z" level=info msg="successfully parsed" pkg=controller
time="2019-10-10T08:47:16Z" level=info msg="Listening to all namespaces" pkg=controller
time="2019-10-10T08:47:16Z" level=info msg="customResourceDefinition \"postgresqls.acid.zalan.do\" is already registered and will only be updated" pkg=controller
time="2019-10-10T08:47:20Z" level=warning msg="in the operator config map, the pod service account name zalando-postgres-operator does not match the name operator given in the account definition; using the former for consistency" pkg=controller
time="2019-10-10T08:47:20Z" level=info msg="config: {\n\t\"ReadyWaitInterval\": 3000000000,\n\t\"ReadyWaitTimeout\": 30000000000,\n\t\"ResyncPeriod\": 300000000000,\n\t\"RepairPeriod\": 300000000000,\n\t\"ResourceCheckInterval\": 3000000000,\n\t\"ResourceCheckTimeout\": 600000000000,\n\t\"PodLabelWaitTimeout\": 600000000000,\n\t\"PodDeletionWaitTimeout\": 600000000000,\n\t\"SpiloFSGroup\": null,\n\t\"PodPriorityClassName\": \"\",\n\t\"ClusterDomain\": \"cluster.local\",\n\t\"SpiloPrivileged\": false,\n\t\"ClusterLabels\": {\n\t\t\"application\": \"spilo\"\n\t},\n\t\"InheritedLabels\": null,\n\t\"ClusterNameLabel\": \"version\",\n\t\"PodRoleLabel\": \"spilo-role\",\n\t\"PodToleration\": null,\n\t\"DefaultCPURequest\": \"100m\",\n\t\"DefaultMemoryRequest\": \"100Mi\",\n\t\"DefaultCPULimit\": \"3\",\n\t\"DefaultMemoryLimit\": \"1Gi\",\n\t\"PodEnvironmentConfigMap\": \"\",\n\t\"NodeReadinessLabel\": null,\n\t\"MaxInstances\": -1,\n\t\"MinInstances\": -1,\n\t\"ShmVolume\": true,\n\t\"SecretNameTemplate\": \"{username}.{cluster}.credentials\",\n\t\"PamRoleName\": \"zalandos\",\n\t\"PamConfiguration\": \"https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees\",\n\t\"TeamsAPIUrl\": \"https://teams.example.com/api/\",\n\t\"OAuthTokenSecretName\": \"db/postgresql-operator\",\n\t\"InfrastructureRolesSecretName\": \"/\",\n\t\"SuperUsername\": \"postgres\",\n\t\"ReplicationUsername\": \"standby\",\n\t\"ScalyrAPIKey\": \"\",\n\t\"ScalyrImage\": \"\",\n\t\"ScalyrServerURL\": \"https://upload.eu.scalyr.com\",\n\t\"ScalyrCPURequest\": \"100m\",\n\t\"ScalyrMemoryRequest\": \"50Mi\",\n\t\"ScalyrCPULimit\": \"1\",\n\t\"ScalyrMemoryLimit\": \"1Gi\",\n\t\"LogicalBackupSchedule\": \"30 00 * * *\",\n\t\"LogicalBackupDockerImage\": \"registry.opensource.zalan.do/acid/logical-backup\",\n\t\"LogicalBackupS3Bucket\": \"\",\n\t\"WatchedNamespace\": \"\",\n\t\"EtcdHost\": \"\",\n\t\"DockerImage\": \"registry.opensource.zalan.do/acid/spilo-11:1.5-p9\",\n\t\"Sidecars\": 
null,\n\t\"PodServiceAccountName\": \"zalando-postgres-operator\",\n\t\"PodServiceAccountDefinition\": \"\\n\\t\\t{ \\\"apiVersion\\\": \\\"v1\\\",\\n\\t\\t  \\\"kind\\\": \\\"ServiceAccount\\\",\\n\\t\\t  \\\"metadata\\\": {\\n\\t\\t\\t\\t \\\"name\\\": \\\"operator\\\"\\n\\t\\t   }\\n\\t\\t}\",\n\t\"PodServiceAccountRoleBindingDefinition\": \"\\n\\t\\t{\\n\\t\\t\\t\\\"apiVersion\\\": \\\"rbac.authorization.k8s.io/v1beta1\\\",\\n\\t\\t\\t\\\"kind\\\": \\\"RoleBinding\\\",\\n\\t\\t\\t\\\"metadata\\\": {\\n\\t\\t\\t\\t   \\\"name\\\": \\\"zalando-postgres-operator\\\"\\n\\t\\t\\t},\\n\\t\\t\\t\\\"roleRef\\\": {\\n\\t\\t\\t\\t\\\"apiGroup\\\": \\\"rbac.authorization.k8s.io\\\",\\n\\t\\t\\t\\t\\\"kind\\\": \\\"ClusterRole\\\",\\n\\t\\t\\t\\t\\\"name\\\": \\\"zalando-postgres-operator\\\"\\n\\t\\t\\t},\\n\\t\\t\\t\\\"subjects\\\": [\\n\\t\\t\\t\\t{\\n\\t\\t\\t\\t\\t\\\"kind\\\": \\\"ServiceAccount\\\",\\n\\t\\t\\t\\t\\t\\\"name\\\": \\\"zalando-postgres-operator\\\"\\n\\t\\t\\t\\t}\\n\\t\\t\\t]\\n\\t\\t}\",\n\t\"MasterPodMoveTimeout\": 1200000000000,\n\t\"DbHostedZone\": \"db.example.com\",\n\t\"AWSRegion\": \"eu-central-1\",\n\t\"WALES3Bucket\": \"\",\n\t\"LogS3Bucket\": \"\",\n\t\"KubeIAMRole\": \"\",\n\t\"AdditionalSecretMount\": \"\",\n\t\"AdditionalSecretMountPath\": \"/meta/credentials\",\n\t\"DebugLogging\": true,\n\t\"EnableDBAccess\": true,\n\t\"EnableTeamsAPI\": false,\n\t\"EnableTeamSuperuser\": false,\n\t\"TeamAdminRole\": \"admin\",\n\t\"EnableAdminRoleForUsers\": true,\n\t\"EnableMasterLoadBalancer\": true,\n\t\"EnableReplicaLoadBalancer\": false,\n\t\"CustomServiceAnnotations\": null,\n\t\"EnablePodAntiAffinity\": false,\n\t\"PodAntiAffinityTopologyKey\": \"kubernetes.io/hostname\",\n\t\"EnableLoadBalancer\": null,\n\t\"MasterDNSNameFormat\": \"{cluster}.{team}.staging.{hostedzone}\",\n\t\"ReplicaDNSNameFormat\": \"{cluster}-repl.{team}.staging.{hostedzone}\",\n\t\"PDBNameFormat\": \"postgres-{cluster}-pdb\",\n\t\"EnablePodDisruptionBudget\": 
true,\n\t\"Workers\": 4,\n\t\"APIPort\": 8080,\n\t\"RingLogLines\": 100,\n\t\"ClusterHistoryEntries\": 1000,\n\t\"TeamAPIRoleConfiguration\": {\n\t\t\"log_statement\": \"all\"\n\t},\n\t\"PodTerminateGracePeriod\": 300000000000,\n\t\"PodManagementPolicy\": \"ordered_ready\",\n\t\"ProtectedRoles\": [\n\t\t\"admin\"\n\t],\n\t\"PostgresSuperuserTeams\": null,\n\t\"SetMemoryRequestToLimit\": false\n}" pkg=controller
time="2019-10-10T08:47:20Z" level=debug msg="acquiring initial list of clusters" pkg=controller
time="2019-10-10T08:47:20Z" level=debug msg="added new cluster: \"db/user-pg96\"" pkg=controller
time="2019-10-10T08:47:20Z" level=info msg="\"SYNC\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:47:20Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-10-10T08:47:20Z" level=info msg="started working in background" pkg=controller
time="2019-10-10T08:47:20Z" level=info msg="listening on :8080" pkg=apiserver
time="2019-10-10T08:47:20Z" level=info msg="\"ADD\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:47:20Z" level=debug msg="new node has been added: \"/node1\" ()" pkg=controller
time="2019-10-10T08:47:20Z" level=debug msg="new node has been added: \"/node2\" ()" pkg=controller
time="2019-10-10T08:47:20Z" level=debug msg="new node has been added: \"/node3\" ()" pkg=controller
time="2019-10-10T08:47:20Z" level=info msg="syncing of the cluster started" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:47:20Z" level=debug msg="team API is disabled, returning empty list of members for team \"user\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="syncing secrets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="secret \"db/standby.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="secret \"db/foo-user.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="secret \"db/kraken.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="secret \"db/taras.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="secret \"db/postgres.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="syncing services" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=debug msg="syncing master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:20Z" level=info msg="could not find the cluster's master endpoint" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:21Z" level=info msg="created missing master endpoint \"db/user-pg96\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:21Z" level=info msg="could not find the cluster's master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=info msg="created missing master service \"db/user-pg96\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="syncing replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="No load balancer created for the replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="syncing persistent volumes" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="skipping persistent volume \"pgdata-user-pg96-1\" corresponding to a non-running pods" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="syncing statefulsets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE user-pg96 nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL version nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {KUBERNETES_LABELS application=spilo nil} {PGPASSWORD_SUPERUSER  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.user-pg96.credentials,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.user-pg96.credentials,},Key:password,Optional:nil,},}} {PAM_OAUTH2 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees nil} {HUMAN_ROLE zalandos nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/9.6/bin\",\"pg_hba\":[\"local   all             all                                   trust\",\"hostssl all             +zalandos    127.0.0.1/32       pam\",\"host    all             all                127.0.0.1/32       md5\",\"hostssl all             +zalandos    ::1/128            pam\",\"host    all             all                ::1/128            md5\",\"hostssl replication     standby all                md5\",\"hostnossl all           all                all                md5\",\"hostssl all             +zalandos    all                pam\",\"hostssl all             all                all                
md5\"]},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="statefulset's rolling update annotation has been set to false" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="syncing pod disruption budgets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:22Z" level=debug msg="syncing roles" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:23Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup user-pg96.db.svc.cluster.local on 169.254.20.10:53: no such host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:40Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:47:55Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:48:10Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:48:25Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:48:40Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:48:55Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:49:10Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:49:10Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:49:10Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:49:10Z" level=debug msg="cluster already exists" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:52:20Z" level=info msg="\"SYNC\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:52:20Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-10-10T08:52:20Z" level=info msg="syncing of the cluster started" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:52:20Z" level=debug msg="team API is disabled, returning empty list of members for team \"user\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="syncing secrets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="secret \"db/postgres.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="secret \"db/standby.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="secret \"db/kraken.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="secret \"db/taras.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="secret \"db/foo-user.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="syncing services" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="syncing master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="syncing replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="No load balancer created for the replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:20Z" level=debug msg="syncing persistent volumes" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="skipping persistent volume \"pgdata-user-pg96-1\" corresponding to a non-running pods" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="syncing statefulsets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="cached StatefulSet value exists, rollingUpdate flag is true" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE user-pg96 nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL version nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {KUBERNETES_LABELS application=spilo nil} {PGPASSWORD_SUPERUSER  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.user-pg96.credentials,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.user-pg96.credentials,},Key:password,Optional:nil,},}} {PAM_OAUTH2 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees nil} {HUMAN_ROLE zalandos nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/9.6/bin\",\"pg_hba\":[\"local   all             all                                   trust\",\"hostssl all             +zalandos    127.0.0.1/32       pam\",\"host    all             all                127.0.0.1/32       md5\",\"hostssl all             +zalandos    ::1/128            pam\",\"host    all             all                ::1/128            md5\",\"hostssl replication     standby all                md5\",\"hostnossl all           all                all                md5\",\"hostssl all             +zalandos    all                pam\",\"hostssl all             all                all                
md5\"]},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="statefulset's rolling update annotation has been set to false" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="syncing pod disruption budgets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:21Z" level=debug msg="syncing roles" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:52:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:53:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:53:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:53:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:53:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:54:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:54:09Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:54:09Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:57:20Z" level=info msg="\"SYNC\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:57:20Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-10-10T08:57:20Z" level=info msg="syncing of the cluster started" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T08:57:20Z" level=debug msg="team API is disabled, returning empty list of members for team \"user\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="syncing secrets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="secret \"db/foo-user.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="secret \"db/kraken.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="secret \"db/taras.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="secret \"db/postgres.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="secret \"db/standby.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="syncing services" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="syncing master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="syncing replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="No load balancer created for the replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:20Z" level=debug msg="syncing persistent volumes" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="skipping persistent volume \"pgdata-user-pg96-1\" corresponding to a non-running pods" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="syncing statefulsets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="cached StatefulSet value exists, rollingUpdate flag is true" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE user-pg96 nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL version nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {KUBERNETES_LABELS application=spilo nil} {PGPASSWORD_SUPERUSER  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.user-pg96.credentials,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.user-pg96.credentials,},Key:password,Optional:nil,},}} {PAM_OAUTH2 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees nil} {HUMAN_ROLE zalandos nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/9.6/bin\",\"pg_hba\":[\"local   all             all                                   trust\",\"hostssl all             +zalandos    127.0.0.1/32       pam\",\"host    all             all                127.0.0.1/32       md5\",\"hostssl all             +zalandos    ::1/128            pam\",\"host    all             all                ::1/128            md5\",\"hostssl replication     standby all                md5\",\"hostnossl all           all                all                md5\",\"hostssl all             +zalandos    all                pam\",\"hostssl all             all                all                
md5\"]},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="statefulset's rolling update annotation has been set to false" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="syncing pod disruption budgets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:21Z" level=debug msg="syncing roles" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:57:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:58:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:58:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:58:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:58:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:59:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:59:09Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T08:59:09Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:02:20Z" level=info msg="\"SYNC\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:02:20Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-10-10T09:02:20Z" level=info msg="syncing of the cluster started" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:02:20Z" level=debug msg="team API is disabled, returning empty list of members for team \"user\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="syncing secrets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="secret \"db/taras.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="secret \"db/postgres.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="secret \"db/standby.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="secret \"db/foo-user.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="secret \"db/kraken.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="syncing services" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="syncing master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="syncing replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="No load balancer created for the replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:20Z" level=debug msg="syncing persistent volumes" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="skipping persistent volume \"pgdata-user-pg96-1\" corresponding to a non-running pods" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="syncing statefulsets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="cached StatefulSet value exists, rollingUpdate flag is true" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE user-pg96 nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL version nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {KUBERNETES_LABELS application=spilo nil} {PGPASSWORD_SUPERUSER  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.user-pg96.credentials,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.user-pg96.credentials,},Key:password,Optional:nil,},}} {PAM_OAUTH2 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees nil} {HUMAN_ROLE zalandos nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/9.6/bin\",\"pg_hba\":[\"local   all             all                                   trust\",\"hostssl all             +zalandos    127.0.0.1/32       pam\",\"host    all             all                127.0.0.1/32       md5\",\"hostssl all             +zalandos    ::1/128            pam\",\"host    all             all                ::1/128            md5\",\"hostssl replication     standby all                md5\",\"hostnossl all           all                all                md5\",\"hostssl all             +zalandos    all                pam\",\"hostssl all             all                all                
md5\"]},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="statefulset's rolling update annotation has been set to false" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="syncing pod disruption budgets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:21Z" level=debug msg="syncing roles" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:02:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:03:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:03:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:03:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:03:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:04:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:04:09Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:04:09Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:07:20Z" level=info msg="\"SYNC\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:07:20Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-10-10T09:07:20Z" level=info msg="syncing of the cluster started" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:07:20Z" level=debug msg="team API is disabled, returning empty list of members for team \"user\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="syncing secrets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="secret \"db/foo-user.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="secret \"db/kraken.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="secret \"db/taras.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="secret \"db/postgres.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="secret \"db/standby.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="syncing services" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="syncing master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="syncing replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="No load balancer created for the replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:20Z" level=debug msg="syncing persistent volumes" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="skipping persistent volume \"pgdata-user-pg96-1\" corresponding to a non-running pods" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="syncing statefulsets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="cached StatefulSet value exists, rollingUpdate flag is true" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE user-pg96 nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL version nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {KUBERNETES_LABELS application=spilo nil} {PGPASSWORD_SUPERUSER  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.user-pg96.credentials,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.user-pg96.credentials,},Key:password,Optional:nil,},}} {PAM_OAUTH2 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees nil} {HUMAN_ROLE zalandos nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/9.6/bin\",\"pg_hba\":[\"local   all             all                                   trust\",\"hostssl all             +zalandos    127.0.0.1/32       pam\",\"host    all             all                127.0.0.1/32       md5\",\"hostssl all             +zalandos    ::1/128            pam\",\"host    all             all                ::1/128            md5\",\"hostssl replication     standby all                md5\",\"hostnossl all           all                all                md5\",\"hostssl all             +zalandos    all                pam\",\"hostssl all             all                all                
md5\"]},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="statefulset's rolling update annotation has been set to false" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="syncing pod disruption budgets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:21Z" level=debug msg="syncing roles" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:07:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:08:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:08:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:08:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:08:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:09:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:09:09Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:09:09Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:12:20Z" level=info msg="\"SYNC\" event has been queued" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:12:20Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-10-10T09:12:20Z" level=info msg="syncing of the cluster started" cluster-name=db/user-pg96 pkg=controller worker=0
time="2019-10-10T09:12:20Z" level=debug msg="team API is disabled, returning empty list of members for team \"user\"" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="syncing secrets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="secret \"db/foo-user.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="secret \"db/kraken.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="secret \"db/taras.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="secret \"db/postgres.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="secret \"db/standby.user-pg96.credentials\" already exists, fetching its password" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="syncing services" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="syncing master service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="syncing replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="No load balancer created for the replica service" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:20Z" level=debug msg="syncing persistent volumes" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="skipping persistent volume \"pgdata-user-pg96-1\" corresponding to a non-running pods" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="syncing statefulsets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="cached StatefulSet value exists, rollingUpdate flag is true" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE user-pg96 nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE  &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL version nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {KUBERNETES_LABELS application=spilo nil} {PGPASSWORD_SUPERUSER  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.user-pg96.credentials,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY  &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.user-pg96.credentials,},Key:password,Optional:nil,},}} {PAM_OAUTH2 https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees nil} {HUMAN_ROLE zalandos nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/9.6/bin\",\"pg_hba\":[\"local   all             all                                   trust\",\"hostssl all             +zalandos    127.0.0.1/32       pam\",\"host    all             all                127.0.0.1/32       md5\",\"hostssl all             +zalandos    ::1/128            pam\",\"host    all             all                ::1/128            md5\",\"hostssl replication     standby all                md5\",\"hostnossl all           all                all                md5\",\"hostssl all             +zalandos    all                pam\",\"hostssl all             all                all                
md5\"]},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="statefulset's rolling update annotation has been set to false" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="syncing pod disruption budgets" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:21Z" level=debug msg="syncing roles" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:12:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:13:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:13:24Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:13:39Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:13:54Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:14:09Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.104.53.216:5432: connect: no route to host" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:14:09Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=cluster
time="2019-10-10T09:14:09Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=db/user-pg96 pkg=controller worker=0
FxKu commented 4 years ago

From the logs you can see that the host user-pg96.db.svc.cluster.local cannot be resolved. Maybe your cluster domain differs from cluster.local, like in this issue. Note that this string can be configured, e.g. here.
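
To check whether the name resolves from inside the cluster, something like the following rough Python sketch could be run in a pod. It assumes the service host follows the usual Kubernetes pattern `<service>.<namespace>.svc.<cluster_domain>`; the helper names `service_fqdn` and `can_resolve` are made up for illustration and are not part of the operator.

```python
import socket


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Kubernetes service (assumed naming pattern)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def can_resolve(host: str, port: int = 5432) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Name lookup failed, e.g. because the cluster domain is wrong.
        return False


if __name__ == "__main__":
    # Service name and namespace taken from the logs above (db/user-pg96).
    host = service_fqdn("user-pg96", "db")
    print(host, "resolves" if can_resolve(host) else "does not resolve")
```

If the name fails to resolve with `cluster.local` but resolves with a different domain, that would suggest `cluster_domain` in the operator config needs to be set to that domain.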