Cluster doesn't start

alterEgo123 commented 4 years ago

Working on the latest version of postgres-operator (v1.5.0), I'm getting these errors:

could not create cluster: pod labels error: still failing after 200 retries
could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries

The Postgres pods are running but the postgresql service is not, and kubectl get pg returns CreateFailed.

This is my manifest file:

kind: "postgresql"
apiVersion: "acid.zalan.do/v1"

metadata:
  name: "acid-batman"
  namespace: "default"
  labels:
    team: acid

spec:
  teamId: "acid"
  postgresql:
    version: "12"
  patroni:
    initdb:
      encoding: "UTF8"
      locale: "en_US.UTF-8"
      data-checksums: "true"
  numberOfInstances: 3
  enableMasterLoadBalancer: true
  volume:
    size: "1Gi"
    storageClass: "generic"
  users:
    batman: 
      - superuser
      - createdb
  databases:
    batman_db: batman
  allowedSourceRanges:

  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 500m
      memory: 500Mi

kubectl logs returns:

2020-06-19 09:14:23,901 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2020-06-19 09:14:23,903 - bootstrapping - INFO - No meta-data available for this provider
2020-06-19 09:14:23,903 - bootstrapping - INFO - Looks like your running local
2020-06-19 09:14:24,005 - bootstrapping - INFO - Configuring standby-cluster
2020-06-19 09:14:24,006 - bootstrapping - INFO - Configuring pgqd
2020-06-19 09:14:24,007 - bootstrapping - INFO - Configuring patroni
2020-06-19 09:14:24,078 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2020-06-19 09:14:24,080 - bootstrapping - INFO - Configuring wal-e
2020-06-19 09:14:24,081 - bootstrapping - INFO - Configuring certificate
2020-06-19 09:14:24,081 - bootstrapping - INFO - Generating ssl certificate
2020-06-19 09:14:24,409 - bootstrapping - INFO - Configuring crontab
2020-06-19 09:14:24,411 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2020-06-19 09:14:24,412 - bootstrapping - INFO - Configuring pam-oauth2
2020-06-19 09:14:24,413 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2020-06-19 09:14:24,413 - bootstrapping - INFO - Configuring bootstrap
2020-06-19 09:14:24,413 - bootstrapping - INFO - Configuring log
2020-06-19 09:14:24,413 - bootstrapping - INFO - Configuring pgbouncer
2020-06-19 09:14:24,414 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2020-06-19 09:14:27,190 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-06-19 09:14:27,304 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:14:37,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:14:47,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:14:57,202 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:15:07,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:15:17,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:15:27,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:15:37,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:15:47,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:15:57,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:16:07,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:16:17,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:16:27,199 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize
2020-06-19 09:16:37,198 INFO: data dir for the cluster is not empty, but system ID is invalid; consider doing reinitialize

FxKu commented 4 years ago

Maybe it's because of storageClass: "generic"? What do the operator logs say? And what does kubectl describe show for the K8s resources?
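For anyone following along, the checks asked for here could be gathered roughly like this. It is only a sketch: the cluster name and namespace come from the manifest above, while the operator deployment name `postgres-operator`, its namespace, and the `cluster-name` pod label are assumptions based on the operator's default manifests and may differ in your setup. The `run` helper is hypothetical and only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumed names: adjust to your environment.
CLUSTER="${CLUSTER:-acid-batman}"            # metadata.name from the manifest above
NAMESPACE="${NAMESPACE:-default}"            # metadata.namespace from the manifest above
OPERATOR_NS="${OPERATOR_NS:-default}"        # assumption: namespace the operator runs in
OPERATOR="${OPERATOR:-deployment/postgres-operator}"  # assumption: default deployment name

# Hypothetical helper: print each command, execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

diagnose() {
  # Operator logs: look for the "could not sync cluster" errors and what precedes them.
  run kubectl -n "$OPERATOR_NS" logs "$OPERATOR" --tail=200
  # Status and events of the postgresql custom resource.
  run kubectl -n "$NAMESPACE" describe postgresql "$CLUSTER"
  # Pods of the cluster (label key is an assumption) and all volume claims.
  run kubectl -n "$NAMESPACE" get pods -l "cluster-name=$CLUSTER"
  run kubectl -n "$NAMESPACE" get pvc
  # Does a storage class named "generic" actually exist?
  run kubectl get storageclass
  # Recent events, oldest first.
  run kubectl -n "$NAMESPACE" get events --sort-by=.lastTimestamp
}

diagnose
```

Run it once as-is to review the commands, then again with `DRY_RUN=0` to execute them. If `generic` is missing from the storage class list, the volume claims are the first thing to look at.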

mabushey commented 4 years ago

I'm envious you're able to get log entries. When I run kubectl -n postgres get pg I get:

NAME     TEAM    VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
cubejs   devops  12        1      5Gi                                     73m

The operator has no log entries since the initial creation, and there are no pods to get log files from.

deepd commented 3 years ago

@alterEgo123 maybe you have the same issue as the one mentioned in https://github.com/zalando/patroni/issues/570#issuecomment-563103932

I am seeing the same issue as you, and the same situation as described in the link above. I'm not sure how the pg_control file went missing in the 2 replica instances I have running.
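A quick way to check for the missing pg_control file on each pod might look like the sketch below. The pod names follow from the manifest above (3 instances of acid-batman), but the data directory path is an assumption based on the usual Spilo image layout, so verify it against your pods first. The `run` helper is hypothetical and only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumed names: adjust to your environment.
CLUSTER="${CLUSTER:-acid-batman}"
NAMESPACE="${NAMESPACE:-default}"
# Assumption: default data directory inside the Spilo container.
PGDATA_DIR="${PGDATA_DIR:-/home/postgres/pgdata/pgroot/data}"

# Hypothetical helper: print each command, execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

check_pg_control() {
  # numberOfInstances is 3 in the manifest, so the pods are <cluster>-0..2.
  for i in 0 1 2; do
    run kubectl -n "$NAMESPACE" exec "$CLUSTER-$i" -- ls -l "$PGDATA_DIR/global/pg_control"
  done
  # Patroni's own view of the members and their state.
  run kubectl -n "$NAMESPACE" exec "$CLUSTER-0" -- patronictl list
}

check_pg_control
```

A pod where `ls` reports no such file would match the "system ID is invalid" messages in the log above, since Patroni cannot read a system ID from a data directory without pg_control.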

zalando / postgres-operator

Cluster doesn't start #1027