zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

could not connect to PostgreSQL database: dial tcp 127.0.0.1:5432 #702

Closed — qurname2 closed this issue 4 years ago

qurname2 commented 4 years ago

Hi guys! I'm trying to create a PostgreSQL cluster, but in the postgres-operator logs I see this: `could not connect to PostgreSQL database: dial tcp 127.0.0.1:5432: connect: connection refused`, and therefore the users and databases from my config weren't created. But the PostgreSQL pods were created, and patronictl tells me that all is good: one of my PostgreSQL instances is the leader and the psql command works.

I used this config for postgres-operator:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-operator
  namespace: postgres-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: postgres-operator
  template:
    metadata:
      labels:
        name: postgres-operator
    spec:
      serviceAccountName: zalando-postgres-operator
      containers:
      - name: postgres-operator
        image: registry.opensource.zalan.do/acid/postgres-operator:v1.2.0
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 500m
            memory: 250Mi
          limits:
            cpu: 2000m
            memory: 500Mi
        securityContext:
          runAsUser: 1000
          runAsNonRoot: true
          readOnlyRootFilesystem: true
        env:
        # provided additional ENV vars can overwrite individual config map entries
        - name: CONFIG_MAP_NAME
          value: "postgres-operator"
        - name: WATCHED_NAMESPACE
          value: '*'
        # In order to use the CRD OperatorConfiguration instead, uncomment these lines and comment out the two lines above
        - name: POSTGRES_OPERATOR_CONFIGURATION_OBJECT
          value: postgresql-operator-default-configuration

This is the config for postgresql-operator-default-configuration:

apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-default-configuration
  namespace: postgres-operator
configuration:
  etcd_host: ""
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
  # enable_shm_volume: true
  max_instances: -1
  min_instances: -1
  resync_period: 30m
  repair_period: 5m
  # set_memory_request_to_limit: false
  # sidecar_docker_images:
  #   example: "exampleimage:exampletag"
  workers: 4
  users:
    replication_username: standby
    super_username: postgres
  kubernetes:
    cluster_domain: cluster.local
    cluster_labels:
        application: spilo
    cluster_name_label: "cluster-name"
    enable_pod_antiaffinity: false
    enable_pod_disruption_budget: true
    # infrastructure_roles_secret_name: ""
    # inherited_labels:
    # - application
    # - environment
    # node_readiness_label: ""
    oauth_token_secret_name: postgresql-operator
    pdb_name_format: "postgres-{cluster}-pdb"
    pod_antiaffinity_topology_key: "kubernetes.io/hostname"
    # pod_environment_configmap: ""
    pod_management_policy: "ordered_ready"
    pod_role_label: spilo-role
    pod_service_account_name: zalando-postgres-operator
    pod_terminate_grace_period: 5m
    secret_name_template: "{username}.{cluster}.credentials.{tprkind}.{tprgroup}"
    # spilo_fsgroup: 103
    spilo_privileged: false
    # toleration: {}
    # watched_namespace:""
  postgres_pod_resources:
    default_cpu_limit: "3"
    default_cpu_request: 100m
    default_memory_limit: 1Gi
    default_memory_request: 100Mi
  timeouts:
    pod_label_wait_timeout: 10m
    pod_deletion_wait_timeout: 10m
    ready_wait_interval: 4s
    ready_wait_timeout: 30s
    resource_check_interval: 3s
    resource_check_timeout: 10m
  load_balancer:
    # db_hosted_zone: ""
    enable_master_load_balancer: false
    enable_replica_load_balancer: false
    # custom_service_annotations:
    #   zalando-postgres-operator-rolling-update-required: "True"
    master_dns_name_format: "{cluster}.{team}.{hostedzone}"
    replica_dns_name_format: "{cluster}-repl.{team}.{hostedzone}"
  aws_or_gcp:
    # additional_secret_mount: "some-secret-name"
    # additional_secret_mount_path: "/some/dir"
    aws_region: eu-central-1
    # kube_iam_role: ""
    # log_s3_bucket: ""
    # wal_s3_bucket: ""
  logical_backup:
    logical_backup_schedule: "30 00 * * *"
    logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
    logical_backup_s3_bucket: "my-bucket-url"
  debug:
    debug_logging: true
    enable_database_access: true
  teams_api:
    # enable_admin_role_for_users: true
    enable_team_superuser: false
    enable_teams_api: false
    # pam_configuration: ""
    pam_role_name: zalandos
    # postgres_superuser_teams: "postgres_superusers"
    protected_role_names:
      - admin
    team_admin_role: admin
    team_api_role_configuration:
      log_statement: all
    # teams_api_url: ""
  logging_rest_api:
    api_port: 8008
    cluster_history_entries: 1000
    ring_log_lines: 100
  scalyr:
    # scalyr_api_key: ""
    scalyr_cpu_limit: "1"
    scalyr_cpu_request: 100m
    # scalyr_image: ""
    scalyr_memory_limit: 1Gi
    scalyr_memory_request: 50Mi
    # scalyr_server_url: ""

This is the ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
  namespace: postgres-operator
data:
  # additional_secret_mount: "some-secret-name"
  # additional_secret_mount_path: "/some/dir"
  api_port: "8080"
  aws_region: eu-central-1
  cluster_domain: cluster.local
  cluster_history_entries: "1000"
  cluster_labels: application:spilo
  cluster_name_label: version
  # custom_service_annotations:
  #   "keyx:valuez,keya:valuea"
  db_hosted_zone: db.example.com
  debug_logging: "true"
  # default_cpu_limit: "3"
  # default_cpu_request: 100m
  # default_memory_limit: 1Gi
  # default_memory_request: 100Mi
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
  # enable_admin_role_for_users: "true"
  enable_database_access: "true"
  enable_master_load_balancer: "false"
  # enable_pod_antiaffinity: "false"
  # enable_pod_disruption_budget: "true"
  enable_replica_load_balancer: "false"
  # enable_shm_volume: "true"
  # enable_team_superuser: "false"
  enable_teams_api: "false"
  # etcd_host: ""
  # infrastructure_roles_secret_name: postgresql-infrastructure-roles
  inherited_labels: service
  # kube_iam_role: ""
  # log_s3_bucket: ""
  # logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
  # logical_backup_s3_bucket: ""
  # logical_backup_schedule: "30 00 * * *"
  master_dns_name_format: '{cluster}.{team}.staging.{hostedzone}'
  # master_pod_move_timeout: 10m
  # max_instances: "-1"
  # min_instances: "-1"
  # node_readiness_label: ""
  # oauth_token_secret_name: postgresql-operator
  # pam_configuration: |
  #  https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
  # pam_role_name: zalandos
  pdb_name_format: "postgres-{cluster}-pdb"
  # pod_antiaffinity_topology_key: "kubernetes.io/hostname"
  pod_deletion_wait_timeout: 10m
  # pod_environment_configmap: ""
  pod_label_wait_timeout: 10m
  pod_management_policy: "ordered_ready"
  pod_role_label: spilo-role
  pod_service_account_name: "zalando-postgres-operator"
  pod_terminate_grace_period: 5m
  # postgres_superuser_teams: "postgres_superusers"
  # protected_role_names: "admin"
  ready_wait_interval: 3s
  ready_wait_timeout: 30s
  repair_period: 5m
  replica_dns_name_format: '{cluster}-repl.{team}.staging.{hostedzone}'
  replication_username: standby
  resource_check_interval: 3s
  resource_check_timeout: 10m
  resync_period: 5m
  ring_log_lines: "100"
  secret_name_template: '{username}.{cluster}.credentials'
  # sidecar_docker_images: ""
  # set_memory_request_to_limit: "false"
  spilo_privileged: "false"
  super_username: postgres
  # team_admin_role: "admin"
  # team_api_role_configuration: "log_statement:all"
  # teams_api_url: http://fake-teams-api.default.svc.cluster.local
  # toleration: ""
  # wal_s3_bucket: ""
  watched_namespace: "*"  # listen to all namespaces
  workers: "4"

And this is the postgresql manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: tests-postgresql
  labels:
    service: tests-postgresql
spec:
  teamId: "ACID"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    postgres:
    - superuser
    - createdb
    cachet:
    - createdb
    # foo_user: []  # role for application foo
  #databases: name->owner
  databases:
    deploy-tests: postgres
    other-db: cachet
  postgresql:
    version: "11"

Any ideas, what I'm doing wrong?

FxKu commented 4 years ago

It's fine to use either the ConfigMap or the OperatorConfiguration. Using both makes it harder to tell which setting is coming from where. In the cluster manifest, don't specify the postgres user, as it already exists. Maybe this is causing the trouble. Please check the logs of the operator pod and the database pods for more error messages.
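In other words, the `users` section of the cluster manifest above would drop the `postgres` entry and keep only the application role — a minimal sketch:

```yaml
# cluster manifest excerpt: omit the built-in postgres superuser,
# the operator manages it itself
users:
  cachet:
  - createdb
```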

qurname2 commented 4 years ago

I stopped using the OperatorConfiguration and deleted the postgres user definition from the postgresql manifest. I also thought it was a problem with DNS, i.e. the postgres-operator not correctly resolving the DNS name of my postgresql service (acid-minimal-cluster), but no: after exec'ing into the operator container I see a correct answer from nslookup. And I didn't see any more error messages in the operator or db pods.

FxKu commented 4 years ago

Also delete the env variable POSTGRES_OPERATOR_CONFIGURATION_OBJECT in the deployment and start again from scratch. Are you testing on minikube?
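With only the ConfigMap in use, the `env` section of the operator deployment shown above would be reduced to something like this sketch:

```yaml
# operator deployment excerpt: ConfigMap-based configuration only,
# POSTGRES_OPERATOR_CONFIGURATION_OBJECT removed
env:
- name: CONFIG_MAP_NAME
  value: "postgres-operator"
- name: WATCHED_NAMESPACE
  value: '*'
```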

qurname2 commented 4 years ago

@FxKu, sorry that I did not reply for a while. Yes, I deleted the env variable POSTGRES_OPERATOR_CONFIGURATION_OBJECT, deleted the postgres-operator, and after this I get the same behavior. No, I am testing on a bare-metal k8s cluster, v1.14.3.

FxKu commented 4 years ago

@qurname2 can you try the latest Postgres Operator version (not 1.2.0)? I was also thinking that your cluster domain might be different from cluster.local, but you said that's not the case. So the Patroni (db pod) logs look fine? And there is no other warning or error in the operator logs?

qurname2 commented 4 years ago

After upgrading to the latest tag:

2019/11/08 09:43:25 Fully qualified configmap name: postgres-operator/postgres-operator
2019/11/08 09:43:25 Spilo operator v1.2.0-23-g33e1d60-dirty
time="2019-11-08T09:43:25Z" level=warning msg="in the operator config map, the pod service account name zalando-postgres-operator does not match the name operator given in the account definition; using the former for consistency" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="Parse role bindings" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="successfully parsed" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="Listening to all namespaces" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="customResourceDefinition \"postgresqls.acid.zalan.do\" is already registered and will only be updated" pkg=controller
time="2019-11-08T09:43:29Z" level=warning msg="in the operator config map, the pod service account name zalando-postgres-operator does not match the name operator given in the account definition; using the former for consistency" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="config: {\n\t\"ReadyWaitInterval\": 3000000000,\n\t\"ReadyWaitTimeout\" .... ..... }
time="2019-11-08T09:43:29Z" level=debug msg="acquiring initial list of clusters" pkg=controller
time="2019-11-08T09:43:29Z" level=debug msg="added new cluster: \"my-ns/my-app-db\"" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="\"SYNC\" event has been queued" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:43:29Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="started working in background" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="listening on :8080" pkg=apiserver
time="2019-11-08T09:43:29Z" level=info msg="\"ADD\" event has been queued" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:43:29Z" level=info msg="syncing of the cluster started" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:43:29Z" level=warning msg="could not get oauth token to authenticate to team service API, returning empty list of team members: could not get credentials secret: secrets \"postgresql-operator\" not found" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:43:29Z" level=debug msg="syncing secrets" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:43:29Z" level=debug msg="new node has been added: \"/my-k8s-worker-node\" ()" pkg=controller
.... ....
time="2019-11-08T09:45:16Z" level=error msg="could not connect to PostgreSQL database: dial tcp 127.0.0.1:5432: connect: connection refused" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:45:16Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:45:16Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:45:16Z" level=debug msg="cluster already exists" cluster-name=my-ns/my-app-db pkg=controller worker=0

I also thought about cluster.local, but yes, that's not the case.

root@my-app-db-0:/home/postgres# cat /etc/resolv.conf
nameserver 10.233.0.3
search my-ns.svc.cluster.local svc.cluster.local cluster.local mycompany.org
options ndots:5

qurname2 commented 4 years ago

Also, after upgrading to the latest tag I tried recreating the postgresql resource, and in the db logs I didn't see anything interesting:

decompressing spilo image...
2019-11-08 10:02:20,847 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2019-11-08 10:02:22,852 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2019-11-08 10:02:22,853 - bootstrapping - INFO - No meta-data available for this provider
2019-11-08 10:02:22,853 - bootstrapping - INFO - Looks like your running local
2019-11-08 10:02:22,869 - bootstrapping - WARNING - could not parse kubernetes labels as a JSON: Expecting value: line 1 column 1 (char 0), reverting to the default: {"application": "spilo"}
2019-11-08 10:02:22,883 - bootstrapping - INFO - Configuring standby-cluster
2019-11-08 10:02:22,883 - bootstrapping - INFO - Configuring pam-oauth2
2019-11-08 10:02:22,884 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2019-11-08 10:02:22,884 - bootstrapping - INFO - Configuring patronictl
2019-11-08 10:02:22,884 - bootstrapping - INFO - Configuring patroni
2019-11-08 10:02:22,893 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring bootstrap
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring wal-e
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring log
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring certificate
2019-11-08 10:02:22,893 - bootstrapping - INFO - Generating ssl certificate
2019-11-08 10:02:22,944 - bootstrapping - INFO - Configuring crontab
2019-11-08 10:02:22,972 - bootstrapping - INFO - Configuring renice
2019-11-08 10:02:22,975 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of permissions
2019-11-08 10:02:22,976 - bootstrapping - INFO - Configuring pgbouncer
2019-11-08 10:02:22,976 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2019-11-08 10:02:23,234 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2019-11-08 10:02:23,234 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2019-11-08 10:02:23,234 INFO Included extra file "/etc/supervisor/conf.d/patroni.conf" during parsing
2019-11-08 10:02:23,234 INFO Included extra file "/etc/supervisor/conf.d/pgq.conf" during parsing
2019-11-08 10:02:23,242 INFO RPC interface 'supervisor' initialized
2019-11-08 10:02:23,242 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-11-08 10:02:23,242 INFO supervisord started with pid 1
2019-11-08 10:02:24,246 INFO spawned: 'cron' with pid 36
2019-11-08 10:02:24,248 INFO spawned: 'patroni' with pid 37
2019-11-08 10:02:24,250 INFO spawned: 'pgq' with pid 38
2019-11-08 10:02:24,667 INFO: No PostgreSQL configuration items changed, nothing to reload.
2019-11-08 10:02:24,685 INFO: Lock owner: None; I am my-app-db-0
2019-11-08 10:02:24,710 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory /home/postgres/pgdata/pgroot/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... 2019-11-08 10:02:25,480 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-11-08 10:02:25,480 INFO success: patroni entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-11-08 10:02:25,480 INFO success: pgq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
ok

Success. You can now start the database server using:

    /usr/lib/postgresql/11/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start

2019-11-08 10:02:25,921 INFO: postmaster pid=67
/var/run/postgresql:5432 - no response
2019-11-08 10:02:25 UTC [67]: [1-1] 5dc53d31.43 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2019-11-08 10:02:25 UTC [67]: [2-1] 5dc53d31.43 0     LOG:  pg_stat_kcache.linux_hz is set to 250
2019-11-08 10:02:25 UTC [67]: [3-1] 5dc53d31.43 0     LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-11-08 10:02:26 UTC [67]: [4-1] 5dc53d31.43 0     LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-11-08 10:02:26 UTC [67]: [5-1] 5dc53d31.43 0     LOG:  redirecting log output to logging collector process
2019-11-08 10:02:26 UTC [67]: [6-1] 5dc53d31.43 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2019-11-08 10:02:27,022 INFO: establishing a new patroni connection to the postgres cluster
2019-11-08 10:02:27,039 INFO: running post_bootstrap
SET
DO
DO
DO
CREATE EXTENSION
NOTICE:  version "1.0" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
CREATE EXTENSION
NOTICE:  version "1.1" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
REVOKE
GRANT
REVOKE
GRANT
GRANT
CREATE EXTENSION
DO
CREATE TABLE
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE:  schema "zmon_utils" does not exist, skipping
DROP SCHEMA
DO
NOTICE:  language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE:  function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
GRANT
You are now connected to database "postgres" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE:  version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
You are now connected to database "template1" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE:  version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2019-11-08 10:02:27,765 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2019-11-08 10:02:27,830 INFO: initialized a new cluster
2019-11-08 10:02:37,758 INFO: Lock owner: my-app-db-0; I am my-app-db-0
2019-11-08 10:02:37,769 INFO: Lock owner: my-app-db-0; I am my-app-db-0
2019-11-08 10:02:37,803 INFO: no action.  i am the leader with the lock
2019-11-08 10:02:47,759 INFO: Lock owner: my-app-db-0; I am my-app-db-0
2019-11-08 10:02:47,776 INFO: no action.  i am the leader with the lock
2019-11-08 10:02:57,758 INFO: Lock owner: my-app-db-0; I am my-app-db-0

DeamonMV commented 4 years ago

@FxKu hello.

We ran into the same problem. I tried two versions of the operator: 1.2.0 and latest. With the latest version, after the cluster was created, I do not see any errors in the operator logs, but with 1.2.0 errors are present:

time="2019-11-08T13:16:50Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.233.53.184:5432: connect: connection refused" cluster-name=default/grafana-postgres pkg=cluster

But I have to say that both operator versions have the same problem: they create non-working services that do not point to the pods.

Look at the endpoints:

$ kubectl get endpoints
grafana-postgres            <none>                                                         10m
grafana-postgres-repl       <none>                                                         10m
testgrafana-postgres        <none>                                                         57s
testgrafana-postgres-repl   <none>                                                         57s

testgrafana-postgres is the cluster created with version 1.2.0.

Selectors are not present in the service:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2019-11-08T13:16:43Z"
  labels:
    application: spilo
    cluster-name: testgrafana-postgres
    spilo-role: master
    team: testgrafana
  name: testgrafana-postgres
  namespace: default
  resourceVersion: "52400"
  selfLink: /api/v1/namespaces/default/services/testgrafana-postgres
  uid: 01e1e125-022a-11ea-ba71-9600002e4379
spec:
  clusterIP: 10.233.48.124
  ports:
  - name: postgresql
    port: 5432
    protocol: TCP
    targetPort: 5432
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

After I added selectors by hand, the endpoint shows the IPs and ports of the pods (both of them, master and replica, I guess).
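A sketch of the selector added by hand, assuming it should match the default labels visible in the service metadata above:

```yaml
# added by hand to the spec of the testgrafana-postgres service;
# normally the operator maintains the master endpoint itself
spec:
  selector:
    application: spilo
    cluster-name: testgrafana-postgres
    spilo-role: master
```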

qurname2 commented 4 years ago

@DeamonMV, about adding the selector by hand: I came across an interesting and, in my opinion, not quite logical operator behavior. If you add a selector by hand or change the service type to NodePort (for example, for debugging), the operator will, after a couple of minutes, revert the type to ClusterIP (LoadBalancer) or delete the selectors you added. Did you also see something like this, or is it only me who has such "unique" problems? :)

DeamonMV commented 4 years ago

@qurname2 I tried to change the type of the Service to NodePort, but the operator didn't change anything back.

And I found where my problem was.

If you remove this section from the manifest...

    version: "11"

...the cluster will be created, but it will not work, and you will see those `could not connect to PostgreSQL database` messages.

A proper minimal manifest is:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
namespace: "default"
metadata:
  name: "acid-psql"
spec:
  postgresql:
    version: "11"
  teamId: "acid"
  volume:
    size: 1Gi
    storageClass: postgres
  numberOfInstances: 2
  users:
    postgres:
      - superuser
      - createdb
    primaryuser:
      - createdb
  databases:
    userdb: primaryuser

FxKu commented 4 years ago

Found out that changing the service type has some issues, so some of your problems could be resolved with #716. As for the errors on cluster creation, it must be something different.

FxKu commented 4 years ago

We merged #716, so error messages on changing the service type should not appear anymore. Could you test again, @qurname2, @DeamonMV?

FxKu commented 4 years ago

Will close this one for now, as it was reported that things work with the version from master. Reopen if you still find the same issues.

Jasstkn commented 3 years ago

@FxKu Hi Felix! Looks like I may have a similar problem... I wonder if you have time to take a look at my case. I have a bare-metal k8s installation with postgres-operator v1.6.1 in it. I'm trying to deploy this manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: devops-test
  namespace: postgres-operator
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 1
  users:
    zalando:  # database owner
    - superuser
    - createdb
  databases:
    postgres: postgres  # dbname: owner
  postgresql:
    version: "12"
    parameters:
      shared_buffers: "64MB"
      max_connections: "50"
      log_statement: "all"
      log_directory: "/var/log/postgresql"
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 250m
      memory: 768Mi
  patroni:
    initdb:
      encoding: "UTF8"
      locale: "en_US.UTF-8"
      data-checksums: "true"
    pg_hba:
      - local all all trust
      - host replication standby all md5
      - host all all all md5
  allowedSourceRanges:
  - 0.0.0.0/0

But I get a connection refused error:

could not connect to Postgres database: dial tcp 127.0.0.1:5432: connect: connection refused

I wonder why operator tries to connect using localhost? Because I've repeated installation steps with the same configs on GKE, and it works ok... Thank you in advance for your help.