zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Master not reachable when "enableConnectionPooler"=True #1304

Closed yfoelling closed 3 years ago

yfoelling commented 3 years ago

Hey,

I have just started using Zalando's postgres operator. I am using the provided Helm charts for the operator and the UI at the v1.6.0 tag (586b46d0).

All the deployments seem to work fine. If I create a cluster, there seem to be no issues at all. But as soon as I enable the connection pooler, the status in the UI is stuck at "Waiting for master to become available". I can't find any relevant log messages about the issue.

I tried a bunch of different combinations, with other Postgres versions and fewer/more replicas, and also tried creating the Postgres cluster via kubectl directly. In all cases the same issue occurs as soon as I enable the connection pooler.
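
For illustration, a cluster manifest of the kind described above might look roughly like the sketch below. This is not the manifest actually used in this report: the cluster name, team ID, instance count, and volume size are placeholder assumptions, and only `enableConnectionPooler: true` and the namespace reflect what is stated in this issue.

```yaml
# Hypothetical minimal manifest; name, teamId, size and instance count are placeholders.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster        # placeholder; the name is prefixed with the teamId
  namespace: postgres-operator      # the namespace watched by the operator in this setup
spec:
  teamId: "acid"                    # placeholder team
  numberOfInstances: 2
  enableConnectionPooler: true      # the setting that triggers the stuck UI status
  volume:
    size: 1Gi
  postgresql:
    version: "13"
```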

I am using Helm v3.4.2 with the charts provided in this repository. I reset all the configuration I had made in the values.yaml files; the only thing that now differs from the original is the (listening) namespace.

The Kubernetes cluster is version v1.19.4 and my kubectl is 1.18.3. The k8s cluster consists of multiple nodes on virtual machines provisioned via RKE; the storage is provided by Longhorn.

Here is my values.yaml for the operator Helm chart:

image:
  registry: registry.opensource.zalan.do
  repository: acid/postgres-operator
  tag: v1.6.0
  pullPolicy: "IfNotPresent"

# Optionally specify an array of imagePullSecrets.
# Secrets must be manually created in the namespace.
# ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
# imagePullSecrets:
  # - name: myRegistryKeySecretName

podAnnotations: {}
podLabels: {}

configTarget: "ConfigMap"

# JSON logging format
enableJsonLogging: false

# general configuration parameters
configGeneral:
  # choose if deployment creates/updates CRDs with OpenAPIV3Validation
  enable_crd_validation: "true"
  # update only the statefulsets without immediately doing the rolling update
  enable_lazy_spilo_upgrade: "false"
  # set the PGVERSION env var instead of providing the version via postgresql.bin_dir in SPILO_CONFIGURATION
  enable_pgversion_env_var: "true"
  # start any new database pod without limitations on shm memory
  enable_shm_volume: "true"
  # enables backwards compatible path between Spilo 12 and Spilo 13 images
  enable_spilo_wal_path_compat: "false"
  # etcd connection string for Patroni. Empty uses K8s-native DCS.
  etcd_host: ""
  # Select if setup uses endpoints (default), or configmaps to manage leader (DCS=k8s)
  # kubernetes_use_configmaps: "false"
  # Spilo docker image
  docker_image: registry.opensource.zalan.do/acid/spilo-13:2.0-p2
  # min number of instances in Postgres cluster. -1 = no limit
  min_instances: "-1"
  # max number of instances in Postgres cluster. -1 = no limit
  max_instances: "-1"
  # period between consecutive repair requests
  repair_period: 5m
  # period between consecutive sync requests
  resync_period: 30m
  # can prevent certain cases of memory overcommitment
  # set_memory_request_to_limit: "false"

  # map of sidecar names to docker images
  # sidecar_docker_images: ""

  # number of routines the operator spawns to process requests concurrently
  workers: "8"

# parameters describing Postgres users
configUsers:
  # postgres username used for replication between instances
  replication_username: standby
  # postgres superuser name to be created by initdb
  super_username: postgres

configKubernetes:
  # default DNS domain of K8s cluster where operator is running
  cluster_domain: cluster.local
  # additional labels assigned to the cluster objects
  cluster_labels: application:spilo
  # label assigned to Kubernetes objects created by the operator
  cluster_name_label: cluster-name
  # annotations attached to each database pod
  # custom_pod_annotations: "keya:valuea,keyb:valueb"

  # key name for annotation that compares manifest value with current date
  # delete_annotation_date_key: "delete-date"

  # key name for annotation that compares manifest value with cluster name
  # delete_annotation_name_key: "delete-clustername"

  # list of annotations propagated from cluster manifest to statefulset and deployment
  # downscaler_annotations: "deployment-time,downscaler/*"

  # enables initContainers to run actions before Spilo is started
  enable_init_containers: "true"
  # toggles pod anti affinity on the Postgres pods
  enable_pod_antiaffinity: "false"
  # toggles PDB to set MinAvailable to 0 or 1
  enable_pod_disruption_budget: "true"
  # enables sidecar containers to run alongside Spilo in the same pod
  enable_sidecars: "true"
  # namespaced name of the secret containing infrastructure roles names and passwords
  # infrastructure_roles_secret_name: postgresql-infrastructure-roles

  # list of annotation keys that can be inherited from the cluster manifest
  # inherited_annotations: owned-by

  # list of label keys that can be inherited from the cluster manifest
  # inherited_labels: application,environment

  # timeout for successful migration of master pods from unschedulable node
  # master_pod_move_timeout: 20m

  # set of labels that a running and active node should possess to be considered ready
  # node_readiness_label: ""

  # namespaced name of the secret containing the OAuth2 token to pass to the teams API
  # oauth_token_secret_name: postgresql-operator

  # defines the template for PDB (Pod Disruption Budget) names
  pdb_name_format: "postgres-{cluster}-pdb"
  # override topology key for pod anti affinity
  pod_antiaffinity_topology_key: "kubernetes.io/hostname"
  # namespaced name of the ConfigMap with environment variables to populate on every pod
  # pod_environment_configmap: "default/my-custom-config"
  # name of the Secret (in cluster namespace) with environment variables to populate on every pod
  # pod_environment_secret: "my-custom-secret"

  # specify the pod management policy of stateful sets of Postgres clusters
  pod_management_policy: "ordered_ready"
  # label assigned to the Postgres pods (and services/endpoints)
  pod_role_label: spilo-role
  # service account definition as JSON/YAML string to be used by postgres cluster pods
  # pod_service_account_definition: ""

  # role binding definition as JSON/YAML string to be used by pod service account
  # pod_service_account_role_binding_definition: ""

  # Postgres pods are terminated forcefully after this timeout
  pod_terminate_grace_period: 5m
  # template for database user secrets generated by the operator
  secret_name_template: "{username}.{cluster}.credentials.{tprkind}.{tprgroup}"
  # set user and group for the spilo container (required to run Spilo as non-root process)
  # spilo_runasuser: "101"
  # spilo_runasgroup: "103"
  # group ID with write-access to volumes (required to run Spilo as non-root process)
  # spilo_fsgroup: "103"

  # whether the Spilo container should run in privileged mode
  spilo_privileged: "false"
  # storage resize strategy, available options are: ebs, pvc, off
  storage_resize_mode: pvc
  # operator watches for postgres objects in the given namespace
  watched_namespace: "postgres-operator"  # listen only to this namespace ("*" would listen to all namespaces)

# configure resource requests for the Postgres pods
configPostgresPodResources:
  # CPU limits for the postgres containers
  default_cpu_limit: "1"
  # CPU request value for the postgres containers
  default_cpu_request: 100m
  # memory limits for the postgres containers
  default_memory_limit: 500Mi
  # memory request value for the postgres containers
  default_memory_request: 100Mi
  # hard CPU minimum required to properly run a Postgres cluster
  min_cpu_limit: 250m
  # hard memory minimum required to properly run a Postgres cluster
  min_memory_limit: 250Mi

# timeouts related to some operator actions
configTimeouts:
  # timeout when waiting for the Postgres pods to be deleted
  pod_deletion_wait_timeout: 10m
  # timeout when waiting for pod role and cluster labels
  pod_label_wait_timeout: 10m
  # interval between consecutive attempts waiting for postgresql CRD to be created
  ready_wait_interval: 3s
  # timeout for the complete postgres CRD creation
  ready_wait_timeout: 30s
  # interval to wait between consecutive attempts to check for some K8s resources
  resource_check_interval: 3s
  # timeout when waiting for the presence of a certain K8s resource (e.g. Sts, PDB)
  resource_check_timeout: 10m

# configure behavior of load balancers
configLoadBalancer:
  # DNS zone for cluster DNS name when load balancer is configured for cluster
  db_hosted_zone: db.example.com
  # annotations to apply to service when load balancing is enabled
  # custom_service_annotations: "keyx:valuez,keya:valuea"

  # toggles service type load balancer pointing to the master pod of the cluster
  enable_master_load_balancer: "false"
  # toggles service type load balancer pointing to the replica pod of the cluster
  enable_replica_load_balancer: "false"
  # define external traffic policy for the load balancer
  external_traffic_policy: "Cluster"
  # defines the DNS name string template for the master load balancer cluster
  master_dns_name_format: '{cluster}.{team}.{hostedzone}'
  # defines the DNS name string template for the replica load balancer cluster
  replica_dns_name_format: '{cluster}-repl.{team}.{hostedzone}'

# options to aid debugging of the operator itself
configDebug:
  # toggles verbose debug logs from the operator
  debug_logging: "true"
  # toggles operator functionality that requires access to the postgres database
  enable_database_access: "true"

# parameters affecting logging and REST API listener
configLoggingRestApi:
  # REST API listener listens to this port
  api_port: "8080"
  # number of entries in the cluster history ring buffer
  cluster_history_entries: "1000"
  # number of lines in the ring buffer used to store cluster logs
  ring_log_lines: "100"

# configure interaction with non-Kubernetes objects from AWS or GCP
configAwsOrGcp:
  # Additional Secret (aws or gcp credentials) to mount in the pod
  # additional_secret_mount: "some-secret-name"

  # Path to mount the above Secret in the filesystem of the container(s)
  # additional_secret_mount_path: "/some/dir"

  # AWS region used to store EBS volumes
  aws_region: eu-central-1

  # enable automatic migration on AWS from gp2 to gp3 volumes
  enable_ebs_gp3_migration: "false"
  # defines maximum volume size in GB until which auto migration happens
  # enable_ebs_gp3_migration_max_size: "1000"

  # GCP credentials for setting the GOOGLE_APPLICATION_CREDENTIALS environment variable
  # gcp_credentials: ""

  # AWS IAM role to supply in the iam.amazonaws.com/role annotation of Postgres pods
  # kube_iam_role: ""

  # S3 bucket to use for shipping postgres daily logs
  # log_s3_bucket: ""

  # S3 bucket to use for shipping WAL segments with WAL-E
  # wal_s3_bucket: ""

  # GCS bucket to use for shipping WAL segments with WAL-E
  # wal_gs_bucket: ""

# configure K8s cron job managed by the operator
configLogicalBackup:
  # image for pods of the logical backup job (example runs pg_dumpall)
  logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup:v1.6.0"
  # path of google cloud service account json file
  # logical_backup_google_application_credentials: ""

  # prefix for the backup job name
  logical_backup_job_prefix: "logical-backup-"
  # storage provider - either "s3" or "gcs"
  logical_backup_provider: "s3"
  # S3 Access Key ID
  logical_backup_s3_access_key_id: ""
  # S3 bucket to store backup results
  logical_backup_s3_bucket: "my-bucket-url"
  # S3 endpoint url when not using AWS
  logical_backup_s3_endpoint: ""
  # S3 region of bucket
  logical_backup_s3_region: ""
  # S3 Secret Access Key
  logical_backup_s3_secret_access_key: ""
  # S3 server side encryption
  logical_backup_s3_sse: "AES256"
  # backup schedule in the cron format
  logical_backup_schedule: "30 00 * * *"

# automate creation of human users with teams API service
configTeamsApi:
  # team_admin_role will have the rights to grant roles coming from PG manifests
  # enable_admin_role_for_users: "true"

  # operator watches for PostgresTeam CRs to assign additional teams and members to clusters
  enable_postgres_team_crd: "false"
  # toggle to create additional superuser teams from PostgresTeam CRs
  # enable_postgres_team_crd_superusers: "false"

  # toggle to grant superuser to team members created from the Teams API
  # enable_team_superuser: "false"

  # toggles usage of the Teams API by the operator
  enable_teams_api: "false"
  # should contain a URL to use for authentication (username and token)
  # pam_configuration: https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees

  # operator will add all team member roles to this group and add a pg_hba line
  # pam_role_name: zalandos

  # List of teams which members need the superuser role in each Postgres cluster
  # postgres_superuser_teams: "postgres_superusers"

  # List of roles that cannot be overwritten by an application, team or infrastructure role
  # protected_role_names: "admin"

  # role name to grant to team members created from the Teams API
  # team_admin_role: "admin"

  # postgres config parameters to apply to each team member role
  # team_api_role_configuration: "log_statement:all"

  # URL of the Teams API service
  # teams_api_url: http://fake-teams-api.default.svc.cluster.local

# configure connection pooler deployment created by the operator
configConnectionPooler:
  # db schema to install lookup function into
  connection_pooler_schema: "pooler"
  # db user for pooler to use
  connection_pooler_user: "pooler"
  # docker image
  connection_pooler_image: "registry.opensource.zalan.do/acid/pgbouncer:master-9"
  # max db connections the pooler should hold
  connection_pooler_max_db_connections: "60"
  # default pooling mode
  connection_pooler_mode: "transaction"
  # number of pooler instances
  connection_pooler_number_of_instances: "2"
  # default resources
  connection_pooler_default_cpu_request: 500m
  connection_pooler_default_memory_request: 100Mi
  connection_pooler_default_cpu_limit: "1"
  connection_pooler_default_memory_limit: 100Mi

rbac:
  # Specifies whether RBAC resources should be created
  create: true

crd:
  # Specifies whether custom resource definitions should be created
  # When using helm3, this is ignored; instead use "--skip-crds" to skip.
  create: true

serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

podServiceAccount:
  # The name of the ServiceAccount to be used by postgres cluster pods
  # If not set a name is generated using the fullname template and "-pod" suffix
  name: "postgres-pod"

# priority class for operator pod
priorityClassName: ""

# priority class for database pods
podPriorityClassName: ""

resources:
  limits:
    cpu: 500m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 250Mi

# Affinity for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}

# Tolerations for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations: []

# Node labels for pod assignment
# Ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector: {}

controllerID:
  # Specifies whether a controller ID should be defined for the operator
  # Note, all postgres manifest must then contain the following annotation to be found by this operator
  # "acid.zalan.do/controller": <controller-ID-of-the-operator>
  create: false
  # The name of the controller ID to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

And here is the values.yaml for the UI Helm chart:

# Default values for postgres-operator-ui.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

# configure ui image
image:
  registry: registry.opensource.zalan.do
  repository: acid/postgres-operator-ui
  tag: v1.6.0
  pullPolicy: "IfNotPresent"

# Optionally specify an array of imagePullSecrets.
# Secrets must be manually created in the namespace.
# ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
# imagePullSecrets:
#   - name: 

rbac:
  # Specifies whether RBAC resources should be created
  create: true

serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

# configure UI pod resources
resources:
  limits:
    cpu: 200m
    memory: 200Mi
  requests:
    cpu: 100m
    memory: 100Mi

# configure UI ENVs
envs:
  # IMPORTANT: While the operator chart and UI chart are independent, this is the interface between
  # UI and operator API. Insert the service name of the operator API here!
  operatorApiUrl: "http://postgres-operator:8080"
  operatorClusterNameLabel: "cluster-name"
  resourcesVisible: "False"
  targetNamespace: "postgres-operator"

# configure UI service
service:
  type: "ClusterIP"
  port: "80"
  # If the type of the service is NodePort a port can be specified using the nodePort field
  # If the nodePort field is not specified, or if it has no value, then a random port is used
  # nodePort: 32521

# configure UI ingress. If needed: "enabled: true"
ingress:
  enabled: false
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: ui.example.org
      paths: [""]
  tls: []
  #  - secretName: ui-tls
  #    hosts:
  #      - ui.example.org

I would really appreciate any help. If you need any additional info, feel free to ask :-)

Thanks in advance.

FxKu commented 3 years ago

Sounds like #1297. It's only the UI displaying it wrong. If you check the pod, the master label should be there.
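
As a rough sketch of what to look for: with the label settings posted above (`cluster_labels: application:spilo`, `cluster_name_label: cluster-name`, `pod_role_label: spilo-role`), the metadata of the master pod would be expected to contain labels along these lines. The cluster name is a placeholder assumption, and any other labels on the pod are omitted here.

```yaml
# Expected label excerpt on the master pod; cluster name is a placeholder.
metadata:
  labels:
    application: spilo                  # from cluster_labels
    cluster-name: acid-minimal-cluster  # key from cluster_name_label, value is hypothetical
    spilo-role: master                  # key from pod_role_label; replicas carry "replica"
```

If `spilo-role: master` is present on one of the pods, the cluster has a master and the status shown in the UI is only a display problem.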

yfoelling commented 3 years ago

Thank you very much for your fast answer. With the master branch it works fine!

Maybe tag it as a release so people can find it more easily. Or is it safe to always use the master branch?

FxKu commented 3 years ago

We published a new release with the fix, so I'm closing this issue. Better not to always use master, btw ;)