zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License
4.35k stars 979 forks source link

synchronous_node_count is not updated in cluster settings #2552

Closed cstohr1 closed 8 months ago

cstohr1 commented 8 months ago

https://github.com/zalando/postgres-operator/pull/1484/ introduced support for synchronous_node_count in the CRD, however I dont see setting this to >1 actually being reflected in the cluster. I deployed the latest postgres-operator image and this CR using ghcr.io/zalando/spilo-15:3.0-p1:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: postgres-cluster
spec:
  allowedSourceRanges:
    - 127.0.0.1/32
  databases:
    accounts: accounts
  dockerImage: ghcr.io/zalando/spilo-15:3.0-p1
  enableConnectionPooler: false
  enableLogicalBackup: false
  enableMasterLoadBalancer: false
  enableMasterPoolerLoadBalancer: false
  enableReplicaConnectionPooler: false
  enableReplicaLoadBalancer: false
  enableReplicaPoolerLoadBalancer: false
  enableShmVolume: true
  initContainers:
    - command:
        - /bin/date
      image: busybox:1.35
      name: date
  numberOfInstances: 3
  patroni:
    initdb:
      data-checksums: 'true'
      encoding: UTF8
      locale: en_US.UTF-8
    loop_wait: 10
    maximum_lag_on_failover: 33554432
    retry_timeout: 10
    synchronous_mode: true
    synchronous_mode_strict: true
    synchronous_node_count: 2
    ttl: 30
  postgresql:
    parameters:
      archive_timeout: '300'
      autovacuum_vacuum_cost_delay: 10ms
      autovacuum_vacuum_cost_limit: '10000'
      log_filename: postgresql-%M
      log_rotation_age: '1'
      log_statement: none
      max_connections: '200'
      shared_buffers: 1024MB
      synchronous_commit: remote_apply
      synchronous_standby_names: '*'
    version: '11'
  resources:
    limits:
      cpu: 3000m
      memory: 4Gi
    requests:
      cpu: 1000m
      memory: 4Gi
  teamId: knime
  users:
    accounts:
      - createdb
  volume:
    size: 30Gi

notably setting synchronous_node_count=2 and synchronous_mode=true. changes to synchronous_node_count are shown as an update event in the postgres-operator logs, but looking at the actual config on the postgres leader pod the setting is missing completely:

postgres@postgres-cluster-1:~$ patronictl show-config
failsafe_mode: false
loop_wait: 10
maximum_lag_on_failover: 33554432
pg_hba:
- hostssl all all 0.0.0.0/0 md5
- host    all all 0.0.0.0/0 md5
postgresql:
  parameters:
    archive_mode: 'on'
    archive_timeout: '300'
    autovacuum_analyze_scale_factor: 0.02
    autovacuum_max_workers: 5
    autovacuum_vacuum_cost_delay: 10ms
    autovacuum_vacuum_cost_limit: '10000'
    autovacuum_vacuum_scale_factor: 0.05
    checkpoint_completion_target: 0.9
    hot_standby: 'on'
    log_autovacuum_min_duration: 0
    log_checkpoints: 'on'
    log_connections: 'on'
    log_disconnections: 'on'
    log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
    log_lock_waits: 'on'
    log_min_duration_statement: 500
    log_statement: none
    log_temp_files: 0
    max_connections: '200'
    max_replication_slots: 10
    max_wal_senders: 10
    synchronous_commit: remote_apply
    synchronous_standby_names: '*'
    tcp_keepalives_idle: 900
    tcp_keepalives_interval: 100
    track_functions: all
    wal_keep_segments: 8
    wal_level: hot_standby
    wal_log_hints: 'on'
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
synchronous_mode: true
synchronous_mode_strict: true
ttl: 30

If I manually add synchronous_node_count: 2 using patronictl edit-config on the leader pod I can see it taking effect by checking psql -U postgres -c 'select application_name, sync_state from pg_stat_replication;'.

cstohr1 commented 8 months ago

I am not familiar with the code of the postgres-operator, but I noticed that in checkAndSetGlobalPostgreSQLConfiguration settings like SynchronousMode are compared from effective config to desired config and updated if they are out of sync, but SynchronousNodeCount is missing completely: https://github.com/zalando/postgres-operator/blob/3fb3b3409463fab52cadfde5dea0d261d1f534c9/pkg/cluster/sync.go#L620

FxKu commented 8 months ago

Sounds like an oversight then. You like to create a PR for it?

cstohr1 commented 8 months ago

Opened a PR, not sure I'm doing this right because this is the first time I'm contributing to an external project on github :-) https://github.com/zalando/postgres-operator/pull/2558

tested this change locally in minikube

FxKu commented 8 months ago

Thanks. Merged it. Happy to hear about your first contribution to another OS project :smiley: