zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Questions about SSL certs with Postgres #1073

Open kannanvr opened 4 years ago

kannanvr commented 4 years ago

Hi all, I need help enabling SSL for my Postgres cluster. I am using the Zalando Postgres operator. My application runs as a pod and accesses the Postgres database, which I deployed with the Zalando operator. My configuration is 1 master + 1 slave, i.e. the number of instances is 2. I added the rule below to pg_hba.conf through the CRD:

hostssl all all 10.233.0.0/15 md5 clientcert=1

My application pod and the Postgres pods are in the same IP range (10.233.0.0/15). When the slave tries to sync with the master, I get the error below on the replica:

2020-07-23 11:28:41,944 INFO: Lock owner: postgres-cluster-1; I am postgres-cluster-0
2020-07-23 11:28:41,980 INFO: running pg_rewind from postgres-cluster-1
2020-07-23 11:28:42,024 ERROR: Exception when working with leader
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/rewind.py", line 59, in check_leader_is_not_in_recovery
    with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **kwargs) as cur:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
    with psycopg2.connect(**kwargs) as conn:
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL:  connection requires a valid client certificate
FATAL:  pg_hba.conf rejects connection for host "10.233.65.89", user "postgres", database "postgres", SSL off

How are others enabling SSL certificates for application pods? In Kubernetes, the pod IP can't be chosen by the user: the network plugin (Flannel or Calico) assigns it from the range chosen at installation time. How can I keep the replica pod syncing with the master without problems when SSL certificates are enabled? Any guidance would be appreciated.

ReSearchITEng commented 4 years ago

Hi,

It seems you want mTLS (since you set clientcert=1). That explains the first error: psycopg2.OperationalError: FATAL: connection requires a valid client certificate. The client then retries without SSL, and that attempt is rejected as well: FATAL: pg_hba.conf rejects connection for host "10.233.65.89", user "postgres", database "postgres", SSL off

That is strange, as pg_hba.conf is usually updated automatically inside the container to allow local connections. Can you paste the pg_hba.conf from inside both the -0 and -1 pods? I suggest you update your cluster request and add one more line, specifically for the postgres user in all databases, that uses SSL with user/password (md5) but not mTLS. Put it above the clientcert=1 line: PostgreSQL uses the first pg_hba.conf line that matches, so the order matters.

    - hostssl all      postgres                all md5
    - hostssl all all 10.233.0.0/15 md5 clientcert=1
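Why the order of these two lines matters: PostgreSQL picks the first pg_hba.conf record whose connection type, database, user and address all match, and if authentication then fails it does not try later records. The Python sketch below is a simplified, hypothetical model of that first-match rule (Conn, first_match, MTLS_ONLY and WITH_FIX are illustrative names; it ignores keywords such as samerole, @file includes and hostname addresses, and group membership is passed in explicitly). Together with libpq's sslmode=prefer behaviour (try SSL first, then retry without SSL), it reproduces both FATAL lines from the original report.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass
class Conn:
    """Hypothetical description of one incoming connection attempt."""
    user: str
    database: str
    addr: Optional[str]          # None models a Unix-socket ("local") connection
    ssl: bool
    replication: bool = False
    groups: FrozenSet[str] = field(default_factory=frozenset)


def _db_matches(value: str, conn: Conn) -> bool:
    # "replication" matches physical replication connections only;
    # "all" and explicit database names match ordinary connections only.
    if value == "replication":
        return conn.replication
    return not conn.replication and value in ("all", conn.database)


def _user_matches(value: str, conn: Conn) -> bool:
    if value == "all":
        return True
    if value.startswith("+"):                # "+role" means "member of role"
        return value[1:] in conn.groups
    return value == conn.user


def _addr_matches(value: str, conn: Conn) -> bool:
    if value == "all":
        return True
    # strict=False masks off host bits, so 10.233.0.0/15 is accepted
    # (it covers 10.232.0.0 - 10.233.255.255).
    net = ipaddress.ip_network(value, strict=False)
    return ipaddress.ip_address(conn.addr) in net   # IPv4 vs IPv6 -> False


def first_match(hba_text: str, conn: Conn) -> str:
    """Return the auth method/options of the first matching record."""
    for raw in hba_text.splitlines():
        parts = raw.split("#", 1)[0].split()
        if not parts:
            continue
        kind = parts[0]
        if kind == "local":
            if conn.addr is not None:
                continue
            db, user, auth = parts[1], parts[2], parts[3:]
        else:
            if conn.addr is None:
                continue
            if (kind == "hostssl" and not conn.ssl) or (kind == "hostnossl" and conn.ssl):
                continue
            db, user, addr, auth = parts[1], parts[2], parts[3], parts[4:]
            if not _addr_matches(addr, conn):
                continue
        if _db_matches(db, conn) and _user_matches(user, conn):
            return " ".join(auth)
    return "no pg_hba.conf entry"


# Spilo-generated tail, copied from the files pasted later in this thread.
SPILO_TAIL = """
local   all             all                                   trust
hostssl all             +zalandos    127.0.0.1/32       pam
host    all             all                127.0.0.1/32       md5
hostssl all             +zalandos    ::1/128            pam
host    all             all                ::1/128            md5
hostssl replication     standby all                md5
hostnossl all           all                all                reject
hostssl all             +zalandos    all                pam
hostssl all             all                all                md5
"""
MTLS_ONLY = "hostssl all all 10.233.0.0/15 md5 clientcert=1\n" + SPILO_TAIL
WITH_FIX = "hostssl all postgres all md5\n" + MTLS_ONLY

rewind = dict(user="postgres", database="postgres", addr="10.233.65.89")
print(first_match(MTLS_ONLY, Conn(ssl=True, **rewind)))   # md5 clientcert=1
print(first_match(MTLS_ONLY, Conn(ssl=False, **rewind)))  # reject
print(first_match(WITH_FIX, Conn(ssl=True, **rewind)))    # md5
```

In this model, the SSL attempt as postgres lands on the clientcert=1 rule (first FATAL) and the plain retry lands on "hostnossl all all all reject" (second FATAL). With the extra postgres line placed first, the SSL attempt matches plain md5, while other users connecting from 10.233.0.0/15 still need a client certificate.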
kannanvr commented 4 years ago

@ReSearchITEng, thanks for your reply. I think I understand the reason. It's an issue with enabling mTLS on the Zalando operator. Scenario to reproduce this issue:

  1. Bring up Postgres with 1 master and 1 slave, using the pg_hba.conf rule below:
           hostssl all all 10.233.0.0/15 md5 clientcert=1
  2. Both the master (M1) and the slave (S1) come up normally, without any issue.
  3. Bring down the master (M1).
  4. The slave becomes the master (M2).
  5. The old master (M1) now becomes a slave (S2).

Now we get the authentication error below:

psycopg2.OperationalError: FATAL:  connection requires a valid client certificate
FATAL:  pg_hba.conf rejects connection for host "10.233.65.89", user "postgres", database "postgres", SSL off
  6. Now bring down the master (M2).
  7. S2 becomes the new master (M3).
  8. M2 becomes a slave (S3).

Now we don't observe the authentication error. Both the master and the slave come up normally.

So the problem is that whenever the first master becomes a slave, we observe this authentication issue. When I check the pg_hba.conf files, the master and the slave are identical at all times.

However, I have observed that one setting is not correct, and I suspect it may be causing the issue.

In postgresql.conf, at /home/postgres/pgdata/pgroot/data/postgresql.conf:

# recovery.conf
primary_conninfo = 'user=standby passfile=/run/postgresql/pgpass host=10.233.66.120 port=5432 sslmode=prefer application_name=postgres-cluster-1'
primary_slot_name = 'postgres_cluster_1'
recovery_target = ''
recovery_target_lsn = ''
recovery_target_name = ''
recovery_target_time = ''
recovery_target_timeline = 'latest'
recovery_target_xid = ''

When the authentication issue occurs, I can see that primary_conninfo and primary_slot_name are missing on the slave.

This information is populated on the slave initially, which is why the slave can connect to the master with SSL. But it is not present on the master. Whenever the master becomes a slave, since this parameter is not set, it tries to connect to the new master without SSL. I think we need to set this option on both the master and the slave; it is used only by the slave and ignored by the master.
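One way to check what a node will actually attempt when connecting to its primary is to read the sslmode out of primary_conninfo. Below is a simplified, hypothetical parser for libpq keyword=value strings (parse_conninfo and may_connect_without_ssl are illustrative names; it handles single-quoted values and backslash escapes, but not URI-style connection strings). Per the libpq documentation, sslmode=prefer, which is also the default when sslmode is absent, tries SSL first and falls back to a plain connection, while require, verify-ca and verify-full never fall back.

```python
from typing import Dict


def parse_conninfo(conninfo: str) -> Dict[str, str]:
    """Parse a libpq keyword=value connection string (simplified sketch)."""
    opts: Dict[str, str] = {}
    i, n = 0, len(conninfo)
    while i < n:
        if conninfo[i].isspace():
            i += 1
            continue
        eq = conninfo.find("=", i)
        if eq == -1:
            raise ValueError(f"missing '=' after {conninfo[i:]!r}")
        key = conninfo[i:eq].strip()
        i = eq + 1
        while i < n and conninfo[i].isspace():    # spaces allowed around '='
            i += 1
        quoted = i < n and conninfo[i] == "'"
        if quoted:
            i += 1
        chars = []
        while i < n:
            ch = conninfo[i]
            if ch == "\\" and i + 1 < n:             # backslash escapes next char
                chars.append(conninfo[i + 1])
                i += 2
                continue
            if (quoted and ch == "'") or (not quoted and ch.isspace()):
                break
            chars.append(ch)
            i += 1
        if quoted:
            i += 1                                 # skip the closing quote
        opts[key] = "".join(chars)
    return opts


# sslmode values with which libpq may end up on a non-SSL connection.
PLAINTEXT_FALLBACK = {"disable", "allow", "prefer"}


def may_connect_without_ssl(conninfo: str) -> bool:
    """True if libpq could end up on a non-SSL connection with these options."""
    mode = parse_conninfo(conninfo).get("sslmode", "prefer")  # libpq default
    return mode in PLAINTEXT_FALLBACK


# The primary_conninfo value quoted above.
line = ("user=standby passfile=/run/postgresql/pgpass host=10.233.66.120 "
        "port=5432 sslmode=prefer application_name=postgres-cluster-1")
opts = parse_conninfo(line)
print(opts["sslmode"], opts["host"])                 # prefer 10.233.66.120
print(may_connect_without_ssl(line))                 # True
print(may_connect_without_ssl(line.replace("sslmode=prefer", "sslmode=verify-full")))  # False
```

With a pg_hba.conf that rejects every hostnossl connection, a "True" here only means the plain retry will be refused; it does not by itself make the SSL attempt succeed.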

Kindly let me know if you need any further information. It would be good to fix this issue as soon as possible.

Thanks, Kannan V

ReSearchITEng commented 4 years ago

Hi, IMO that is strange: a master/slave switchover would then also have issues in a non-mTLS setup, but I am not aware of such issues there. Can you confirm that it happens only with mTLS for you? Once you have the answer, @CyberDem0n from the Spilo project could confirm how primary_conninfo and primary_slot_name should look in your postgresql.conf.

If it's only in the mTLS setup, it may be an access issue in the first place. Please confirm you've added "- hostssl all postgres all md5", and please paste the pg_hba.conf files of each pod.

kannanvr commented 4 years ago

@ReSearchITEng, after adding the rule hostssl all postgres all md5, I am no longer facing the authentication issue. We face this issue only when mTLS is enabled. Below is the master pg_hba.conf:

# It will be overwritten by Patroni!
hostssl all      postgres                all md5
hostssl all all 10.233.0.0/15 md5 clientcert=1
local   all             all                                   trust
hostssl all             +zalandos    127.0.0.1/32       pam
host    all             all                127.0.0.1/32       md5
hostssl all             +zalandos    ::1/128            pam
host    all             all                ::1/128            md5
hostssl replication     standby all                md5
hostnossl all           all                all                reject
hostssl all             +zalandos    all                pam
hostssl all             all                all                md5

Below is the slave pg_hba.conf:

# Do not edit this file manually!
# It will be overwritten by Patroni!
hostssl all      postgres                all md5
hostssl all all 10.233.0.0/15 md5 clientcert=1
local   all             all                                   trust
hostssl all             +zalandos    127.0.0.1/32       pam
host    all             all                127.0.0.1/32       md5
hostssl all             +zalandos    ::1/128            pam
host    all             all                ::1/128            md5
hostssl replication     standby all                md5
hostnossl all           all                all                reject
hostssl all             +zalandos    all                pam
hostssl all             all                all                md5
kannanvr commented 4 years ago

@ReSearchITEng @CyberDem0n, shall we check Spilo for the access issue?

ReSearchITEng commented 4 years ago

> @ReSearchITEng @CyberDem0n , shall we check on spilo for the access issue ?

Yes, please make the settings in the pg_hba section of the cluster request as explained, and test. If it's still not working, please enter both the master and a slave pod and paste the full pg_hba.conf contents here.

kannanvr commented 4 years ago

@ReSearchITEng, I have tested and pasted the complete pg_hba.conf in my comment above. Do you need any further information regarding this issue?

ReSearchITEng commented 4 years ago

Please enter both the master and a slave pod and paste the full pg_hba.conf file contents here.

johndiego commented 3 years ago

I have the same problem. How can I solve this?

vitobotta commented 3 years ago

I'm also having this problem with some apps, not all, and I'm confused about what exactly I have to do to fix it. Any clarification?