reactive-tech / kubegres

Kubegres is a Kubernetes operator allowing to deploy one or many clusters of PostgreSql instances and manage databases replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.32k stars 74 forks source link

Bug: Enabling ssl in postgres configuration results in regular authentication failure error in logs #85

Open movitto opened 2 years ago

movitto commented 2 years ago

Hello again! This issue relates to #81 where I detail setup instructions for enabling ssl-only communication to the postgresql server. Where the steps there are executed, everything works correctly and clients can successfully connect via ssl but the following error starts to appear in the logs for all pods (primary and replicas):

2022-01-10 01:34:19.615 GMT [420] FATAL:  connection requires a valid client certificate
2022-01-10 01:34:19.617 GMT [421] FATAL:  no pg_hba.conf entry for host "<IP>", user "postgres", database "postgres", no encryption

The IP being that of the pod which postgres is running (and that which owns the corresponding log file). This occurs even if no external clients connect to the postgres database so I'm thinking that there is some internal process regularily trying to connect to postgres on the local pod, using the IP as the host, and is failing as non-ssl connections are disable.

After spending several hours on this I could not track the issue down.

Under "normal" circumstances, when non-ssl connections are permitted, we see the following in the logs instead of the above error:

2022-01-10 00:25:31.623 GMT [142] LOG:  could not receive data from client: Connection reset by peer

It's frustrating as we now need to re-enable non-ssl connections as otherwise the logs will get polluted and unweildy. Could you please advise? Thank you.

movitto commented 2 years ago

Just tried debugging for a few hours more before having to move to another priority task. I disabled as much as I can inside postgres (autovacuming, stats, etc) and set 'replicas' to 1 so that only the primary server would be deployed. At the end of the day 'ps aux' turns up the following:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres       1  0.1  1.3 213268 28164 ?        Ss   02:32   0:00 postgres -c config_file=/etc/postgres.conf -c hba_file=/etc/pg_hba.conf
postgres      67  0.0  0.1 213268  4060 ?        Ss   02:32   0:00 postgres: checkpointer 
postgres      68  0.0  0.2 213412  5812 ?        Ss   02:32   0:00 postgres: background writer 
postgres      69  0.0  0.4 213268 10032 ?        Ss   02:32   0:00 postgres: walwriter 
postgres      70  0.0  0.2  67980  5016 ?        Ss   02:32   0:00 postgres: stats collector 
postgres      71  0.0  0.3 213816  6784 ?        Ss   02:32   0:00 postgres: logical replication launcher 
root          88  0.0  0.0   2420   520 pts/0    Ss   02:32   0:00 sh
root         568  0.0  0.1   9700  3120 pts/0    R+   02:33   0:00 ps aux

Yet the error still appears in the logs. I can't imagine there is anything else that can be disabled on postgres side (and there are no other process running locally) so I'm guessing kubegres is trying to connect to the serverfrom another container in the pod.

I could be mistaken but as mentioned I can't dedicate any more time to this. Any insight would be highly appreciated!

rahul9595 commented 2 years ago

Hi @movitto and @alex-arica I am trying to setup SSL authentication with kubegress and I have followed the steps provided in #81 but unfortunately I am not able authenticate using SSL. Here are the files I am using :

postgres-conf.yaml :

apiVersion: v1
kind: ConfigMap
metadata:
  name: mypostgres-conf
  namespace: default

data:

  primary_init_script.sh: |
  ⦙ #!/bin/bash
  ⦙ set -e

  ⦙ # This script assumes that the env-var $POSTGRES_MY_DB_PASSWORD contains the password of the custom user to create.
  ⦙ # You can add any env-var in your Kubegres resource config YAML.

  ⦙ dt=$(date '+%d/%m/%Y %H:%M:%S');
  ⦙ echo "$dt - Running init script the 1st time Primary PostgreSql container is created...";

  ⦙ customDatabaseName="keycloak"
  ⦙ customUserName="wsi"

  ⦙ echo "$dt - Running: psql -v ON_ERROR_STOP=1 --username $POSTGRES_USER --dbname $POSTGRES_DB ...";

  ⦙ psql -v ON_ERROR_STOP=1 -h /var/run/postgresql --username "postgres" --dbname "postgres" <<-EOSQL
  ⦙ CREATE DATABASE $customDatabaseName;
  ⦙ CREATE USER $customUserName WITH PASSWORD 'wsi';
  ⦙ GRANT ALL PRIVILEGES ON DATABASE "$customDatabaseName" to $customUserName;
  ⦙ EOSQL

  ⦙ echo "$dt - Init script is completed";

  postgres.conf: |
  ⦙ # SSL
  ⦙ ssl = on
  ⦙ ssl_cert_file = '/mypostgres/.postgresql/postgres.config.base.sq.pem'
  ⦙ ssl_key_file = '/mypostgres/.postgresql/postgres.config.base.sq-key.pem'
  ⦙ ssl_ca_file = '/mypostgres/.postgresql/sq-root-ca-bundle.pem'
··
  pg_hba.conf: |
  ⦙ local all all md5
  ⦙ host replication replication all md5
  ⦙ hostssl all all all md5 clientcert=verify-ca
  ⦙ hostssl keycloak wsi all scram-sha-256
  ⦙ hostssl all all 0.0.0.0/0 reject 

postgres.yaml

apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default

spec:

  ⦙replicas: 1
  ⦙image: postgres:14.1
··⦙
  ⦙database:
  ⦙ ⦙ size: 200Mi
··
  ⦙customConfig: mypostgres-conf····
··
  ⦙securityContext:
  ⦙ ⦙ fsGroup: 999

  ⦙volume:
  ⦙ ⦙ volumeMounts:
  ⦙ ⦙ ⦙ - mountPath: "/mypostgres/.postgresql"
  ⦙ ⦙ ⦙ ⦙ name: postgresql-ssl-secret
  ⦙ ⦙ ⦙ ⦙ readOnly: true
  ⦙ ⦙ volumes:
  ⦙ ⦙ ⦙ - name: postgresql-ssl-secret
  ⦙ ⦙ ⦙ ⦙ secret:
  ⦙ ⦙ ⦙ ⦙ ⦙ secretName: postgresql-ssl-secret
  ⦙ ⦙ ⦙ ⦙ ⦙ defaultMode: 0400

  ⦙env:
  ⦙ ⦙ - name: POSTGRES_PASSWORD
  ⦙ ⦙ ⦙ valueFrom:
  ⦙ ⦙ ⦙ ⦙ secretKeyRef:
  ⦙ ⦙ ⦙ ⦙ ⦙ ⦙ name: mypostgres-secret
  ⦙ ⦙ ⦙ ⦙ ⦙ ⦙ key: superUserPassword

  ⦙ ⦙ - name: POSTGRES_REPLICATION_PASSWORD
  ⦙ ⦙ ⦙ valueFrom:
  ⦙ ⦙ ⦙ ⦙ ⦙secretKeyRef:
  ⦙ ⦙ ⦙ ⦙ ⦙ ⦙ name: mypostgres-secret
  ⦙ ⦙ ⦙ ⦙ ⦙ ⦙ key: replicationUserPassword

  ⦙ ⦙ - name: POSTGRES_MY_DB_PASSWORD
  ⦙ ⦙ ⦙ valueFrom:
  ⦙ ⦙ ⦙ ⦙ ⦙secretKeyRef:
  ⦙ ⦙ ⦙ ⦙ ⦙ ⦙ name: mypostgres-secret
  ⦙ ⦙ ⦙ ⦙ ⦙ ⦙ key: myDbUserPassword

SSL is being turned on yet when I run 'psql -U postgres' inside the postgres container it prompts me for a password rather than authenticating it with the client certificate. Could I please get some help on this. Thank you.

movitto commented 2 years ago

@rahul9595 SSL encryption is used to secure the communication channel. Password authentication is still enforced regardless of what communication channel you use. If your goal is to disable the password prompt consider creating a pgpass password file

Edit: Also you can set 'trust' for certain access points in pg_hba.conf. Be careful with this though, you don't want to trust all connections coming in over the internet for example

movitto commented 2 years ago

@alex-arica regarding the original question, I'm seeing a continuous stream of "could not receive data from client: Connection reset by peer" and every so often "unexpected EOF on client connection with an open transaction" under the "normal" edge case where non-ssl connections are permitted. Could you provide any insights as to any kubegres processes that try to establish a connection with the primary postgres server? Thank you.

2022-01-11 15:02:31.625 GMT [151083] LOG:  could not receive data from client: Connection reset by peer
2022-01-11 15:15:58.322 GMT [152268] LOG:  could not receive data from client: Connection reset by peer
2022-01-11 15:15:58.326 GMT [152266] LOG:  could not receive data from client: Connection reset by peer
2022-01-11 15:15:58.326 GMT [152266] LOG:  unexpected EOF on client connection with an open transaction
2022-01-11 15:17:33.104 GMT [152387] LOG:  could not receive data from client: Connection reset by peer
2022-01-11 15:17:33.104 GMT [152387] LOG:  unexpected EOF on client connection with an open transaction
2022-01-11 15:17:33.108 GMT [152410] LOG:  could not receive data from client: Connection reset by peer
...
rahul9595 commented 2 years ago

Hi @movitto @alex-arica thanks for your quick response I understand that password will be enforced unless the password prompt is disabled but there is no indication that SSL is being used at all. Is there a way to test SSL connection inside the postgreSQL container ? I tried the following psql -p 5432 -h localhost but I get the following error


connection to server at "localhost" (::1), port 5432 failed: FATAL:  no pg_hba.conf entry for host "::1", user "root", database "root", no encryption```
rahul9595 commented 2 years ago

Even if you log in with the password and SSL is enabled there should be an output of something of this regard :

[postgres@node1 ~]$ psql test -h 10.0.3.200
Password for user postgres:
psql (13.2)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

but that is now what I got I just get the following when I log in with password :

root@mypostgres-1-0:/# psql -U postgres
Password for user postgres: 
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.
rahul9595 commented 2 years ago

@movitto Any luck with this problem? Also regarding your original post are you actually able to connect using SSL because your logs say otherwise. If you are still running into that problem I was able to resolve it by adding host ssl line for 127.0.0.1/32 assuming your testing in localhost. I still am not able to get SSL to work since I am running into a permission error now where the secret certificate mount in mypostgres/.postgresql is being set to 0440 even though I am specifically setting the default mode to 0400 and read only to true, so I am getting a

private key file "..data/postgres.config.base.sq-key.pem" has group or world access; permissions should be u=rw (0600) or less

@alex-arica help would be really appreciated!

movitto commented 2 years ago

Yes SSL works. I know this because when non-ssl connections are disabled in pg_hba.conf, I can connect using the cert/key/ca credentials.

The issue at hand is something is trying to repeatedly connect to the database, and it's not apparent where those requests are originating from. It most likely relates to something within the kubegres controller as this issue starts appearing when only the database is deployed, before we spin up any of our application's containers. The issue manifests when ssl is enabled & required with the repeating error:

2022-01-10 01:34:19.615 GMT [420] FATAL:  connection requires a valid client certificate
2022-01-10 01:34:19.617 GMT [421] FATAL:  no pg_hba.conf entry for host "<IP>", user "postgres", database "postgres", no encryption

Which is understandable since that component would not have access to key and signed certificate, just need to figure out which component it is!

And when non-ssl connections are enabled the issue manifests itself with the repeating error:

2022-01-11 15:17:33.104 GMT [152387] LOG:  unexpected EOF on client connection with an open transaction
2022-01-11 15:17:33.108 GMT [152410] LOG:  could not receive data from client: Connection reset by peer
movitto commented 2 years ago

As far as your issue @rahul9595, the original inquiries were not kubegres specific, and best asked on a general postgresql support forum such as stack overflow. As far as permissions for mounting the credentials via a secret / kubernetes volume, see this comment for the configuration that worked for me. Hope that helps.

rahul9595 commented 2 years ago

hi @movitto I have the exact same config as you :

kind: Kubegres
metadata:
  name: mypostgres
  namespace: default
spec:
   replicas: 3
   image: postgres:14.1
   database:
      size: 10Gi
   customConfig: mypostgres-conf
   securityContext:
     fsGroup: 999
   volume:
     volumeMounts:
       - mountPath: "/mypostgres/.postgresql"
         name: mypostgres-server-tls
         readOnly: true
     volumes:
       - name: mypostgres-server-tls
         secret:
           secretName: mypostgres-server-tls
           defaultMode: 0400
   env:
     - name: POSTGRES_PASSWORD
       valueFrom:
         secretKeyRef:
           name: mypostgres-secret
           key: superUserPassword
     - name: POSTGRES_REPLICATION_PASSWORD
       valueFrom:
         secretKeyRef:
           name: mypostgres-secret
           key: replicationUserPassword

yet I get the permission error. I am curious, when you use the psql command do you mind posting the screenshot of the output?

anthonator commented 2 years ago

@movitto If the errors you’re seeing happen on a regular interval I’m wondering if it’s Kubegres doing a health check. Just a guess.

movitto commented 2 years ago

@anthonator yes I suspect that is the case as well, but we haven't been able to track down exactly what is invoking the request. Will continue investigating when we get the chance (if you are able to find it, please let us know here... thanks!)