Open movitto opened 2 years ago
Just tried debugging for a few hours more before having to move to another priority task. I disabled as much as I can inside postgres (autovacuming, stats, etc) and set 'replicas' to 1 so that only the primary server would be deployed. At the end of the day 'ps aux' turns up the following:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 1 0.1 1.3 213268 28164 ? Ss 02:32 0:00 postgres -c config_file=/etc/postgres.conf -c hba_file=/etc/pg_hba.conf
postgres 67 0.0 0.1 213268 4060 ? Ss 02:32 0:00 postgres: checkpointer
postgres 68 0.0 0.2 213412 5812 ? Ss 02:32 0:00 postgres: background writer
postgres 69 0.0 0.4 213268 10032 ? Ss 02:32 0:00 postgres: walwriter
postgres 70 0.0 0.2 67980 5016 ? Ss 02:32 0:00 postgres: stats collector
postgres 71 0.0 0.3 213816 6784 ? Ss 02:32 0:00 postgres: logical replication launcher
root 88 0.0 0.0 2420 520 pts/0 Ss 02:32 0:00 sh
root 568 0.0 0.1 9700 3120 pts/0 R+ 02:33 0:00 ps aux
Yet the error still appears in the logs. I can't imagine there is anything else that can be disabled on postgres side (and there are no other process running locally) so I'm guessing kubegres is trying to connect to the serverfrom another container in the pod.
I could be mistaken but as mentioned I can't dedicate any more time to this. Any insight would be highly appreciated!
Hi @movitto and @alex-arica I am trying to setup SSL authentication with kubegress and I have followed the steps provided in #81 but unfortunately I am not able authenticate using SSL. Here are the files I am using :
postgres-conf.yaml :
apiVersion: v1
kind: ConfigMap
metadata:
name: mypostgres-conf
namespace: default
data:
primary_init_script.sh: |
⦙ #!/bin/bash
⦙ set -e
⦙ # This script assumes that the env-var $POSTGRES_MY_DB_PASSWORD contains the password of the custom user to create.
⦙ # You can add any env-var in your Kubegres resource config YAML.
⦙ dt=$(date '+%d/%m/%Y %H:%M:%S');
⦙ echo "$dt - Running init script the 1st time Primary PostgreSql container is created...";
⦙ customDatabaseName="keycloak"
⦙ customUserName="wsi"
⦙ echo "$dt - Running: psql -v ON_ERROR_STOP=1 --username $POSTGRES_USER --dbname $POSTGRES_DB ...";
⦙ psql -v ON_ERROR_STOP=1 -h /var/run/postgresql --username "postgres" --dbname "postgres" <<-EOSQL
⦙ CREATE DATABASE $customDatabaseName;
⦙ CREATE USER $customUserName WITH PASSWORD 'wsi';
⦙ GRANT ALL PRIVILEGES ON DATABASE "$customDatabaseName" to $customUserName;
⦙ EOSQL
⦙ echo "$dt - Init script is completed";
postgres.conf: |
⦙ # SSL
⦙ ssl = on
⦙ ssl_cert_file = '/mypostgres/.postgresql/postgres.config.base.sq.pem'
⦙ ssl_key_file = '/mypostgres/.postgresql/postgres.config.base.sq-key.pem'
⦙ ssl_ca_file = '/mypostgres/.postgresql/sq-root-ca-bundle.pem'
··
pg_hba.conf: |
⦙ local all all md5
⦙ host replication replication all md5
⦙ hostssl all all all md5 clientcert=verify-ca
⦙ hostssl keycloak wsi all scram-sha-256
⦙ hostssl all all 0.0.0.0/0 reject
postgres.yaml
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
name: mypostgres
namespace: default
spec:
⦙replicas: 1
⦙image: postgres:14.1
··⦙
⦙database:
⦙ ⦙ size: 200Mi
··
⦙customConfig: mypostgres-conf····
··
⦙securityContext:
⦙ ⦙ fsGroup: 999
⦙volume:
⦙ ⦙ volumeMounts:
⦙ ⦙ ⦙ - mountPath: "/mypostgres/.postgresql"
⦙ ⦙ ⦙ ⦙ name: postgresql-ssl-secret
⦙ ⦙ ⦙ ⦙ readOnly: true
⦙ ⦙ volumes:
⦙ ⦙ ⦙ - name: postgresql-ssl-secret
⦙ ⦙ ⦙ ⦙ secret:
⦙ ⦙ ⦙ ⦙ ⦙ secretName: postgresql-ssl-secret
⦙ ⦙ ⦙ ⦙ ⦙ defaultMode: 0400
⦙env:
⦙ ⦙ - name: POSTGRES_PASSWORD
⦙ ⦙ ⦙ valueFrom:
⦙ ⦙ ⦙ ⦙ secretKeyRef:
⦙ ⦙ ⦙ ⦙ ⦙ ⦙ name: mypostgres-secret
⦙ ⦙ ⦙ ⦙ ⦙ ⦙ key: superUserPassword
⦙ ⦙ - name: POSTGRES_REPLICATION_PASSWORD
⦙ ⦙ ⦙ valueFrom:
⦙ ⦙ ⦙ ⦙ ⦙secretKeyRef:
⦙ ⦙ ⦙ ⦙ ⦙ ⦙ name: mypostgres-secret
⦙ ⦙ ⦙ ⦙ ⦙ ⦙ key: replicationUserPassword
⦙ ⦙ - name: POSTGRES_MY_DB_PASSWORD
⦙ ⦙ ⦙ valueFrom:
⦙ ⦙ ⦙ ⦙ ⦙secretKeyRef:
⦙ ⦙ ⦙ ⦙ ⦙ ⦙ name: mypostgres-secret
⦙ ⦙ ⦙ ⦙ ⦙ ⦙ key: myDbUserPassword
SSL is being turned on yet when I run 'psql -U postgres' inside the postgres container it prompts me for a password rather than authenticating it with the client certificate. Could I please get some help on this. Thank you.
@rahul9595 SSL encryption is used to secure the communication channel. Password authentication is still enforced regardless of what communication channel you use. If your goal is to disable the password prompt consider creating a pgpass password file
Edit: Also you can set 'trust' for certain access points in pg_hba.conf. Be careful with this though, you don't want to trust all connections coming in over the internet for example
@alex-arica regarding the original question, I'm seeing a continuous stream of "could not receive data from client: Connection reset by peer" and every so often "unexpected EOF on client connection with an open transaction" under the "normal" edge case where non-ssl connections are permitted. Could you provide any insights as to any kubegres processes that try to establish a connection with the primary postgres server? Thank you.
2022-01-11 15:02:31.625 GMT [151083] LOG: could not receive data from client: Connection reset by peer
2022-01-11 15:15:58.322 GMT [152268] LOG: could not receive data from client: Connection reset by peer
2022-01-11 15:15:58.326 GMT [152266] LOG: could not receive data from client: Connection reset by peer
2022-01-11 15:15:58.326 GMT [152266] LOG: unexpected EOF on client connection with an open transaction
2022-01-11 15:17:33.104 GMT [152387] LOG: could not receive data from client: Connection reset by peer
2022-01-11 15:17:33.104 GMT [152387] LOG: unexpected EOF on client connection with an open transaction
2022-01-11 15:17:33.108 GMT [152410] LOG: could not receive data from client: Connection reset by peer
...
Hi @movitto @alex-arica thanks for your quick response I understand that password will be enforced unless the password prompt is disabled but there is no indication that SSL is being used at all. Is there a way to test SSL connection inside the postgreSQL container ? I tried the following
psql -p 5432 -h localhost
but I get the following error
connection to server at "localhost" (::1), port 5432 failed: FATAL: no pg_hba.conf entry for host "::1", user "root", database "root", no encryption```
Even if you log in with the password and SSL is enabled there should be an output of something of this regard :
[postgres@node1 ~]$ psql test -h 10.0.3.200
Password for user postgres:
psql (13.2)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
but that is now what I got I just get the following when I log in with password :
root@mypostgres-1-0:/# psql -U postgres
Password for user postgres:
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.
@movitto Any luck with this problem? Also regarding your original post are you actually able to connect using SSL because your logs say otherwise. If you are still running into that problem I was able to resolve it by adding host ssl line for 127.0.0.1/32 assuming your testing in localhost. I still am not able to get SSL to work since I am running into a permission error now where the secret certificate mount in mypostgres/.postgresql is being set to 0440 even though I am specifically setting the default mode to 0400 and read only to true, so I am getting a
private key file "..data/postgres.config.base.sq-key.pem" has group or world access; permissions should be u=rw (0600) or less
@alex-arica help would be really appreciated!
Yes SSL works. I know this because when non-ssl connections are disabled in pg_hba.conf, I can connect using the cert/key/ca credentials.
The issue at hand is something is trying to repeatedly connect to the database, and it's not apparent where those requests are originating from. It most likely relates to something within the kubegres controller as this issue starts appearing when only the database is deployed, before we spin up any of our application's containers. The issue manifests when ssl is enabled & required with the repeating error:
2022-01-10 01:34:19.615 GMT [420] FATAL: connection requires a valid client certificate
2022-01-10 01:34:19.617 GMT [421] FATAL: no pg_hba.conf entry for host "<IP>", user "postgres", database "postgres", no encryption
Which is understandable since that component would not have access to key and signed certificate, just need to figure out which component it is!
And when non-ssl connections are enabled the issue manifests itself with the repeating error:
2022-01-11 15:17:33.104 GMT [152387] LOG: unexpected EOF on client connection with an open transaction
2022-01-11 15:17:33.108 GMT [152410] LOG: could not receive data from client: Connection reset by peer
As far as your issue @rahul9595, the original inquiries were not kubegres specific, and best asked on a general postgresql support forum such as stack overflow. As far as permissions for mounting the credentials via a secret / kubernetes volume, see this comment for the configuration that worked for me. Hope that helps.
hi @movitto I have the exact same config as you :
kind: Kubegres
metadata:
name: mypostgres
namespace: default
spec:
replicas: 3
image: postgres:14.1
database:
size: 10Gi
customConfig: mypostgres-conf
securityContext:
fsGroup: 999
volume:
volumeMounts:
- mountPath: "/mypostgres/.postgresql"
name: mypostgres-server-tls
readOnly: true
volumes:
- name: mypostgres-server-tls
secret:
secretName: mypostgres-server-tls
defaultMode: 0400
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: mypostgres-secret
key: superUserPassword
- name: POSTGRES_REPLICATION_PASSWORD
valueFrom:
secretKeyRef:
name: mypostgres-secret
key: replicationUserPassword
yet I get the permission error. I am curious, when you use the psql command do you mind posting the screenshot of the output?
@movitto If the errors you’re seeing happen on a regular interval I’m wondering if it’s Kubegres doing a health check. Just a guess.
@anthonator yes I suspect that is the case as well, but we haven't been able to track down exactly what is invoking the request. Will continue investigating when we get the chance (if you are able to find it, please let us know here... thanks!)
Hello again! This issue relates to #81 where I detail setup instructions for enabling ssl-only communication to the postgresql server. Where the steps there are executed, everything works correctly and clients can successfully connect via ssl but the following error starts to appear in the logs for all pods (primary and replicas):
The IP being that of the pod which postgres is running (and that which owns the corresponding log file). This occurs even if no external clients connect to the postgres database so I'm thinking that there is some internal process regularily trying to connect to postgres on the local pod, using the IP as the host, and is failing as non-ssl connections are disable.
After spending several hours on this I could not track the issue down.
Under "normal" circumstances, when non-ssl connections are permitted, we see the following in the logs instead of the above error:
It's frustrating as we now need to re-enable non-ssl connections as otherwise the logs will get polluted and unweildy. Could you please advise? Thank you.