It's fine to use either ConfigMap or OperatorConfiguration. Using both makes it harder to tell which setting is coming from where. In the cluster manifest, don't specify the postgres user, as it's already there. Maybe this is causing the trouble. Please check the logs of the operator Pod and the database Pods for more error messages.
I removed the OperatorConfiguration and deleted the definition of the postgres user from the postgresql manifest. I also thought it was a DNS problem - that the postgres-operator wasn't correctly resolving the DNS name of my postgresql service (acid-minimal-cluster) - but no, after exec'ing into the operator container I see a correct answer from nslookup. And I didn't see any more error messages in the operator and DB Pods.
Delete also the env variable POSTGRES_OPERATOR_CONFIGURATION_OBJECT in the deployment and start again from scratch. Are you testing on minikube?
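For context: POSTGRES_OPERATOR_CONFIGURATION_OBJECT is what switches the operator from the ConfigMap to the CRD-based OperatorConfiguration. A minimal sketch of the relevant part of the operator Deployment - the values are illustrative, and CONFIG_MAP_NAME is the companion variable from the operator's standard manifests, so verify both against your version:

containers:
- name: postgres-operator
  image: registry.opensource.zalan.do/acid/postgres-operator:v1.2.0
  env:
  # remove this variable to fall back to ConfigMap-based configuration
  - name: POSTGRES_OPERATOR_CONFIGURATION_OBJECT
    value: postgresql-operator-default-configuration
  # name of the ConfigMap the operator reads instead
  - name: CONFIG_MAP_NAME
    value: postgres-operator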
@FxKu, sorry for not replying for a while. Yes, I deleted the env variable POSTGRES_OPERATOR_CONFIGURATION_OBJECT, deleted the postgres-operator, and after this I'm getting the same behavior. No, I am testing on a bare-metal k8s cluster - v1.14.3.
@qurname2 can you try the latest Postgres Operator version (not 1.2.0)? I was also thinking that your cluster domain might be different from cluster.local, but you said that's not the case. So the Patroni (db Pods) logs look fine? And there is no other warning or error in the operator logs?
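(For completeness: if the cluster domain were different, the operator can be told about it. A hedged sketch, assuming ConfigMap-based configuration; cluster_domain is a key from the operator's configuration reference, but verify it for your version:)

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # must match the kubelet's cluster domain, otherwise the operator
  # builds service DNS names that don't resolve
  cluster_domain: cluster.local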
After upgrading to the latest tag:
2019/11/08 09:43:25 Fully qualified configmap name: postgres-operator/postgres-operator
2019/11/08 09:43:25 Spilo operator v1.2.0-23-g33e1d60-dirty
time="2019-11-08T09:43:25Z" level=warning msg="in the operator config map, the pod service account name zalando-postgres-operator does not match the name operator given in the account definition; using the former for consistency" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="Parse role bindings" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="successfully parsed" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="Listening to all namespaces" pkg=controller
time="2019-11-08T09:43:25Z" level=info msg="customResourceDefinition \"postgresqls.acid.zalan.do\" is already registered and will only be updated" pkg=controller
time="2019-11-08T09:43:29Z" level=warning msg="in the operator config map, the pod service account name zalando-postgres-operator does not match the name operator given in the account definition; using the former for consistency" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="config: {\n\t\"ReadyWaitInterval\": 3000000000,\n\t\"ReadyWaitTimeout\" .... ..... }
time="2019-11-08T09:43:29Z" level=debug msg="acquiring initial list of clusters" pkg=controller
time="2019-11-08T09:43:29Z" level=debug msg="added new cluster: \"my-ns/my-app-db\"" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="\"SYNC\" event has been queued" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:43:29Z" level=info msg="there are 1 clusters running" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="started working in background" pkg=controller
time="2019-11-08T09:43:29Z" level=info msg="listening on :8080" pkg=apiserver
time="2019-11-08T09:43:29Z" level=info msg="\"ADD\" event has been queued" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:43:29Z" level=info msg="syncing of the cluster started" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:43:29Z" level=warning msg="could not get oauth token to authenticate to team service API, returning empty list of team members: could not get credentials secret: secrets \"postgresql-operator\" not found" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:43:29Z" level=debug msg="syncing secrets" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:43:29Z" level=debug msg="new node has been added: \"/my-k8s-worker-node\" ()" pkg=controller
.... ....
time="2019-11-08T09:45:16Z" level=error msg="could not connect to PostgreSQL database: dial tcp 127.0.0.1:5432: connect: connection refused" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:45:16Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=my-ns/my-app-db pkg=cluster
time="2019-11-08T09:45:16Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=my-ns/my-app-db pkg=controller worker=0
time="2019-11-08T09:45:16Z" level=debug msg="cluster already exists" cluster-name=my-ns/my-app-db pkg=controller worker=0
I also thought about cluster.local, but yes, that's not the case.
root@my-app-db-0:/home/postgres# cat /etc/resolv.conf
nameserver 10.233.0.3
search my-ns.svc.cluster.local svc.cluster.local cluster.local mycompany.org
options ndots:5
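For reference, the DNS check described earlier can be reproduced from inside the operator pod like this (the pod name here is a placeholder, and nslookup assumes the image ships the usual DNS tools):

$ kubectl exec -it postgres-operator-xxxxx -- nslookup my-app-db.my-ns.svc.cluster.local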
Also, after upgrading the tag to latest I tried recreating the Kind: postgresql, and I didn't see anything interesting in the DB logs:
decompressing spilo image...
2019-11-08 10:02:20,847 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2019-11-08 10:02:22,852 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2019-11-08 10:02:22,853 - bootstrapping - INFO - No meta-data available for this provider
2019-11-08 10:02:22,853 - bootstrapping - INFO - Looks like your running local
2019-11-08 10:02:22,869 - bootstrapping - WARNING - could not parse kubernetes labels as a JSON: Expecting value: line 1 column 1 (char 0), reverting to the default: {"application": "spilo"}
2019-11-08 10:02:22,883 - bootstrapping - INFO - Configuring standby-cluster
2019-11-08 10:02:22,883 - bootstrapping - INFO - Configuring pam-oauth2
2019-11-08 10:02:22,884 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2019-11-08 10:02:22,884 - bootstrapping - INFO - Configuring patronictl
2019-11-08 10:02:22,884 - bootstrapping - INFO - Configuring patroni
2019-11-08 10:02:22,893 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring bootstrap
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring wal-e
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring log
2019-11-08 10:02:22,893 - bootstrapping - INFO - Configuring certificate
2019-11-08 10:02:22,893 - bootstrapping - INFO - Generating ssl certificate
2019-11-08 10:02:22,944 - bootstrapping - INFO - Configuring crontab
2019-11-08 10:02:22,972 - bootstrapping - INFO - Configuring renice
2019-11-08 10:02:22,975 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of permissions
2019-11-08 10:02:22,976 - bootstrapping - INFO - Configuring pgbouncer
2019-11-08 10:02:22,976 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2019-11-08 10:02:23,234 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2019-11-08 10:02:23,234 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2019-11-08 10:02:23,234 INFO Included extra file "/etc/supervisor/conf.d/patroni.conf" during parsing
2019-11-08 10:02:23,234 INFO Included extra file "/etc/supervisor/conf.d/pgq.conf" during parsing
2019-11-08 10:02:23,242 INFO RPC interface 'supervisor' initialized
2019-11-08 10:02:23,242 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-11-08 10:02:23,242 INFO supervisord started with pid 1
2019-11-08 10:02:24,246 INFO spawned: 'cron' with pid 36
2019-11-08 10:02:24,248 INFO spawned: 'patroni' with pid 37
2019-11-08 10:02:24,250 INFO spawned: 'pgq' with pid 38
2019-11-08 10:02:24,667 INFO: No PostgreSQL configuration items changed, nothing to reload.
2019-11-08 10:02:24,685 INFO: Lock owner: None; I am my-app-db-0
2019-11-08 10:02:24,710 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".
Data page checksums are enabled.
creating directory /home/postgres/pgdata/pgroot/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... 2019-11-08 10:02:25,480 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-11-08 10:02:25,480 INFO success: patroni entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-11-08 10:02:25,480 INFO success: pgq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
ok
Success. You can now start the database server using:
/usr/lib/postgresql/11/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start
2019-11-08 10:02:25,921 INFO: postmaster pid=67
/var/run/postgresql:5432 - no response
2019-11-08 10:02:25 UTC [67]: [1-1] 5dc53d31.43 0 LOG: Auto detecting pg_stat_kcache.linux_hz parameter...
2019-11-08 10:02:25 UTC [67]: [2-1] 5dc53d31.43 0 LOG: pg_stat_kcache.linux_hz is set to 250
2019-11-08 10:02:25 UTC [67]: [3-1] 5dc53d31.43 0 LOG: listening on IPv4 address "0.0.0.0", port 5432
2019-11-08 10:02:26 UTC [67]: [4-1] 5dc53d31.43 0 LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-11-08 10:02:26 UTC [67]: [5-1] 5dc53d31.43 0 LOG: redirecting log output to logging collector process
2019-11-08 10:02:26 UTC [67]: [6-1] 5dc53d31.43 0 HINT: Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2019-11-08 10:02:27,022 INFO: establishing a new patroni connection to the postgres cluster
2019-11-08 10:02:27,039 INFO: running post_bootstrap
SET
DO
DO
DO
CREATE EXTENSION
NOTICE: version "1.0" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
CREATE EXTENSION
NOTICE: version "1.1" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
REVOKE
GRANT
REVOKE
GRANT
GRANT
CREATE EXTENSION
DO
CREATE TABLE
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE: schema "zmon_utils" does not exist, skipping
DROP SCHEMA
DO
NOTICE: language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE: function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE: function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE: function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
GRANT
You are now connected to database "postgres" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE: version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
You are now connected to database "template1" as user "postgres".
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
NOTICE: version "1.6" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
CREATE SCHEMA
GRANT
GRANT
SET
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
CREATE FUNCTION
REVOKE
GRANT
GRANT
CREATE VIEW
REVOKE
GRANT
GRANT
RESET
2019-11-08 10:02:27,765 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2019-11-08 10:02:27,830 INFO: initialized a new cluster
2019-11-08 10:02:37,758 INFO: Lock owner: my-app-db-0; I am my-app-db-0
2019-11-08 10:02:37,769 INFO: Lock owner: my-app-db-0; I am my-app-db-0
2019-11-08 10:02:37,803 INFO: no action. i am the leader with the lock
2019-11-08 10:02:47,759 INFO: Lock owner: my-app-db-0; I am my-app-db-0
2019-11-08 10:02:47,776 INFO: no action. i am the leader with the lock
2019-11-08 10:02:57,758 INFO: Lock owner: my-app-db-0; I am my-app-db-0
@FxKu hello.
We ran into the same problem.
I tried two versions of the operator - 1.2.0 and latest.
With the latest version, after the cluster was created, I did not see any errors in the operator logs, but with 1.2.0 errors are present:
time="2019-11-08T13:16:50Z" level=error msg="could not connect to PostgreSQL database: dial tcp 10.233.53.184:5432: connect: connection refused" cluster-name=default/grafana-postgres pkg=cluster
But I have to say that both operator versions have the same problem - they create non-working services that do not point to the pods.
Look at the endpoints:
$ kubectl get endpoints
grafana-postgres <none> 10m
grafana-postgres-repl <none> 10m
testgrafana-postgres <none> 57s
testgrafana-postgres-repl <none> 57s
testgrafana-postgres is the cluster created with version 1.2.0.
Selectors are not present in the service:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2019-11-08T13:16:43Z"
  labels:
    application: spilo
    cluster-name: testgrafana-postgres
    spilo-role: master
    team: testgrafana
  name: testgrafana-postgres
  namespace: default
  resourceVersion: "52400"
  selfLink: /api/v1/namespaces/default/services/testgrafana-postgres
  uid: 01e1e125-022a-11ea-ba71-9600002e4379
spec:
  clusterIP: 10.233.48.124
  ports:
  - name: postgresql
    port: 5432
    protocol: TCP
    targetPort: 5432
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
After I added the selectors by hand, the Endpoint shows the IPs and ports of the pods (both pods, master and replica, I guess).
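For illustration, a hand-added selector matching the labels from the Service above would look something like this; without spilo-role: master it matches both pods, which fits the observation above, while adding it would restrict the endpoint to the leader. Note this is a debugging aid, not a fix - in the standard setup the master endpoint is managed by the operator/Patroni themselves:

spec:
  selector:
    application: spilo
    cluster-name: testgrafana-postgres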
@DeamonMV, about adding the selector by hand: I came across an interesting and, in my opinion, not quite logical operator behavior - if you add a selector by hand or change the service type to NodePort (for example, for debugging), the operator will after a couple of minutes revert the type to ClusterIP (LoadBalancer) or delete the selectors you added. Did you also see something like this, or is it only me having such "unique" problems? :)
@qurname2 I tried to change the type of the Service to NodePort, but the operator didn't change anything back.
And I found where my problem was.
If you remove this section from the manifest...
version: "11"
...the cluster will be created, but it will not work, and you will see those messages - could not connect to PostgreSQL database
The proper minimal manifest is:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
namespace: "default"
metadata:
name: "acid-psql"
spec:
postgresql:
version: "11"
teamId: "acid"
volume:
size: 1Gi
storageClass: postgres
numberOfInstances: 2
users:
postgres:
- superuser
- createdb
primaryuser:
- createdb
databases:
userdb: primaryuser
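A quick way to verify a manifest like this after applying it is to check that the operator-created endpoints now point at the pods. The file name is illustrative; the service names follow the <cluster> / <cluster>-repl pattern seen in the endpoint listing above:

$ kubectl apply -f acid-psql.yaml
$ kubectl get endpoints acid-psql acid-psql-repl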
Found out that changing the service type has some issues, so some of your problems could be resolved with #716. For the errors on cluster creation, it must be something different.
We merged #716, so error messages on changing the service type should not appear anymore. Could you test again @qurname2, @DeamonMV?
Will close this one for now, as it was reported that things work with the version from master. Reopen if you still find the same issues.
@FxKu Hi Felix! Looks like I may have a similar problem... I wonder if you have time to take a look into my case. I have a bare-metal k8s installation with postgres-operator v1.6.1 in it. I'm trying to deploy this manifest:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: devops-test
namespace: postgres-operator
spec:
teamId: "acid"
volume:
size: 1Gi
numberOfInstances: 1
users:
zalando: # database owner
- superuser
- createdb
databases:
postgres: postgres # dbname: owner
postgresql:
version: "12"
parameters:
shared_buffers: "64MB"
max_connections: "50"
log_statement: "all"
log_directory: "/var/log/postgresql"
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 250m
memory: 768Mi
patroni:
initdb:
encoding: "UTF8"
locale: "en_US.UTF-8"
data-checksums: "true"
pg_hba:
- local all all trust
- host replication standby all md5
- host all all all md5
allowedSourceRanges:
- 0.0.0.0/0
But I've got a connection refused problem:
could not connect to Postgres database: dial tcp 127.0.0.1:5432: connect: connection refused
I wonder why the operator tries to connect using localhost? I've repeated the installation steps with the same configs on GKE, and there it works OK... Thank you in advance for your help.
Hi guys! I'm trying to create a postgresql cluster, but in the postgres-operator logs I see this: "could not connect to PostgreSQL database: dial tcp 127.0.0.1:5432: connect: connection refused", and therefore the users and DBs from my config weren't created. But the postgresql pods were created, and patronictl tells me that all is good - one of my postgresql instances is the leader and the psql command works.
I used this config for postgres-operator:
This config for postgresql-operator-default-configuration:
This configmap:
And this postgresql crd:
Any ideas what I'm doing wrong?