zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

PostgreSQL fails to start on Apple M1 (ARM64) #1906

Closed mprimeaux closed 2 years ago

mprimeaux commented 2 years ago


Thanks in advance for your help. I'm not sure where to start, so I thought I'd start with this project. In a nutshell, the Postgres operator and the related Spilo image work as expected on Intel (amd64) hosts but not on Apple Silicon (arm64). I've tried this on both my Apple MacBook Pro and my Mac Studio.

Here is the log, which I think may point to Spilo but also alludes to the PostgreSQL socket not starting.


postgres 2022-05-29 01:40:10,043 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
postgres 2022-05-29 01:40:12,070 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
postgres 2022-05-29 01:40:12,088 - bootstrapping - INFO - No meta-data available for this provider
postgres 2022-05-29 01:40:12,093 - bootstrapping - INFO - Looks like your running local
postgres 2022-05-29 01:40:12,242 - bootstrapping - INFO - Configuring crontab
postgres 2022-05-29 01:40:12,243 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
postgres 2022-05-29 01:40:12,243 - bootstrapping - INFO - Configuring pam-oauth2
postgres 2022-05-29 01:40:12,244 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
postgres 2022-05-29 01:40:12,244 - bootstrapping - INFO - Configuring standby-cluster
postgres 2022-05-29 01:40:12,244 - bootstrapping - INFO - Configuring log
postgres 2022-05-29 01:40:12,245 - bootstrapping - INFO - Configuring wal-e
postgres 2022-05-29 01:40:12,245 - bootstrapping - INFO - Configuring patroni
postgres 2022-05-29 01:40:12,285 - bootstrapping - INFO - Writing to file /run/postgres.yml
postgres 2022-05-29 01:40:12,288 - bootstrapping - INFO - Configuring pgqd
postgres 2022-05-29 01:40:12,288 - bootstrapping - INFO - Configuring pgbouncer
postgres 2022-05-29 01:40:12,289 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
postgres 2022-05-29 01:40:12,289 - bootstrapping - INFO - Configuring bootstrap
postgres 2022-05-29 01:40:12,289 - bootstrapping - INFO - Configuring certificate
postgres 2022-05-29 01:40:12,289 - bootstrapping - INFO - Generating ssl self-signed certificate
postgres 2022-05-29 01:40:14,627 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
postgres 2022-05-29 01:40:14,675 INFO: No PostgreSQL configuration items changed, nothing to reload.
postgres 2022-05-29 01:40:14,689 INFO: Lock owner: None; I am gs-core-1-0
postgres 2022-05-29 01:40:14,770 INFO: trying to bootstrap a new cluster
postgres The files belonging to this database system will be owned by user "postgres".
postgres This user must also own the server process.
postgres The database cluster will be initialized with locale "en_US.utf-8".
postgres The default database encoding has accordingly been set to "UTF8".
postgres The default text search configuration will be set to "english".
postgres Data page checksums are disabled.
postgres fixing permissions on existing directory /home/postgres/pgdata/pgroot/data ... ok
postgres creating subdirectories ... ok
postgres selecting dynamic shared memory implementation ... posix
postgres selecting default max_connections ... 100
postgres selecting default shared_buffers ... 128MB
postgres selecting default time zone ... Etc/UTC
postgres creating configuration files ... ok
postgres running bootstrap script ... ok
postgres performing post-bootstrap initialization ... ok
postgres syncing data to disk ... ok
postgres Success. You can now start the database server using:
postgres     /usr/lib/postgresql/14/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start
postgres 2022-05-29 01:40:19,483 INFO: postmaster pid=223
postgres 2022-05-29 01:40:19 UTC [223]: [1-1] 6292cf03.df 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
postgres 2022-05-29 01:40:19 UTC [223]: [2-1] 6292cf03.df 0     LOG:  pg_stat_kcache.linux_hz is set to 90909
postgres /var/run/postgresql:5432 - no response
postgres 2022-05-29 01:40:19 UTC [223]: [3-1] 6292cf03.df 0     LOG:  redirecting log output to logging collector process
postgres 2022-05-29 01:40:19 UTC [223]: [4-1] 6292cf03.df 0     HINT:  Future log output will appear in directory "../pg_log".
postgres /var/run/postgresql:5432 - accepting connections
postgres /var/run/postgresql:5432 - accepting connections
postgres 2022-05-29 01:40:20,840 INFO: establishing a new patroni connection to the postgres cluster
postgres 2022-05-29 01:40:20,932 INFO: running post_bootstrap
postgres psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: Connection refused
postgres     Is the server running locally and accepting connections on that socket?
postgres psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: Connection refused
postgres     Is the server running locally and accepting connections on that socket?
postgres 2022-05-29 01:40:22,287 ERROR: post_init script /scripts/post_init.sh "zalandos" returned non-zero code 2
postgres 2022-05-29 01:40:22,293 INFO: removing initialize key after failed attempt to bootstrap the cluster
postgres 2022-05-29 01:40:22,362 INFO: renaming data directory to /home/postgres/pgdata/pgroot/data_2022-05-29-01-40-22
postgres Traceback (most recent call last):
postgres   File "/usr/local/bin/patroni", line 11, in <module>
postgres     sys.exit(main())
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/__main__.py", line 143, in main
postgres     return patroni_main()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/__main__.py", line 135, in patroni_main
postgres     abstract_main(Patroni, schema)
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/daemon.py", line 100, in abstract_main
postgres     controller.run()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/__main__.py", line 105, in run
postgres     super(Patroni, self).run()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/daemon.py", line 59, in run
postgres     self._run_cycle()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/__main__.py", line 108, in _run_cycle
postgres     logger.info(self.ha.run_cycle())
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1503, in run_cycle
postgres     info = self._run_cycle()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1377, in _run_cycle
postgres     return self.post_bootstrap()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1269, in post_bootstrap
postgres     self.cancel_initialization()
postgres   File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1262, in cancel_initialization
postgres     raise PatroniFatalException('Failed to bootstrap cluster')
postgres patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
postgres /etc/runit/runsvdir/default/patroni: finished with code=1 signal=0
postgres /etc/runit/runsvdir/default/patroni: sleeping 30 seconds

Your help and insight are greatly appreciated. I'm happy to help as needed.

FxKu commented 2 years ago

Does it only work with an arm-compatible spilo image @CyberDem0n? @jopadi can test.

mprimeaux commented 2 years ago

@CyberDem0n and @jopadi if there's any help you need from me with testing, please let me know. Appreciate your help and time.

CyberDem0n commented 2 years ago

@mprimeaux according to the logs, Patroni successfully ran initdb and started Postgres. After that, Postgres crashed while Patroni was running post_init.sh.

You need to check postgres logs in the ~postgres/pgdata/pgroot/pg_log directory.
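The log check suggested above can be partly automated. The following is a minimal sketch, assuming the files in pg_log use PostgreSQL's standard csvlog column order (timestamp first, error_severity in the 12th field, sql_state_code in the 13th, message in the 14th). The function name `severe_entries` is hypothetical and not part of Spilo or Patroni.

```python
import csv
import io

# 0-based column positions in PostgreSQL's csvlog format (assumed standard layout).
SEVERITY_COL = 11  # error_severity, e.g. LOG, ERROR, FATAL, PANIC
SQLSTATE_COL = 12  # sql_state_code, e.g. 57P01
MESSAGE_COL = 13   # message text


def severe_entries(csv_text, levels=("ERROR", "FATAL", "PANIC")):
    """Return (timestamp, severity, sqlstate, message) for rows at the given levels."""
    rows = csv.reader(io.StringIO(csv_text))
    return [
        (row[0], row[SEVERITY_COL], row[SQLSTATE_COL], row[MESSAGE_COL])
        for row in rows
        # Skip short or malformed rows instead of raising IndexError.
        if len(row) > MESSAGE_COL and row[SEVERITY_COL] in levels
    ]
```

To use it, read a log file such as `postgresql-5.csv` into a string (for example with `open(path, newline="").read()`) and pass it to `severe_entries`. The csv module handles quoted messages that contain commas or line breaks.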

mprimeaux commented 2 years ago

@CyberDem0n I will check that tonight or first thing in the morning and get back to you.

mprimeaux commented 2 years ago

@CyberDem0n Here are the various logs and ls from the ~postgres/pgdata/pgroot/pg_log folder.

k8s-log.txt ls.txt postgresql-5.csv postgresql-5.log

Of the set, the one entry that stands out from the postgresql-5.csv file is the FATAL line below:

2022-06-17 23:21:41.416 UTC,"postgres","postgres",714,"[local]",62ad0c85.2ca,2,"authentication",2022-06-17 23:21:41 UTC,5/6,0,LOG,00000,"connection authorized: user=postgres database=postgres application_name=Patroni",,,,,,,,,"","client backend",,0
2022-06-17 23:21:41.874 UTC,,,698,,62ad0c84.2ba,2,,2022-06-17 23:21:40 UTC,4/0,0,FATAL,57P01,"postmaster exited while TimescaleDB background worker launcher was working",,,,,,,,,"","TimescaleDB Background Worker Launcher",,0

Any thoughts or guidance? For context, no errors occur on amd64 CPUs. I'm not sure whether arm64 builds are available for the related container images.
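One way to answer the arm64 question is to look at the image's manifest list, for example the JSON printed by `docker manifest inspect <image>`. The following is a minimal sketch, assuming that output is a multi-arch manifest list whose `manifests` entries each carry a `platform` object with `os` and `architecture` fields. The helper names `image_platforms` and `supports_arm64` are hypothetical, and no real output for the Spilo image is shown here.

```python
import json


def image_platforms(manifest_json):
    """Return sorted 'os/architecture' strings found in a manifest list document."""
    doc = json.loads(manifest_json)
    platforms = set()
    # A single-architecture manifest has no "manifests" key, so this yields nothing.
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        os_name = platform.get("os")
        arch = platform.get("architecture")
        if os_name and arch:
            platforms.add(f"{os_name}/{arch}")
    return sorted(platforms)


def supports_arm64(manifest_json):
    """True if the manifest list advertises a linux/arm64 build."""
    return "linux/arm64" in image_platforms(manifest_json)
```

An empty result does not prove the image is amd64-only. It only means the document was not a manifest list, in which case the architecture has to be read from the image configuration instead.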

mprimeaux commented 2 years ago

@CyberDem0n it appears that the latest operator release, 1.8.2, addresses this issue, possibly because it uses the latest Spilo version.