hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Install Failure -- Ubuntu 20.04 + pg_auto_failover 1.4 #545

Closed: kevinelliott closed this issue 3 years ago

kevinelliott commented 3 years ago

I'm not sure what's up, but on brand new Ubuntu 20.04 installations the install does not seem to work.

Installation of packages goes fine:

kelliott@af-db-controller:~$ sudo apt-get install postgresql-11-auto-failover
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'postgresql-11-auto-failover-1.4' instead of 'postgresql-11-auto-failover'
The following additional packages will be installed:
  pg-auto-failover-cli-1.4 postgresql-11
The following NEW packages will be installed:
  pg-auto-failover-cli-1.4 postgresql-11 postgresql-11-auto-failover-1.4
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/14.8 MB of archives.
After this operation, 48.3 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Preconfiguring packages ...
Selecting previously unselected package pg-auto-failover-cli-1.4.
(Reading database ... 108711 files and directories currently installed.)
Preparing to unpack .../pg-auto-failover-cli-1.4_1.4.1-1_amd64.deb ...
Unpacking pg-auto-failover-cli-1.4 (1.4.1-1) ...
Selecting previously unselected package postgresql-11.
Preparing to unpack .../postgresql-11_11.10-1.pgdg20.04+1_amd64.deb ...
Unpacking postgresql-11 (11.10-1.pgdg20.04+1) ...
Selecting previously unselected package postgresql-11-auto-failover-1.4.
Preparing to unpack .../postgresql-11-auto-failover-1.4_1.4.1-1_amd64.deb ...
Unpacking postgresql-11-auto-failover-1.4 (1.4.1-1) ...
Setting up postgresql-11 (11.10-1.pgdg20.04+1) ...
update-alternatives: using /usr/share/postgresql/11/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Setting up pg-auto-failover-cli-1.4 (1.4.1-1) ...
Setting up postgresql-11-auto-failover-1.4 (1.4.1-1) ...
Processing triggers for postgresql-common (223.pgdg20.04+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Processing triggers for man-db (2.9.1-1) ...

The version installed:

kelliott@af-db-controller:~$ /usr/bin/pg_autoctl --version
pg_autoctl version 1.4.1
pg_autoctl extension version 1.4
compiled with PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
compatible with Postgres 10, 11, 12, and 13

Then on to setting up the monitor:

kelliott@af-db-controller:~$ export PGDATA=./monitor
kelliott@af-db-controller:~$ export PGPORT=5000
kelliott@af-db-controller:~$ pg_autoctl create monitor --ssl-self-signed --hostname 10.90.31.20 --auth trust --run
22:42:46 55434 INFO  Using default --ssl-mode "require"
22:42:46 55434 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
22:42:46 55434 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
22:42:46 55434 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
22:42:46 55434 WARN  Failed to find pg_ctl command in your PATH
22:42:46 55434 INFO  Found more than one pg_config entry in current PATH:
22:42:46 55434 INFO  Found "/usr/bin/pg_config" for pg version 13.1
22:42:46 55434 INFO  Found "/bin/pg_config" for pg version 13.1
22:42:46 55434 INFO  HINT: export PG_CONFIG to a specific pg_config entry

But then the monitor doesn't seem set up:

kelliott@af-db-controller:~$ pg_autoctl show uri --monitor --pgdata ./monitor
22:42:52 55445 ERROR Failed to open file "/home/kelliott/.config/pg_autoctl/home/kelliott/monitor/pg_autoctl.cfg": No such file or directory
22:42:52 55445 FATAL Failed to parse configuration file "/home/kelliott/.config/pg_autoctl/home/kelliott/monitor/pg_autoctl.cfg"

DimCitus commented 3 years ago

Hi @kevinelliott, it seems that you have more than one Postgres setup available on your machine, and I am not sure why. As I happen to be running some QA testing on Debian VMs today, I had a look at those instances myself, and here is what I found:

ha-admin@ha-demo-dim-paris-b:~$ which -a pg_config | xargs ls -l
-rwxr-xr-x 1 root root 1229 Nov  5  2018 /bin/pg_config
-rwxr-xr-x 1 root root 1229 Nov  5  2018 /usr/bin/pg_config

ha-admin@ha-demo-dim-paris-b:~$ dpkg -S bin/pg_config
diversion by postgresql-common from: /usr/bin/pg_config
diversion by postgresql-common to: /usr/bin/pg_config.libpq-dev
diversion by postgresql-common from: /usr/bin/pg_config
diversion by postgresql-common to: /usr/bin/pg_config.libpq-dev
postgresql-common: /usr/bin/pg_config
postgresql-client-12: /usr/lib/postgresql/12/bin/pg_config

ha-admin@ha-demo-dim-paris-b:~$ ls -ld /bin /usr/bin
lrwxrwxrwx 1 root root     7 Oct 23 04:21 /bin -> usr/bin
drwxr-xr-x 2 root root 20480 Dec  7 10:50 /usr/bin

So Debian now keeps binaries in a single location, /usr/bin, and makes /bin a symlink to it to support legacy PATH expectations. With both directories in the PATH, the same pg_config shows up twice, which breaks our assumptions in pg_autoctl.
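To see the effect of the /bin symlink on a given machine, here is a minimal sketch in POSIX shell. The function name unique_path_dirs is hypothetical and is not part of pg_autoctl; it only resolves each PATH directory to its physical location and prints every location once, so /bin and /usr/bin collapse into a single entry on a merged-/usr system.

```shell
#!/bin/sh
# Hypothetical helper (not part of pg_autoctl): print each directory of a
# PATH-style list once, after resolving symlinks such as /bin -> usr/bin.
unique_path_dirs() {
    seen=""
    old_ifs="$IFS"
    IFS=:
    for dir in $1; do
        IFS="$old_ifs"
        # Skip entries that are not existing directories.
        [ -d "$dir" ] || continue
        # pwd -P prints the physical directory, with symlinks resolved.
        real="$(cd "$dir" && pwd -P)"
        case ":$seen:" in
            *":$real:"*) ;;  # already printed under another name
            *) seen="${seen:+$seen:}$real"
               printf '%s\n' "$real" ;;
        esac
    done
    IFS="$old_ifs"
}

# Usage: list the distinct physical directories in the current PATH.
unique_path_dirs "$PATH"
```

If /bin and /usr/bin both appear in PATH but only /usr/bin is printed, the two pg_config entries reported by pg_autoctl are the same file seen through two names.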

The practical answer I can give you is that you need to either export PG_CONFIG in your environment, as per the HINT in the log output you pasted above, or pass the --pgctl option, as in pg_autoctl create monitor --pgctl /usr/lib/postgresql/11/bin/pg_ctl --ssl-self-signed ....
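The two workarounds can be sketched as follows. The helper pg_bin_path is hypothetical, and the /usr/lib/postgresql/<major>/bin layout is an assumption taken from the dpkg output and the --pgctl path in this thread. The pg_autoctl command is only echoed, and its remaining options are left out on purpose.

```shell
#!/bin/sh
# Hypothetical helper: build the Debian-style path to a versioned Postgres
# binary. Assumes the /usr/lib/postgresql/<major>/bin layout seen above.
pg_bin_path() {
    major="$1"
    tool="$2"
    printf '/usr/lib/postgresql/%s/bin/%s\n' "$major" "$tool"
}

# Option 1: point pg_autoctl at one specific pg_config, as the HINT suggests.
PG_CONFIG="$(pg_bin_path 11 pg_config)"
export PG_CONFIG

# Option 2: name pg_ctl explicitly on the command line.
# Echoed here instead of executed, so nothing is created by this sketch.
echo pg_autoctl create monitor --pgctl "$(pg_bin_path 11 pg_ctl)" --ssl-self-signed
```

Use the major version that matches the installed postgresql-NN-auto-failover package; 11 is used here only because it is the version in the reply above.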

JelteF commented 3 years ago

@kevinelliott Based on your logs there also seems to be a second issue: you installed postgresql-11-auto-failover instead of postgresql-13-auto-failover. The first one is for PG11 and the second one for PG13, and your logs show that you already have PG13 installed.

kevinelliott commented 3 years ago

Yes, so first I followed the directions to a T. When they didn't work, I noticed that the docs install postgresql-11-auto-failover, so I tried installing postgresql-13-auto-failover in case it would improve the situation. It didn't. Then I uninstalled both and tried postgresql-11-auto-failover again.

kevinelliott commented 3 years ago

Thanks @DimCitus I will give that a try. Do you think future support will automatically detect this instead?

kevinelliott commented 3 years ago

Progress, but now there is an issue with the run.

kelliott@af-db-controller:~$ pg_autoctl create monitor --ssl-self-signed --hostname 10.90.31.20 --auth trust --run --pgctl /usr/lib/postgresql/13/bin/pg_ctl
22:59:25 171624 INFO  Using default --ssl-mode "require"
22:59:25 171624 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
22:59:25 171624 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
22:59:25 171624 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
22:59:25 171624 INFO  Initialising a PostgreSQL cluster at "./monitor"
22:59:25 171624 INFO  /usr/lib/postgresql/13/bin/pg_ctl initdb -s -D ./monitor --option '--auth=trust'
22:59:26 171624 INFO   /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /home/kelliott/monitor/server.crt -keyout /home/kelliott/monitor/server.key -subj "/CN=10.90.31.20"
22:59:26 171624 INFO  Started pg_autoctl postgres service with pid 171644
22:59:26 171644 INFO   /usr/bin/pg_autoctl do service postgres --pgdata ./monitor -v
22:59:26 171624 INFO  Started pg_autoctl listener service with pid 171645
22:59:26 171650 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
22:59:36 171645 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
22:59:36 171645 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
22:59:36 171645 ERROR Failed to get Postgres pid, see above for details
22:59:36 171645 ERROR Failed to ensure that Postgres is running in "/home/kelliott/monitor"
22:59:36 171645 ERROR Failed to install pg_auto_failover in the monitor's Postgres database, see above for details
22:59:36 171624 ERROR pg_autoctl service listener exited with exit status 12
22:59:36 171624 INFO  Restarting service listener
22:59:36 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
22:59:36 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
22:59:36 171644 ERROR Failed to get Postgres pid, see above for details
22:59:36 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
22:59:36 171644 ERROR 2020-12-08 22:59:26.539 UTC [171650] LOG:  redirecting log output to logging collector process
22:59:36 171644 ERROR 2020-12-08 22:59:26.539 UTC [171650] HINT:  Future log output will appear in directory "log".
22:59:36 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_225926.log":
22:59:36 171644 ERROR 2020-12-08 22:59:26.539 UTC [171650] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
22:59:36 171644 ERROR 2020-12-08 22:59:26.539 UTC [171650] LOG:  listening on IPv4 address "0.0.0.0", port 5000
22:59:36 171644 ERROR 2020-12-08 22:59:26.540 UTC [171650] LOG:  listening on IPv6 address "::", port 5000
22:59:36 171644 FATAL 2020-12-08 22:59:26.542 UTC [171650] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
22:59:36 171644 ERROR 2020-12-08 22:59:26.545 UTC [171650] LOG:  database system is shut down
22:59:36 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
22:59:36 171659 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
22:59:46 171656 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
22:59:46 171656 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
22:59:46 171656 ERROR Failed to get Postgres pid, see above for details
22:59:46 171656 ERROR Failed to ensure that Postgres is running in "/home/kelliott/monitor"
22:59:46 171656 ERROR Failed to install pg_auto_failover in the monitor's Postgres database, see above for details
22:59:46 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
22:59:46 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
22:59:46 171644 ERROR Failed to get Postgres pid, see above for details
22:59:46 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
22:59:46 171644 ERROR 2020-12-08 22:59:36.363 UTC [171659] LOG:  redirecting log output to logging collector process
22:59:46 171644 ERROR 2020-12-08 22:59:36.363 UTC [171659] HINT:  Future log output will appear in directory "log".
22:59:46 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_225936.log":
22:59:46 171644 ERROR 2020-12-08 22:59:36.364 UTC [171659] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
22:59:46 171644 ERROR 2020-12-08 22:59:36.364 UTC [171659] LOG:  listening on IPv4 address "0.0.0.0", port 5000
22:59:46 171644 ERROR 2020-12-08 22:59:36.364 UTC [171659] LOG:  listening on IPv6 address "::", port 5000
22:59:46 171644 FATAL 2020-12-08 22:59:36.367 UTC [171659] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
22:59:46 171644 ERROR 2020-12-08 22:59:36.370 UTC [171659] LOG:  database system is shut down
22:59:46 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
22:59:46 171624 ERROR pg_autoctl service listener exited with exit status 12
22:59:46 171624 INFO  Restarting service listener
22:59:46 171668 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
22:59:56 171665 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
22:59:56 171665 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
22:59:56 171665 ERROR Failed to get Postgres pid, see above for details
22:59:56 171665 ERROR Failed to ensure that Postgres is running in "/home/kelliott/monitor"
22:59:56 171665 ERROR Failed to install pg_auto_failover in the monitor's Postgres database, see above for details
22:59:56 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
22:59:56 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
22:59:56 171644 ERROR Failed to get Postgres pid, see above for details
22:59:56 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
22:59:56 171644 ERROR 2020-12-08 22:59:46.285 UTC [171668] LOG:  redirecting log output to logging collector process
22:59:56 171644 ERROR 2020-12-08 22:59:46.285 UTC [171668] HINT:  Future log output will appear in directory "log".
22:59:56 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_225946.log":
22:59:56 171644 ERROR 2020-12-08 22:59:46.285 UTC [171668] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
22:59:56 171644 ERROR 2020-12-08 22:59:46.286 UTC [171668] LOG:  listening on IPv4 address "0.0.0.0", port 5000
22:59:56 171644 ERROR 2020-12-08 22:59:46.286 UTC [171668] LOG:  listening on IPv6 address "::", port 5000
22:59:56 171644 FATAL 2020-12-08 22:59:46.289 UTC [171668] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
22:59:56 171644 ERROR 2020-12-08 22:59:46.292 UTC [171668] LOG:  database system is shut down
22:59:56 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
22:59:56 171624 ERROR pg_autoctl service listener exited with exit status 12
22:59:56 171624 INFO  Restarting service listener
22:59:56 171677 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
23:00:06 171674 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:06 171674 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:06 171674 ERROR Failed to get Postgres pid, see above for details
23:00:06 171674 ERROR Failed to ensure that Postgres is running in "/home/kelliott/monitor"
23:00:06 171674 ERROR Failed to install pg_auto_failover in the monitor's Postgres database, see above for details
23:00:06 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:06 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:06 171644 ERROR Failed to get Postgres pid, see above for details
23:00:06 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
23:00:06 171644 ERROR 2020-12-08 22:59:56.309 UTC [171677] LOG:  redirecting log output to logging collector process
23:00:06 171644 ERROR 2020-12-08 22:59:56.309 UTC [171677] HINT:  Future log output will appear in directory "log".
23:00:06 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_225956.log":
23:00:06 171644 ERROR 2020-12-08 22:59:56.309 UTC [171677] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
23:00:06 171644 ERROR 2020-12-08 22:59:56.309 UTC [171677] LOG:  listening on IPv4 address "0.0.0.0", port 5000
23:00:06 171644 ERROR 2020-12-08 22:59:56.309 UTC [171677] LOG:  listening on IPv6 address "::", port 5000
23:00:06 171644 FATAL 2020-12-08 22:59:56.312 UTC [171677] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
23:00:06 171644 ERROR 2020-12-08 22:59:56.315 UTC [171677] LOG:  database system is shut down
23:00:06 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
23:00:06 171624 ERROR pg_autoctl service listener exited with exit status 12
23:00:06 171624 INFO  Restarting service listener
23:00:06 171687 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
23:00:16 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:16 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:16 171644 ERROR Failed to get Postgres pid, see above for details
23:00:16 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
23:00:16 171644 ERROR 2020-12-08 23:00:06.337 UTC [171687] LOG:  redirecting log output to logging collector process
23:00:16 171644 ERROR 2020-12-08 23:00:06.337 UTC [171687] HINT:  Future log output will appear in directory "log".
23:00:16 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_230006.log":
23:00:16 171644 ERROR 2020-12-08 23:00:06.338 UTC [171687] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
23:00:16 171644 ERROR 2020-12-08 23:00:06.338 UTC [171687] LOG:  listening on IPv4 address "0.0.0.0", port 5000
23:00:16 171644 ERROR 2020-12-08 23:00:06.338 UTC [171687] LOG:  listening on IPv6 address "::", port 5000
23:00:16 171644 FATAL 2020-12-08 23:00:06.341 UTC [171687] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
23:00:16 171644 ERROR 2020-12-08 23:00:06.344 UTC [171687] LOG:  database system is shut down
23:00:16 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
23:00:16 171685 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:16 171685 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:16 171685 ERROR Failed to get Postgres pid, see above for details
23:00:16 171685 ERROR Failed to ensure that Postgres is running in "/home/kelliott/monitor"
23:00:16 171685 ERROR Failed to install pg_auto_failover in the monitor's Postgres database, see above for details
23:00:16 171694 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
23:00:16 171624 ERROR pg_autoctl service listener exited with exit status 12
23:00:16 171624 INFO  Restarting service listener
23:00:26 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:26 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:26 171644 ERROR Failed to get Postgres pid, see above for details
23:00:26 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
23:00:26 171644 ERROR 2020-12-08 23:00:16.356 UTC [171694] LOG:  redirecting log output to logging collector process
23:00:26 171644 ERROR 2020-12-08 23:00:16.356 UTC [171694] HINT:  Future log output will appear in directory "log".
23:00:26 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_230016.log":
23:00:26 171644 ERROR 2020-12-08 23:00:16.356 UTC [171694] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
23:00:26 171644 ERROR 2020-12-08 23:00:16.356 UTC [171694] LOG:  listening on IPv4 address "0.0.0.0", port 5000
23:00:26 171644 ERROR 2020-12-08 23:00:16.356 UTC [171694] LOG:  listening on IPv6 address "::", port 5000
23:00:26 171644 FATAL 2020-12-08 23:00:16.360 UTC [171694] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
23:00:26 171644 ERROR 2020-12-08 23:00:16.363 UTC [171694] LOG:  database system is shut down
23:00:26 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
23:00:26 171695 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:26 171695 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:26 171695 ERROR Failed to get Postgres pid, see above for details
23:00:26 171695 ERROR Failed to ensure that Postgres is running in "/home/kelliott/monitor"
23:00:26 171695 ERROR Failed to install pg_auto_failover in the monitor's Postgres database, see above for details
23:00:26 171704 INFO   /usr/lib/postgresql/13/bin/postgres -D /home/kelliott/monitor -p 5000 -h *
23:00:26 171624 ERROR pg_autoctl service listener exited with exit status 12
23:00:26 171624 FATAL pg_autoctl service listener has already been restarted 5 times in the last 50 seconds, stopping now
23:00:26 171624 INFO  Waiting for subprocesses to terminate.
23:00:31 171624 INFO  pg_autoctl services are still running, signaling them with unknown signal.
23:00:36 171644 ERROR Failed to open file "/home/kelliott/monitor/postmaster.pid": No such file or directory
23:00:36 171644 INFO  Is PostgreSQL at "/home/kelliott/monitor" up and running?
23:00:36 171644 ERROR Failed to get Postgres pid, see above for details
23:00:36 171644 WARN  Postgres logs from "/home/kelliott/monitor/startup.log":
23:00:36 171644 ERROR 2020-12-08 23:00:26.276 UTC [171704] LOG:  redirecting log output to logging collector process
23:00:36 171644 ERROR 2020-12-08 23:00:26.276 UTC [171704] HINT:  Future log output will appear in directory "log".
23:00:36 171644 WARN  Postgres logs from "/home/kelliott/monitor/log/postgresql-2020-12-08_230026.log":
23:00:36 171644 ERROR 2020-12-08 23:00:26.276 UTC [171704] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
23:00:36 171644 ERROR 2020-12-08 23:00:26.276 UTC [171704] LOG:  listening on IPv4 address "0.0.0.0", port 5000
23:00:36 171644 ERROR 2020-12-08 23:00:26.276 UTC [171704] LOG:  listening on IPv6 address "::", port 5000
23:00:36 171644 FATAL 2020-12-08 23:00:26.279 UTC [171704] FATAL:  could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied
23:00:36 171644 ERROR 2020-12-08 23:00:26.282 UTC [171704] LOG:  database system is shut down
23:00:36 171644 WARN  Failed to start Postgres instance at "/home/kelliott/monitor"
23:00:36 171644 INFO  Postgres controller service received signal SIGTERM, terminating
23:00:36 171624 FATAL Something went wrong in sub-process supervision, stopping now. See above for details.
23:00:36 171624 INFO  Stop pg_autoctl
kelliott@af-db-controller:~$

And the contents of the monitor dir:

kelliott@af-db-controller:~$ ls -l monitor/
total 144
drwx------ 5 kelliott kelliott  4096 Dec  8 22:59 base
-rw------- 1 kelliott kelliott    44 Dec  8 23:00 current_logfiles
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 global
drwx------ 2 kelliott kelliott  4096 Dec  8 23:00 log
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_commit_ts
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_dynshmem
-rw------- 1 kelliott kelliott  4760 Dec  8 22:59 pg_hba.conf
-rw------- 1 kelliott kelliott  1636 Dec  8 22:59 pg_ident.conf
drwx------ 4 kelliott kelliott  4096 Dec  8 22:59 pg_logical
drwx------ 4 kelliott kelliott  4096 Dec  8 22:59 pg_multixact
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_notify
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_replslot
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_serial
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_snapshots
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_stat
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_stat_tmp
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_subtrans
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_tblspc
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_twophase
-rw------- 1 kelliott kelliott     3 Dec  8 22:59 PG_VERSION
drwx------ 3 kelliott kelliott  4096 Dec  8 22:59 pg_wal
drwx------ 2 kelliott kelliott  4096 Dec  8 22:59 pg_xact
-rw------- 1 kelliott kelliott    88 Dec  8 22:59 postgresql.auto.conf
-rw-r--r-- 1 kelliott kelliott   956 Dec  8 22:59 postgresql-auto-failover.conf
-rw------- 1 kelliott kelliott 28185 Dec  8 22:59 postgresql.conf
-rw-rw-r-- 1 kelliott kelliott  4149 Dec  8 22:59 server.crt
-rw------- 1 kelliott kelliott  1708 Dec  8 22:59 server.key
-rw-r--r-- 1 kelliott kelliott   189 Dec  8 23:00 startup.log

DimCitus commented 3 years ago

Quoting @kevinelliott: "Thanks @DimCitus I will give that a try. Do you think future support will automatically detect this instead?"

Yes. We're going to work on that. Having a default "just works" user experience on Debian/Ubuntu is a natural goal for this project.

DimCitus commented 3 years ago

22:59:36 171644 FATAL 2020-12-08 22:59:26.542 UTC [171650] FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.5000.lock": Permission denied

That's the main problem. Are you creating your Postgres instance as the Debian postgres user, or as another user? If another user, did you add that user to the postgres group? That's the Debian packages' way of doing it.

More context: we could of course detect that in pg_autoctl and then create the socket directory somewhere else, such as /tmp, which is the Postgres default when not using the Debian packaging. But then you would need to use psql -h /tmp, because the libpq client applications on Debian also look for Unix sockets in /var/run/postgresql/ by default.
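The detection described above could look roughly like this. It is a sketch only: pick_socket_dir is a hypothetical helper and not something pg_autoctl does today. It checks whether the current user can write to the Debian socket directory and otherwise falls back to /tmp.

```shell
#!/bin/sh
# Hypothetical helper (not current pg_autoctl behaviour): choose a directory
# where the current user is able to create the Unix socket lock file.
pick_socket_dir() {
    preferred="${1:-/var/run/postgresql}"
    fallback="${2:-/tmp}"
    if [ -d "$preferred" ] && [ -w "$preferred" ]; then
        printf '%s\n' "$preferred"
    else
        printf '%s\n' "$fallback"
    fi
}

# Clients must be told about a non-default directory, for example:
#   psql -h "$(pick_socket_dir)" -p 5000
# The alternative is to keep the default directory and add the user to the
# postgres group (then log in again), for example:
#   sudo usermod -aG postgres "$USER"
pick_socket_dir
```

Running Postgres and pg_autoctl as the postgres user avoids the question entirely, since that user can already write to /var/run/postgresql.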

kevinelliott commented 3 years ago

That fixed it. I decided to sudo su - postgres and run all the steps as that user rather than add my personal user to the postgres group, and it worked like I have seen before on other systems. Hurray!

kevinelliott commented 3 years ago

All is good now, the monitor and 2 nodes are running.

However, I just ran into another issue. node_2 had been promoted to primary since I took node_1 down at one point. Brought it back up and node_1 was successfully assigned secondary. Then, using pg_autoctl on the monitor, I promoted node_1 to primary with pg_autoctl perform promotion --name node_1.

Immediately there was an issue with node_2 as it went into an error loop, complaining about missing the node2 path. I have a second SSD (/dev/sdb1) with 1TB of space allocated mounted to /srv/db and had put the node2 dir there, then symbolically linked /var/lib/postgresql/node2 to that. Apparently the demotion caused the source dir to disappear but the symbolic link still existed. I was able to simply copy the backup dir backups/node_2 to /srv/db/node2 and all was well there. Node 2 is successfully identified as secondary.
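A quick way to spot this situation is to test whether the PGDATA path is a symbolic link whose target no longer exists. The function name is_dangling_symlink is hypothetical; the path in the usage lines is the one described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the path is a symlink with a missing target.
is_dangling_symlink() {
    # -L is true for a symlink itself; -e follows the link to its target.
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Usage with the layout from this report: the link lives in
# /var/lib/postgresql and the data directory on the second disk.
pgdata_link="/var/lib/postgresql/node2"
if is_dangling_symlink "$pgdata_link"; then
    echo "$pgdata_link is a symlink but its target is missing"
fi
```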

However, the primary node, node1, is now stuck in the wait_primary state and reports these errors:

postgres@af-db-node1:~$ pg_autoctl create postgres --hostname 10.90.31.21 --auth trust --ssl-self-signed --pgctl /usr/lib/postgresql/13/bin/pg_ctl --monitor 'postgres://autoctl_node@10.90.31.20:5000/pg_auto_failover?sslmode=require' --run
20:10:25 13610 INFO  Using default --ssl-mode "require"
20:10:25 13610 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
20:10:25 13610 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
20:10:25 13610 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
20:10:25 13610 INFO  Started pg_autoctl postgres service with pid 13615
20:10:25 13615 INFO   /usr/bin/pg_autoctl do service postgres --pgdata ./node1 -v
20:10:25 13610 INFO  Started pg_autoctl node-active service with pid 13616
20:10:25 13616 INFO  keeper has been successfully initialized.
20:10:25 13616 INFO   /usr/bin/pg_autoctl do service node-active --pgdata ./node1 -v
20:10:25 13616 INFO  Reloaded the new configuration from "/var/lib/postgresql/.config/pg_autoctl/var/lib/postgresql/node1/pg_autoctl.cfg"
20:10:25 13616 INFO  pg_autoctl service is running, current state is "wait_primary"
20:10:25 13616 WARN  Failed to update the keeper's state from the local PostgreSQL instance.
20:10:25 13616 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
20:10:25 13616 INFO  Ensuring HBA rules for node 2 "node_2" (10.90.31.22:5002)
20:10:25 13616 INFO  Monitor assigned new state "primary"
20:10:25 13625 INFO   /usr/lib/postgresql/13/bin/postgres -D /srv/db/node1 -p 5001 -h *
20:10:26 13615 INFO  Postgres is now serving PGDATA "/srv/db/node1" on port 5001 with pid 13625
20:10:26 13616 WARN  PostgreSQL was not running, restarted with pid 13625
20:10:26 13616 INFO  FSM transition from "wait_primary" to "primary": A healthy secondary appeared
20:10:26 13616 INFO  Setting synchronous_standby_names to '*'
20:10:26 13616 WARN  Failed to set the standby Target LSN because we don't have a quorum candidate yet
20:10:26 13616 ERROR Failed to transition from state "wait_primary" to state "primary", see above.
20:10:26 13616 ERROR Failed to transition to state "primary", retrying...
20:10:27 13616 INFO  Updated the keeper's state from the local PostgreSQL instance, which is running
20:10:27 13616 INFO  Monitor assigned new state "primary"
20:10:27 13616 INFO  FSM transition from "wait_primary" to "primary": A healthy secondary appeared
20:10:27 13616 INFO  Setting synchronous_standby_names to '*'
20:10:27 13616 WARN  Failed to set the standby Target LSN because we don't have a quorum candidate yet
20:10:27 13616 ERROR Failed to transition from state "wait_primary" to state "primary", see above.
20:10:27 13616 ERROR Failed to transition to state "primary", retrying...

I imagine this is due to the second disk and the symbolic link pointing to it. What would be the cleanest way to remedy this?

DimCitus commented 3 years ago

Please check your streaming replication setup in node2, and then have a look at pg_stat_replication on node1. It looks like your node2 is currently not connected to node1, or that something else is wrong with streaming replication. To debug that, you need to have a look at node2 logs for pg_autoctl and Postgres, and sometimes also Postgres logs from node1.
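As a starting point for that check, here is a minimal sketch. The function replication_check_sql is a hypothetical wrapper that only prints a query against the standard pg_stat_replication view; the port 5001 and the database name postgres in the usage comments are assumptions based on the logs in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: print a query listing the standbys currently
# connected to this node, with their replication state and positions.
replication_check_sql() {
    cat <<'SQL'
SELECT application_name, client_addr, state, sync_state, sent_lsn, replay_lsn
  FROM pg_stat_replication;
SQL
}

# Usage on node1, as the postgres user (port 5001 per the log above):
#   replication_check_sql | psql -p 5001 -d postgres
# The monitor's view of both nodes can be compared with:
#   pg_autoctl show state --pgdata ./node1
```

An empty result on node1 means no standby is streaming from it, which would match the "we don't have a quorum candidate yet" warning; the next place to look is then the pg_autoctl and Postgres logs on node2.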