pgautoupgrade / docker-pgautoupgrade

A PostgreSQL Docker container that automatically upgrades your database
https://hub.docker.com/r/pgautoupgrade/pgautoupgrade
MIT License
537 stars 19 forks source link

Socket Errors Upgrading from 15 to 16 #32

Closed gav-fyi closed 2 months ago

gav-fyi commented 3 months ago

Hi, I've recently found this package which looks extremely promising for upgrading our development environments from Postgres 15 to Postgres 16.

I'm trying to attempt the one-shot upgrade of a large (~20GB) Postgres 15 databsse to 16, without success. I have attached the output of the command/logs below.

Command

docker run --name pgauto -it --rm -p 5432:5432 -v /var/run/postgresql:/var/run/postgresql --mount type=bind,source=/Users/gav/postgres_data/project,target=/var/lib/postgresql/data -e POSTGRES_PASSWORD=redacted -e PGAUTO_ONESHOT=yes pgautoupgrade/pgautoupgrade:16-alpine

The output I get in the terminal is (reduced for brevity; let me know if you need earlier logs)

PostgreSQL Database directory appears to contain a database; Skipping initialization

************************************
PostgreSQL data directory: /var/lib/postgresql/data
************************************
*******************************************************************************************
Performing PG upgrade on version 15 database files.  Upgrading to version 16
*******************************************************************************************

[...]

2024-07-05 13:52:45.492 UTC [35] WARNING:  database "postgres" has no actual collation version, but a version was recorded
2024-07-05 13:52:45.495 UTC [35] LOG:  checkpoint starting: shutdown immediate
2024-07-05 13:52:45.499 UTC [35] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.005 s; sync files=3, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB

[...]

---------------------------------------------------------------
The initdb arguments we determined are: --locale=en_US.utf8 --lc-collate=en_US.utf8 --lc-ctype=en_US.utf8 --encoding=UTF8
---------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------
Old database using collation settings: '--locale=en_US.utf8 --lc-collate=en_US.utf8 --lc-ctype=en_US.utf8 --encoding=UTF8'.  Initialising new database with those settings too
--------------------------------------------------------------------------------------------------------------------
Initialising PostgreSQL 16 data directory
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data/new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/local/bin/pg_ctl -D /var/lib/postgresql/data/new/ -l logfile start

------------------------------------
New database initialisation complete
------------------------------------
---------------------------------------
Running pg_upgrade command, from /var/lib/postgresql/data
---------------------------------------
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok

*failure*
Consult the last few lines of "/var/lib/postgresql/data/new/pg_upgrade_output.d/20240705T135247.398/log/pg_upgrade_server.log" for
the probable cause of the failure.

connection to server on socket "/var/lib/postgresql/data/.s.PGSQL.50432" failed: No such file or directory
    Is the server running locally and accepting connections on that socket?

could not connect to source postmaster started with the command:
"/usr/local-pg15/bin/pg_ctl" -w -l "/var/lib/postgresql/data/new/pg_upgrade_output.d/20240705T135247.398/log/pg_upgrade_server.log" -D "/var/lib/postgresql/data/old" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql/data'" start
Failure, exiting

With the log file showing

-----------------------------------------------------------------
  pg_upgrade run on Fri Jul  5 13:52:47 2024
-----------------------------------------------------------------

command: "/usr/local-pg15/bin/pg_ctl" -w -l "/var/lib/postgresql/data/new/pg_upgrade_output.d/20240705T135247.398/log/pg_upgrade_server.log" -D "/var/lib/postgresql/data/old" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql/data'" start >> "/var/lib/postgresql/data/new/pg_upgrade_output.d/20240705T135247.398/log/pg_upgrade_server.log" 2>&1
waiting for server to start....2024-07-05 13:52:47.440 UTC [79] LOG:  starting PostgreSQL 15.7 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 13.2.1_git20231014) 13.2.1 20231014, 64-bit
2024-07-05 13:52:47.441 UTC [79] LOG:  could not set permissions of file "/var/lib/postgresql/data/.s.PGSQL.50432": Invalid argument
2024-07-05 13:52:47.441 UTC [79] WARNING:  could not create Unix-domain socket in directory "/var/lib/postgresql/data"
2024-07-05 13:52:47.441 UTC [79] FATAL:  could not create any Unix-domain sockets
2024-07-05 13:52:47.442 UTC [79] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Ay assistance or pointers would be greatly appreciated.


Hardware: MacBook Pro M2 Original Postgres Image: postgres:15

justinclift commented 3 months ago

Oh, that's a weird one I've not seen before.

Just to double check, there's no partitions anywhere running out of disk space or anything along those lines when the error occurs yeah?

Mostly asking that just in case, as a filled disk can cause all kinds of weird errors.


Note that I'm just about to hit the sack personally, and it looks like I've caught either the flu or Covid (for a 2nd time, ugh). So I might not be looking at any response here for a while.

Hopefully some of the other team members have ideas on what the cause and fix for this might be. :smile:

justinclift commented 3 months ago

Ay assistance or pointers would be greatly appreciated.

If all else fails and you need to debug the problem, you can start the container with the environment variable PGAUTO_DEVEL=before defined, which will start the container (and leave it running) just before the pgautoupgrade script runs. (like is done here, if that helps)

If you do that, then you can exec into the container and manually run the upgrade script commands one (or a few) at a time until you reach the point where it's going wrong/breaking.

With doing that, it should be feasible to figure out exactly why it's breaking, then figure out a fix.

Which could then be applied to the repo here for everyone to benefit from. That's the theory anyway. :smile:

If you're already pretty familiar with shell scripting then it might be a useful option. :smile:

gav-fyi commented 3 months ago

There are some great pointers there, thank you very much. I'll take a look on Monday when I'm back in the office, and will be sure to update you on any progress made.

Thanks for the quick reply, and I hope you start to feel better soon.

gav-fyi commented 2 months ago

The failure was very much with the pg_upgrade command, so had to dig a little deeper.

I found a solution on a similar library: https://github.com/tianon/docker-postgres-upgrade/issues/83#issuecomment-1960681055

The issue was due to M-chip Macs not supporting the Unix socket filetype, so I had to update the pg_upgrade command to be

/usr/local/bin/pg_upgrade --username="${POSTGRES_USER}" --link -d "${OLD}" -D "${NEW}" -b "${OLDPATH}/bin" -B /usr/local/bin --socketdir="/var/run/postgresql"

Note: the addition of --socketdir="/var/run/postgresql"

The command then ran fine. Thanks for the pointers. I now have other errors (to do with my DB), but this issue is resolved.

I'll leave it open for now though, as I'm not sure if you'd want to include something in the main README for this, or a way to specify the --socketdir via environment variables.

justinclift commented 2 months ago

Oh wow, that's totally unexpected. I had no idea that limitation existed.

It sounds like we could include your --socketdir approach by default too, without negatively affecting others. Will need to do some investigation and see how that plays out. :smile:

justinclift commented 2 months ago

Initial testing on my desktop system here (Debian 12 x86_64) looks like it works fine too, so I've just thrown together a PR to add the --socketdir option to all of the docker images: https://github.com/pgautoupgrade/docker-pgautoupgrade/pull/33

justinclift commented 2 months ago

k, that PR has been merged. Once the automatic GitHub Actions rebuild of the docker images happens then you should be fine to use them without worrying about this particular issue again.

In theory. :wink:

gav-fyi commented 2 months ago

Amazing. Thanks for the quick turnaround. šŸ‘šŸ»

justinclift commented 2 months ago

No worries at all. Thanks for taking the time to investigate the root cause and figure out a solid solution. :smile:

gav-fyi commented 2 months ago

I can confirm that the latest image from DockerHub works perfectly. šŸ‘šŸ»

Thanks again.