pgautoupgrade / docker-pgautoupgrade

A PostgreSQL Docker container that automatically upgrades your database
https://hub.docker.com/r/pgautoupgrade/pgautoupgrade
MIT License
522 stars 18 forks source link

Drop in replacement? #1

Open wuast94 opened 1 year ago

wuast94 commented 1 year ago

Just for clarification, is this a "drop in" replacement for official postgress container? are variables paths etc the same? anything to look for?

justinclift commented 1 year ago

In theory (!) yes.

It's the official postgres 15.3-alpine3.18 Docker image + the older versions of PostgreSQL compiled into it and some pg_upgrade scripting added to the docker entrypoint script to run the upgrade.

If you're good with the 15.3-alpine3.18 build, then this should run fine.

That being said, this hasn't had wide use yet so it's probably best to make sure anything you're running with this has good backups happening on schedule.

I'm not expecting problems, more just in case... :smile:

wuast94 commented 1 year ago

I will test this on a synapse and some bridge servers, have Backups and if something brakes no data is lost, let's say no data that Cares me much. Let's see what happens πŸ€ͺ

justinclift commented 1 year ago

Awesome! :smile:

wuast94 commented 1 year ago

so far so good, all DB´s come up and apps are running, just some that was on Debian before complained a bit 😝 but that was to be expected. lets see how they will handle the next pg update

justinclift commented 1 year ago

Is the "some that was on Debian before complained a bit" stuff something that would be practical to detect and do something useful for?

wuast94 commented 1 year ago

i see in my logs database "postgres" has no actual collation version, but a version was recorded

2023-07-11 15:41:37.277 CEST [1] LOG:  starting PostgreSQL 15.3 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
2023-07-11 15:41:37.277 CEST [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-07-11 15:41:37.277 CEST [1] LOG:  listening on IPv6 address "::", port 5432
2023-07-11 15:41:37.278 CEST [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-07-11 15:41:37.281 CEST [26] LOG:  database system was shut down at 2023-07-11 15:41:07 CEST
2023-07-11 15:41:37.285 CEST [1] LOG:  database system is ready to accept connections
2023-07-11 15:41:40.846 CEST [31] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded
2023-07-11 15:42:37.345 CEST [32] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded
2023-07-11 15:43:37.402 CEST [33] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded
2023-07-11 15:44:37.458 CEST [34] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded
2023-07-11 15:45:37.516 CEST [35] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded
2023-07-11 15:46:37.378 CEST [24] LOG:  checkpoint starting: time
2023-07-11 15:46:37.383 CEST [24] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.002 s, total=0.006 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB
2023-07-11 15:46:37.595 CEST [36] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded
2023-07-11 15:47:37.619 CEST [37] WARNING:  database "mautrixsignal" has no actual collation version, but a version was recorded

Dont know if its related: https://bbs.archlinux.org/viewtopic.php?id=283462

justinclift commented 1 year ago

Interesting, thanks. I'll take a look later on today. :smile:

justinclift commented 1 year ago

Ok, it'll have to be this wkend when I look at this. I have a reasonable idea of what the message is about, and it shouldn't be too hard to fix. In theory.

wuast94 commented 1 year ago

There seems to be a collation mismatch, my db uses C and C dont have a version if i understand it right

grafik

if i understand it right, the only thing i can do is dumb the db, recreate a fresh db and import the db back. but will wait what u say

justinclift commented 1 year ago

Yeah, that's pretty much my thinking of the cause as well.

The solutions is probably going to be some changes to this line where it does the initdb:

https://github.com/justinclift/docker-pgautoupgrade/blob/93629ba0fcf59a9bbc05265838345f58cc1a0da9/docker-entrypoint.sh#L368

So, instead of doing it blindly like that, we'll need to first determine the collation for the existing database (may need to temporarily start the old version?), then we can run initdb and pass it the same collation info so things match up.

From the initdb --help output:

      --no-locale           equivalent to --locale=C
      --icu-locale=LOCALE   set ICU locale ID for new databases
      --locale=LOCALE       set default locale for new databases
      --lc-collate=, --lc-ctype=, --lc-messages=LOCALE
      --lc-monetary=, --lc-numeric=, --lc-time=LOCALE
                            set default locale in the respective category for
                            new databases (default taken from environment)

So for the C locale it'd just be the first one --no-locale.

This is all just theory though (so far). :smile:

wuast94 commented 1 year ago

if i can help you testing this say what i have to do 😊 but sounds like it could work

justinclift commented 1 year ago

Thanks. I won't have time to try implementing it for several hours, and it might be a tomorrow thing instead though. Just so you know. :smile:

justinclift commented 1 year ago

Taking an initial look at this, so dumping some rough notes and thoughts here for now.


There seem to be several different ways of getting the collation info for a given database out of PostgreSQL.

For example:

$ psql database_name
# SELECT * FROM pg_database;

 oid |  datname  | datdba | encoding | datlocprovider | datistemplate | datallowconn | datconnlimit | datfrozenxid | datminmxid | dattablespace | datcollate |  datctype  | daticulocale | datcollversion |               datacl                
-----+-----------+--------+----------+----------------+---------------+--------------+--------------+--------------+------------+---------------+------------+------------+--------------+----------------+-------------------------------------
   5 | postgres  |     10 |        6 | c              | f             | t            |           -1 |          717 |          1 |          1663 | en_US.utf8 | en_US.utf8 |              |                | 
   1 | template1 |     10 |        6 | c              | t             | t            |           -1 |          717 |          1 |          1663 | en_US.utf8 | en_US.utf8 |              |                | {=c/postgres,postgres=CTc/postgres}
   4 | template0 |     10 |        6 | c              | t             | f            |           -1 |          717 |          1 |          1663 | en_US.utf8 | en_US.utf8 |              |                | {=c/postgres,postgres=CTc/postgres}

From that, the datcollate, datctype, daticulocale, and datcollversion columns seem relevant.

The problem there is it doesn't seem to really tell us what the default locale and collation pieces are for the database cluster.

What seems to work for doing that, are the SHOW LC_COLLATE and SHOW LC_CTYPE commands:

# SHOW LC_COLLATE;
 lc_collate 
------------
 en_US.utf8
(1 row)

# SHOW LC_CTYPE;
  lc_ctype  
------------
 en_US.utf8
(1 row)

They can also be run with PostgreSQL in single user mode. Example from a PG 9.5 database using the C collation:

$ /usr/local-pg9.5/bin/postgres --single -D /var/lib/postgresql/data/ninefive

PostgreSQL stand-alone backend 9.5.25
backend> SHOW LC_COLLATE;
         1: lc_collate  (typeid = 25, len = -1, typmod = -1, byval = f)
        ----
         1: lc_collate = "C"    (typeid = 25, len = -1, typmod = -1, byval = f)
        ----
backend> SHOW LC_CTYPE;
         1: lc_ctype    (typeid = 25, len = -1, typmod = -1, byval = f)
        ----
         1: lc_ctype = "C"      (typeid = 25, len = -1, typmod = -1, byval = f)
        ----

So, with some shell pipelining we can work things out:

$ echo 'SHOW LC_COLLATE' | /usr/local-pg9.5/bin/postgres --single -D /var/lib/postgresql/data/ninefive | grep 'lc_collate = "' | cut -d '"' -f 2
C
$ echo 'SHOW LC_CTYPE' | /usr/local-pg9.5/bin/postgres --single -D /var/lib/postgresql/data/ninefive | grep 'lc_ctype = "' | cut -d '"' -f 2
C

Trying that again, with a PG 9.5 data directory initialised with en_US.UTF-8 works as well:

$ echo 'SHOW LC_COLLATE' | /usr/local-pg9.5/bin/postgres --single -D /var/lib/postgresql/data/ninefive | grep 'lc_collate = "' | cut -d '"' -f 2
en_US.UTF-8
05c67bf06846:~$ echo 'SHOW LC_CTYPE' | /usr/local-pg9.5/bin/postgres --single -D /var/lib/postgresql/data/ninefive | grep 'lc_ctype = "' | cut -d '"' -f 2
en_US.UTF-8

This seems like a workable way forward.

wuast94 commented 1 year ago

I think the main problem is to change the collation not to get the info what the actual used and initialized is.

I'm on the road ATM so will have a deeper look later.

And now worries, it's not a time critical thing, so take your time, you doing this in your free time, so no pressure ☺️

justinclift commented 1 year ago

I think the main problem is to change the collation not to get the info what the actual used and initialized is.

Nah. The default collation is set when the initdb is run, so we might as well just get the details for the old one and copy that. That's my thinking (so far) anyway. :smile:

justinclift commented 1 year ago

Thinking about it more the initdb style change is probably only going to be good for databases that haven't been auto-upgraded yet.

So for those that have and had their locale changed (thus the version mismatch message)... yeah, that's going to need something different done.

justinclift commented 1 year ago

if i understand it right, the only thing i can do is dumb the db, recreate a fresh db and import the db back. but will wait what u say

That's probably going to be the most straight forward approach. You'd likely need to look at the starting section of the pg_dump output (before importing) to potentially manually remove any collation setting statements.

kenaniah commented 1 year ago

I believe you should be able to use pg_controldata to determine the locale of a cluster without having to run the server.

justinclift commented 1 year ago

Thanks heaps @kenaniah, I'll try that out locally then update the script accordingly. :smile:

justinclift commented 1 year ago

@kenaniah Trying that just now it does show a lot of information, but collation related info doesn't seem to be part of it:

$ /usr/local-pg9.5/bin/pg_controldata -V
pg_controldata (PostgreSQL) 9.5.25

$ /usr/local-pg9.5/bin/pg_controldata -D /var/lib/postgresql/data/old
pg_control version number:            942
Catalog version number:               201510051
Database system identifier:           7256631598560251934
Database cluster state:               shut down
pg_control last modified:             Mon Jul 17 03:56:50 2023
Latest checkpoint location:           0/1703AC0
Prior checkpoint location:            0/1703A50
Latest checkpoint's REDO location:    0/1703AC0
Latest checkpoint's REDO WAL file:    000000010000000000000001
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/620
Latest checkpoint's NextOID:          12377
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        612
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon Jul 17 03:56:50 2023
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    minimal
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

Using the newest release of PG (15.3) shows the same thing. It's no worries though, as the single user server approach seems to work reliably and not really have much in the way of downsides.

fOmey commented 11 months ago

justinclift

I'm a PG newbie and I've stumbled across the collation error.

Are there steps to manually resolve this until a fix is implemented? All the fix search results don't seem to be working.

justinclift commented 11 months ago

@fOmey What was the (exact) PostgreSQL image you were previously using?

I still need to look into this issue in detail, so I don't have any immediate suggestion right away. But knowing the exact PG image you were previously using might lead to some useful suggestions. :smile:

fOmey commented 11 months ago

@fOmey What was the (exact) PostgreSQL image you were previously using?

I still need to look into this issue in detail, so I don't have any immediate suggestion right away. But knowing the exact PG image you were previously using might lead to some useful suggestions. πŸ˜„

I would have been using the version before the latest.

justinclift commented 11 months ago

I would have been using the version before the latest.

Sorry, that's a bit unclear. Might have been my fault. :smile:

What version of PostgreSQL were you using before trying pgautoupgrade on your PostgreSQL data directory, and (it's important) where did you get that version of PostgreSQL from?

For example, there are several different Docker Hub repositories that can provide PostgreSQL, and there are non-Docker-Hub providers of PostgreSQL too. :smile:

fOmey commented 11 months ago

I would have been using the version before the latest.

Sorry, that's a bit unclear. Might have been my fault. πŸ˜„

What version of PostgreSQL were you using before trying pgautoupgrade on your PostgreSQL data directory, and (it's important) where did you get that version of PostgreSQL from?

For example, there are several different Docker Hub repositories that can provide PostgreSQL, and there are non-Docker-Hub providers of PostgreSQL too. πŸ˜„

All good, my mistake I should have been more clear.

I was using the official postgres image on dockerhub, latest tag: https://hub.docker.com/_/postgres/

I switched over to pgautoupgrade last month (Sep 17).

justinclift commented 11 months ago

Thanks, yeah that closer to the info I'm asking about. :smile:

In that repository, they have both Alpine Linux based images (eg 16-alpine), and non alpine based ones.

Do you remember (exactly) what tag/image you were previously using?

I'm guessing it wasn't an Alpine based image, and the switch to an Alpine based one (aka pgautoupgrade) is a contributing factor. But if you were actually using one of their Alpine based images, that'll mean a different thing is probably coming into play.

justinclift commented 11 months ago

latest tag

Heh. Just now that sank in. :smile:

So, looking at their list of tags is seems like latest is the equivalent to bookworm, which is a non-Alpine based image.

That'll mean the collation error is probably being caused (somehow) by the switch from glibc to musl as the base backend c library. I haven't yet investigated how to fix that though, so I'm not really sure what you can try. :frowning:

justinclift commented 11 months ago

The info here is likely to be of use though:

https://wiki.postgresql.org/wiki/Locale_data_changes

At least, that's on my todo list for reading ~first when I next look into solving this.

justinclift commented 2 months ago

@wuast94 @kenaniah @fOmey If anyone's up for testing a new Debian based pgautoupgrade image, then @andyundso has officially added ones for PostgreSQL 15 and 16:

Anyone up for giving it a go then reporting back whether the collation weirdness is all fixed? :smile: