stolostron / multicluster-global-hub

the main repository for the multicluster global hub

Upgrade builtin Postgres from 13 to 16 #1253

Open yanmxa opened 4 days ago

yanmxa commented 4 days ago

Change the image directly

  1. from 13 to 16
Incompatible data directory.  This container image provides
PostgreSQL '16', but data directory is of
version '13'.

This image supports automatic data directory upgrades from
'15', please _carefully_ consult image documentation
about how to use the '$POSTGRESQL_UPGRADE' startup option.

Related document: https://docs.redhat.com/en/documentation/red_hat_decision_manager/7.13/html/release_notes_for_red_hat_decision_manager_7.13/rn-7.13.2-known-issues-ref#red_hat_openshift_container_platform_7

  2. from 13 to 15
Incompatible data directory.  This container image provides
PostgreSQL '15', but data directory is of
version '13'.

This image supports automatic data directory upgrade from
'13', please _carefully_ consult image documentation
about how to use the '$POSTGRESQL_UPGRADE' startup option.
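
The two messages above imply that each image can only upgrade from one specific earlier version, so going from 13 to 16 would need an intermediate hop. As a rough sketch of that reasoning (the function names `supported_source` and `upgrade_path` are made up for illustration, and the mapping covers only the two images quoted above):

```bash
# Sketch: derive the chain of versions to step through, based only on the
# "supports automatic data directory upgrade from" messages quoted above.

# supported_source VERSION
# Prints the data directory version the image for VERSION can upgrade from
# (prints nothing if it is not known from the messages above).
supported_source() {
  case "$1" in
    16) echo 15 ;;  # the 16 image reports upgrades from '15'
    15) echo 13 ;;  # the 15 image reports upgrades from '13'
    *)  echo "" ;;
  esac
}

# upgrade_path FROM TO
# Walks backwards from TO until it reaches FROM and prints the versions
# in upgrade order. Returns 1 if no chain is known.
upgrade_path() {
  local from="$1" current="$2" path="$2" prev
  while [ "$current" != "$from" ]; do
    prev="$(supported_source "$current")"
    if [ -z "$prev" ]; then
      echo "no known upgrade path from $from to $2" >&2
      return 1
    fi
    path="$prev $path"
    current="$prev"
  done
  echo "$path"
}
```

Calling `upgrade_path 13 16` prints `13 15 16`, which matches the two-step upgrade the images ask for.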

Upgrade with the POSTGRESQL_UPGRADE option (env)

  1. from 13 to 16
    
    # change env
    POSTGRESQL_UPGRADE=copy
    # change image
    quay.io/myan/postgresql-16:9.5-1732622748

... With this container image you can only upgrade from data directory of version '15', not '13'.


  2. from 13 to 15

```bash
# change env
POSTGRESQL_UPGRADE=copy
# change image
quay.io/myan/postgresql-15:1-14

...

==========  $PGDATA upgrade: 13 -> 15  ==========

Can't read /etc/scl/conf/rh-postgresql13, rh-postgresql13 is probably not installed.

===>  Starting old postgresql once again for a clean shutdown...

/usr/share/container-scripts/postgresql/common.sh: line 346: /opt/rh/rh-postgresql13/root/usr/bin/pg_ctl: No such file or directory
```

The image used for the upgrade is built from this repo: https://github.com/sclorg/postgresql-container. Its common.sh doesn't contain any rh-postgresql13 information, so we need to upgrade from 13 to 15 first, and the version 15 image should come from the Red Hat Postgres image.

yanmxa commented 1 day ago

Upgrading PostgreSQL from version 13 to 16 automatically would mean chaining upgrades (e.g., 13 → 15, then 15 → 16) and working around the errors mentioned above. Because of the number of steps involved, we will only support manual PostgreSQL upgrades.

~The upgrade process involves the following steps:~

~1. Add the annotation to MGH to disable the manager from consuming messages from the transport to the current database:~

"global-hub.open-cluster-management.io/postgres-upgrade"="backup"

~2. Back up the current database manually.~

~3. Create a new version of Postgres by changing the annotation value to "restore" (operator). It will delete the current database.~

"global-hub.open-cluster-management.io/postgres-upgrade"="restore"

~4. Restore the backup to the new PostgreSQL instance.~

~5. Remove the annotation, and the manager will connect to the new PostgreSQL storage.~

~Note: This process will not result in data loss, as the manager supports resuming from the last consumed message upon restart.~

yanmxa commented 1 day ago

Solution for the upgrade:

After discussing with @clyang82, we have agreed to name the PostgreSQL instance with its version. This way, when upgrading to the next release, PostgreSQL will be upgraded from version 13 to 16, and the instance name will change to multicluster-global-hub-postgres16 (the original name is multicluster-global-hub-postgres).

After upgrading the instance, the cluster and policy data will be restored automatically by its resync mechanism. However, there will still be two issues in the Global Hub system:

  1. Event and history data loss in the new instance. Solution: Customers can choose to restore the data from the original database. Alternatively, they can ignore it if the history data is not important to them.

  2. The original StatefulSet instance will still exist in the Global Hub namespace. Solution: Customers can remove it manually if they no longer need it (for the history data).
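
A small sketch of the naming scheme and the manual cleanup described above. The function names are hypothetical, and the namespace passed in is whatever namespace Global Hub is installed in. By default the function only prints the command, so nothing is deleted unless `APPLY=1` is set.

```bash
# Sketch only: the names come from the comment above; nothing here is
# taken from the operator's code.

# versioned_name BASE VERSION
# Prints the instance name with the version appended,
# e.g. multicluster-global-hub-postgres16.
versioned_name() {
  printf '%s%s\n' "$1" "$2"
}

# cleanup_old_instance NAMESPACE OLD_NAME
# Prints the kubectl command that would remove the original StatefulSet.
# Set APPLY=1 in the environment to run it instead of printing it.
cleanup_old_instance() {
  local ns="$1" old="$2"
  if [ "${APPLY:-0}" = "1" ]; then
    kubectl delete statefulset "$old" -n "$ns"
  else
    echo "kubectl delete statefulset $old -n $ns"
  fi
}
```

For example, `cleanup_old_instance multicluster-global-hub multicluster-global-hub-postgres` prints the delete command for the original instance (the namespace here is an assumption). Deleting a StatefulSet does not, by default, delete its PersistentVolumeClaims, so the old data stays on disk until the PVC is removed separately.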

yanmxa commented 1 day ago

Optional step 1: Restore history tables

history.local_compliance, history.local_compliance_job_log

yanmxa commented 1 day ago

Optional step 2: Restore event tables

event.local_policies, event.local_root_policies, event.managed_clusters
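
One possible way to copy the history and event tables listed above from the original instance into the new one is a per-table `pg_dump --data-only` piped into `psql`. This is an untested sketch: the function name is hypothetical, the pod names and database name are supplied by the caller, and it assumes the tables already exist in the new instance and that the default user inside each pod can connect without extra credentials. It only prints the commands so they can be reviewed first.

```bash
# restore_tables_cmd NAMESPACE OLD_POD NEW_POD DB TABLE...
# Prints one pipeline per table that would dump the table's rows from the
# old instance and load them into the new one. Nothing is executed.
restore_tables_cmd() {
  local ns="$1" old_pod="$2" new_pod="$3" db="$4" table
  shift 4
  for table in "$@"; do
    echo "kubectl exec -n $ns $old_pod -- pg_dump --data-only --table=$table $db | kubectl exec -i -n $ns $new_pod -- psql -d $db"
  done
}
```

Passing `history.local_compliance history.local_compliance_job_log event.local_policies event.local_root_policies event.managed_clusters` as the table arguments yields five pipelines, one per table.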

yanmxa commented 20 hours ago

TODO:

  1. extra PV requirements
  2. support matrix
  3. operand: mgh- component - from multicluster-global-hub-postgres to multicluster-global-hub-postgresql or label
  4. naming for postgres svc, secret
  5. document update => use label or new name

=> STS: multicluster-global-hub-postgres -> multicluster-global-hub-postgresql + label: version = 16

yanmxa commented 20 hours ago

Related document: https://docs.google.com/document/d/1Wj5et_PVP4is7XjxKiacER_Hzy9Ad-hUiSw6Yx-Y2VA/edit?tab=t.0