vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on Patroni). Automating with Ansible.
https://postgresql-cluster.org
MIT License
1.6k stars 405 forks source link

few thoughts about upgrade role #779

Open glushakov opened 5 days ago

glushakov commented 5 days ago

Hi. Thanks for your work. I based your role on a major upgrade of postgres and would like to share some of the cases I encountered.

1) Patroni precheck https://github.com/vitabaks/postgresql_cluster/blob/master/automation/pg_upgrade.yml#L19 doesn't consider schema to get access to Patroni API (http/https)

2) Tablespaces.

3) In case of scheme precheck fail https://github.com/vitabaks/postgresql_cluster/blob/master/automation/roles/upgrade/tasks/schema_compatibility.yml#L109 task with reinit not started https://github.com/vitabaks/postgresql_cluster/blob/master/automation/roles/upgrade/tasks/schema_compatibility.yml#L121 and new pgdata is not cleared and postgres is not stopped, which makes it impossible to restart.

4) Parameters

vitabaks commented 5 days ago

Thanks for the feedback @glushakov

If possible, attach more doers, error logs, how to reproduce, an example of the fix code that you mentioned. This will help to implement improvements in a shorter time.

Also feel free to offer PR especially for small fixes.

glushakov commented 4 days ago

For some cases there is no code yet, because they are still in the design stage (for example, transferring parameters from postgresql.base.conf).

Errors, for example for existing tablespace error:


$ mkdir ts/my_tblspc
$ psql
postgres=# create tablespace qwe location '/var/lib/pgsql/ts/my_tblspc';

<..cut ..>

TASK [Print tablespace location] ***************************************************************************************************************************************************************************************************************************************************
skipping: [alma-db2]
ok: [alma-db1] => {
    "pg_tablespace_location.stdout_lines": [
        "/var/lib/pgsql/ts/my_tblspc"
    ]
}

<..cut ..>

TASK [Result of checking the compatibility of the scheme - success] ****************************************************************************************************************************************************************************************************************
ok: [alma-db1] => {
    "msg": "The database schema are compatible with PostgreSQL 16"
}

<..cut..>

TASK [Print the result of the pg_upgrade check] ************************************************************************************************************************************************************************************************************************************
fatal: [alma-db1]: FAILED! => {
    "failed_when_result": true,
    "pg_upgrade_check_result.stdout_lines": [
        "Performing Consistency Checks on Old Live Server",
        "------------------------------------------------",
        "Checking cluster versions                                     ok",
        "Checking database user is the install user                    ok",
        "Checking database connection settings                         ok",
        "Checking for prepared transactions                            ok",
        "Checking for system-defined composite types in user tables    ok",
        "Checking for reg* data types in user tables                   ok",
        "Checking for contrib/isn with bigint-passing mismatch         ok",
        "Checking for incompatible \"aclitem\" data type in user tables  ok",
        "Checking for presence of required libraries                   ok",
        "Checking database user is the install user                    ok",
        "Checking for prepared transactions                            ok",
        "Checking for new cluster tablespace directories               ",
        "new cluster tablespace directory already exists: \"/var/lib/pgsql/ts/my_tblspc/PG_16_202307071\"",
        "Failure, exiting"
    ]
}

The "schema precheck" process created a tablespace in the /var/lib/pgsql/ts/my_tblspc/PG_16_202307071 directory, but then did not delete it. Then the pg_upgrade --check process detect that this directory already contained elements from the TS and failed.

As i said earlier, my decision is not to use tablespace in pg_dumpall command for schema checks (add --no-tablespaces)