liberapay / liberapay.com

Source code of the recurrent donations platform Liberapay
https://liberapay.com/
1.64k stars 207 forks source link

PostgreSQL upgrade #992

Closed Changaco closed 3 years ago

Changaco commented 6 years ago

PostgreSQL 10 now Supported in Amazon RDS

We should migrate our instance from 9.6 to 10, and switch from magnetic storage to SSD at the same time (this is done).

Changaco commented 4 years ago

PostgreSQL 12 has been released, and AWS currently supports PostgreSQL 11, but we still haven't upgraded from 9.6.

Changaco commented 4 years ago

I'm trying this now, hopefully there will only be a few minutes of downtime.

Changaco commented 4 years ago

"The instance could not be upgraded because the 'reg' data type is used in user tables. Please remove all usages of 'reg' data types and try again."

Changaco commented 4 years ago

I'm trying this again. The most recent PostgreSQL version currently supported by AWS is 11.5.

Changaco commented 4 years ago

Screenshot of the AWS database upgrade confirmation page

Changaco commented 4 years ago

The upgrade failed again and caused ~10 minutes of downtime.

There were problems executing ""/rdsdbbin/postgres/bin/pg_restore" --host /rdsdbdata/log/upgrade --port 8193 --username rdsadmin --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_16390.custom" >> "pg_upgrade_dump_16390.log" 2>&1"
Consult the last few lines of "pg_upgrade_dump_16390.log" for the probable cause of the failure.

I don't see the pg_upgrade_dump_16390.log file, so I have no idea what went wrong.

Changaco commented 4 years ago

The only silver lining is that the app recovered once the DB was back up, so #1662 was a success.

Changaco commented 4 years ago

The AWS Database Migration Service might work better.

Changaco commented 4 years ago

Our reserved database instance is still valid for two months, I'll reconsider what to do then.

Changaco commented 4 years ago

I found the error message from yesterday's failed migration while I was looking for something else:

Database instance is in a state that cannot be upgraded:
pg_restore: [archiver (db)] Error from TOC entry 253; 1259 18167 TABLE tips liberapay
pg_restore: [archiver (db)] could not execute query:
ERROR: cannot cast type publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt.currency_amount to publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt.currency
Command was:
-- For binary upgrade, must preserve pg_type oid
SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('18169'::pg_catalog.oid);
-- For binary upgrade, must preserve pg_type array oid
SELECT pg_catalog.binary_upgrade_set_next_array_pg_type_oid('18168'::pg_catalog.oid);
-- For binary upgrade, must preserve pg_type toast oid
SELECT pg_catalog.binary_upgrade_set_next_toast_pg_type_oid('37200961'::pg_catalog.oid);
-- For binary upgrade, must preserve pg_class oids
SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('18167'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_toast_pg_class_oid('37200960'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_index_pg_class_oid('37200962'::pg_catalog.oid);
CREATE TABLE "publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."tips"
( "id" integer NOT NULL
, "ctime" timestamp with time zone NOT NULL
, "mtime" timestamp with time zone DEFAULT "now"() NOT NULL
, "tipper" bigint NOT NULL
, "tippee" bigint NOT NULL
, "amount" "publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."currency_amount" NOT NULL
, "is_funded" boolean DEFAULT false NOT NULL
, "period" "publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."donation_period" NOT NULL
, "periodic_amount" "publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."currency_amount" NOT NULL
, "paid_in_advance" "publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."currency_amount"
, "renewal_mode" integer DEFAULT 1 NOT NULL
, CONSTRAINT "no_self_tipping" CHECK (("tipper" <> "tippee"))
, CONSTRAINT "paid_in_advance_currency_chk" CHECK ((("paid_in_advance")::"publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."currency" = ("amount")::"publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."currency"))
, CONSTRAINT "tips_amount_check" CHECK (("amount" OPERATOR("publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt".>=) (0)::numeric))
, CONSTRAINT "tips_periodic_amount_check" CHECK (("periodic_amount" OPERATOR("publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt".>) (0)::numeric)) );
-- For binary upgrade, set heap's relfrozenxid and relminmxid
UPDATE pg_catalog.pg_class SET relfrozenxid = '6324909', relminmxid = '1177' WHERE oid = '"publicjvzwop5mmubwy1wyrtg7pqrni1j5bknt"."tips"'::pg_catalog.regclass;
-- For binary upgrade, set toast's relfrozenxid and relminmxid
UPDATE pg_catalog.pg_class SET relfrozenxid = '6324909', relminmxid = '1177' WHERE oid = '37200960';
Changaco commented 3 years ago

The migration was finally performed yesterday. See https://github.com/liberapay/salon/issues/442#issuecomment-844413085 for context.


Here is how I ended up doing it:

  1. Launched a new empty database instance.
  2. Created the role to be used by the webapp.
  3. Initiated the database with sql/schema.sql and sql/prod.sql.
  4. Disabled all custom triggers on every table, because they were making pg_restore crash in my local tests. (ALTER TABLE xxx DISABLE TRIGGER USER)
  5. Switched the webapp to read-only mode by adding db.readonly = True in liberapay/wireup.py and restarting apache.
  6. Dumped the database with pg_dump -Fc >db.pg_dump.
  7. Customized the restore order, because pg_restore gets it wrong. (pg_restore -l -a >restore.txt && nano restore.txt).
  8. Loaded the data into the new database (pg_restore [connection settings] db.pg_dump -a --single-transaction -L restore.txt).
  9. Re-enabled all custom triggers on every table. (ALTER TABLE xxx ENABLE TRIGGER USER)
  10. Commented out the line added to liberapay/wireup.py in step 5.
  11. Modified the DATABASE_URL environment variable in Elastic Beanstalk to point to the new database and waited for this change to be applied.
  12. Checked the website and stopped the old database.
  13. Shredded the database dump. (shred -uz db.pg_dump)