lahwaacz / wiki-scripts

Framework for writing bots, maintenance scripts or performing data analysis on wikis powered by MediaWiki
http://lahwaacz.github.io/wiki-scripts/
GNU General Public License v3.0
27 stars 12 forks source link

Database upgrade #90

Closed kynikos closed 2 years ago

kynikos commented 2 years ago

Hi, Postgresql got receently bumped from version 13 to 14, so now the service fails to start with:

Apr 24 12:59:29 #### systemd[1]: Starting PostgreSQL database server...
Apr 24 12:59:29 #### postgres[3016]: An old version of the database format was found.
Apr 24 12:59:29 #### postgres[3016]: See https://wiki.archlinux.org/index.php/PostgreSQL#Upgrading_PostgreSQL
Apr 24 12:59:29 #### systemd[1]: postgresql.service: Control process exited, code=exited, status=1/FAILURE
Apr 24 12:59:29 #### systemd[1]: postgresql.service: Failed with result 'exit-code'.
Apr 24 12:59:29 #### systemd[1]: Failed to start PostgreSQL database server.

And wiki-scripts obviously complains with sqlalchemy.exc.OperationalError.

Chance I can be spared from reading pages of documentation to complete the upgrade? Pretty please :P

lahwaacz commented 2 years ago

Hi, you can try running the upgrade script from Arch infra (as root): https://gitlab.archlinux.org/archlinux/infrastructure/-/blob/master/roles/postgres/files/upgrade_pg.sh

I'd still recommend reading the documentation, though :D There might be some steps that are not in the script, e.g. removing the backup dir /var/lib/postgres/data-14 when you check that the upgrade worked.

kynikos commented 2 years ago

Wow, thank you, this made me sweat, for some reason the following:

# cat /var/lib/postgres/data/PG_VERSION

was returning 13 as expected, but when I went to run the script (which I'd properly inspected before), its line #59 complained that it was upgrading from version 12 and couldn't find some libraries, so I downgraded postgresql-old-upgrade to the latest 12 version and ran the remaining commands manually. Needless to say that besides that downgraded package, my system is fully synced.

Anyway at that point the postgresql service restarted, however, while toc.py and sort-maintainers.py seem to complete successfully, statistics.py now always raises:

Traceback (most recent call last):
  File "/home/dario/dev/arch/wiki-scripts/statistics.py", line 311, in <module>
    sys.exit(statistics.run())
  File "/home/dario/dev/arch/wiki-scripts/statistics.py", line 98, in run
    self.db.sync_with_api(self.api)
  File "/home/dario/dev/arch/wiki-scripts/ws/db/database.py", line 148, in sync_with_api
    grabbers.synchronize(self, api, with_content=with_content, check_needs_update=check_needs_update)
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/__init__.py", line 39, in synchronize
    GrabberRevisions(api, db, with_content=with_content).update()
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/GrabberBase.py", line 129, in update
    self._execute(gen, sync_timestamp)
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/GrabberBase.py", line 137, in _execute
    for item in gen:
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/revision.py", line 473, in gen_update
    raise NotImplementedError("Cannot merge revisions from [[{}]] to [[{}]]: target page has been moved.")
NotImplementedError: Cannot merge revisions from [[{}]] to [[{}]]: target page has been moved.

1) Do you think that's due to a botched database upgrade? 2) What's my best way to check the integrity of the wiki-scripts database?

lahwaacz commented 2 years ago

Hmm, that's weird. The error message is incomplete, but I fixed it and it's likely caused by something else - the error usually happens only when the script tries to synchronize over a too large time span, but you synchronized recently before the upgrade and there were no changes on the wiki that would have caused this problem.

Can you run the following to check what is actually in the database as the last synchronization timestamp?

#! /usr/bin/env python3
from ws.db.database import Database
import ws.config
db = ws.config.object_from_argparser(Database, description="test")
s = db.ws_sync.select()
result = db.engine.connect().execute(s)
for row in result:
    print(row)

Now I got an idea what might have gone wrong. Maybe you had /var/lib/postgres/data-13 already before the upgrade (maybe as a backup made after the previous upgrade from 12 to 13)? Then the command on the line 43 of the script would move /var/lib/postgres/data inside the existing directory instead of just renaming it to /var/lib/postgres/data-13 and from there things would go wrong since the upgrade would not restore the last state of the database. But this hypothesis still has weak points, e.g. where it picked up version 12...

Anyway, I think you can still recover the correct database by deleting /var/lib/postgres/data and running the initdb and pg_upgrade commands again with the correct data directory (i.e. newest version 13 data dir) as a starting point. There should be postmaster.opts in each data directory that is changed whenever the postgresql server runs in the directory, so you can check if its timestamp matches the time you stopped postgresql.service during the upgrade.

:crossed_fingers:

kynikos commented 2 years ago

Thanks for your help! So I deleted /var/lib/postgres/data and completely restarted the upgrade process using the backup that I had made before attempting the upgrade the first time. This time I immediately used version 12 of postgresql-old-upgrade, and followed https://wiki.archlinux.org/title/PostgreSQL#pg_upgrade manually. The database upgrade seemed to complete successfully, and the postgres service restarted normally.

However when running statistics.py I still get that exception, this time with the updated message:

Traceback (most recent call last):
  File "/home/dario/dev/arch/wiki-scripts/statistics.py", line 311, in <module>
    sys.exit(statistics.run())
  File "/home/dario/dev/arch/wiki-scripts/statistics.py", line 98, in run
    self.db.sync_with_api(self.api)
  File "/home/dario/dev/arch/wiki-scripts/ws/db/database.py", line 148, in sync_with_api
    grabbers.synchronize(self, api, with_content=with_content, check_needs_update=check_needs_update)
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/__init__.py", line 39, in synchronize
    GrabberRevisions(api, db, with_content=with_content).update()
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/GrabberBase.py", line 129, in update
    self._execute(gen, sync_timestamp)
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/GrabberBase.py", line 137, in _execute
    for item in gen:
  File "/home/dario/dev/arch/wiki-scripts/ws/db/grabbers/revision.py", line 475, in gen_update
    raise NotImplementedError(f"Cannot merge revisions from pageid {pageid} to ns={dest_ns}: '{dest_title}': target page has been moved.")
NotImplementedError: Cannot merge revisions from pageid 28534 to ns=0: 'Dell Inspiron 7375': target page has been moved.

This is the output of the test script with the synchronization timestamps:

('GrabberNamespaces', datetime.datetime(2022, 4, 30, 20, 5, 7, 142313))
('GrabberTags', datetime.datetime(2022, 4, 30, 20, 5, 7, 677364))
('GrabberRecentChanges', datetime.datetime(2022, 4, 30, 20, 5, 7, 964002))
('GrabberUsers', datetime.datetime(2022, 4, 30, 20, 5, 8, 766036))
('GrabberLogging', datetime.datetime(2022, 4, 30, 20, 5, 9, 35970))
('GrabberUserMerge', datetime.datetime(2022, 4, 30, 20, 5, 9, 305387))
('GrabberInterwiki', datetime.datetime(2022, 4, 30, 20, 5, 9, 309293))
('GrabberIPBlocks', datetime.datetime(2022, 4, 30, 20, 5, 9, 312769))
('GrabberPages', datetime.datetime(2022, 4, 30, 20, 5, 9, 578400))
('GrabberProtectedTitles', datetime.datetime(2022, 4, 30, 20, 5, 9, 871169))
('GrabberRevisions', datetime.datetime(2020, 12, 12, 4, 31, 28, 727373))

What do you think?

lahwaacz commented 2 years ago

It would have been better if you had a backup of the version 13 just before this upgrade (or even a month or two old backup would work too). Now the script needs to sync revisions starting from 2020 and fails because some edge case is not implemented (as you see from the message, the mediawiki "merge" events are the problem). It happens when syncing over long time periods as that increases the probability that "incompatible" merge and move events happened on the wiki. Unfortunately, the only solution I know of is to sync often. Restarting the sync from scratch (i.e. 2000) would fix it too, but obviously that would take a long time... I can also send you my database dump to make it faster.

kynikos commented 2 years ago

It's so weird, it's as if postgresql never asked me to upgrade the database from 12 to 13, because the backup that I used to perform this v14 upgrade was the one that I'd made right before attempting it the other day 🤷‍♂️ And I don't understand why the revisions hadn't synced since 2020...

Still, you reminded me that I do include a wiki-scripts dump into my automated weekly backup, using something like pg_dump -U postgres wiki_scripts_archwiki > "db.dump.sql", so I've restored one from 15/4 with:

# dropdb wiki_scripts_archwiki
# createdb wiki_scripts_archwiki
# psql -U postgres -d wiki_scripts_archwiki -1 -f db.dump.sql

Then I tried to run statistics.py again but the page had already been saved this UTC day, so I've tried python statistics.py -pf and it ran without errors. The timestamps script now returns:

('GrabberNamespaces', datetime.datetime(2022, 5, 1, 11, 23, 42, 53508))
('GrabberTags', datetime.datetime(2022, 5, 1, 11, 23, 42, 588117))
('GrabberRecentChanges', datetime.datetime(2022, 5, 1, 11, 23, 42, 877392))
('GrabberUsers', datetime.datetime(2022, 5, 1, 11, 23, 49, 851198))
('GrabberLogging', datetime.datetime(2022, 5, 1, 11, 23, 50, 421501))
('GrabberUserMerge', datetime.datetime(2022, 5, 1, 11, 23, 51, 645463))
('GrabberInterwiki', datetime.datetime(2022, 5, 1, 11, 23, 51, 650545))
('GrabberIPBlocks', datetime.datetime(2022, 5, 1, 11, 23, 51, 663166))
('GrabberPages', datetime.datetime(2022, 5, 1, 11, 23, 51, 932389))
('GrabberProtectedTitles', datetime.datetime(2022, 5, 1, 11, 23, 54, 165347))
('GrabberRevisions', datetime.datetime(2022, 5, 1, 11, 23, 54, 520396))

So everything seems back to normal 🥳️ Thanks again for your help!

lahwaacz commented 2 years ago

That looks good, I'm glad you got it working again.