spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Automated Database Upgrade Failed - Left in State Reporting: `there is no unique constraint matching given keys for referenced table "rooms"` #2239

Closed engineerjoe440 closed 1 year ago

engineerjoe440 commented 1 year ago

Playbook Configuration:

My `vars.yml` file looks like this:

```yaml
---
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ("matrix.").
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com
matrix_domain: stanleysolutionsnw.com

# The Matrix homeserver software to install.
# See `roles/matrix-base/defaults/main.yml` for valid options.
matrix_homeserver_implementation: synapse

# Disable Managed NGINX Config
# matrix_nginx_proxy_enabled: false

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: ''

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: someone@example.com
# matrix_ssl_retrieval_method: none
matrix_ssl_lets_encrypt_support_email: 'engineerjoe440@yahoo.com'

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
matrix_postgres_connection_password: ''

# Matrix SMTP/Email Configuration
matrix_mailer_relay_use: true
matrix_mailer_relay_host_name: "smtp-relay.sendinblue.com"
matrix_mailer_relay_host_port: 587
matrix_mailer_relay_auth: true
matrix_mailer_relay_auth_username: "engineerjoe440@gmail.com"
matrix_mailer_relay_auth_password: ""

#===============================================================================================================
matrix_synapse_ext_password_provider_shared_secret_auth_enabled: true
matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret:
#===============================================================================================================

# mautrix-facebook is a Matrix <-> Facebook bridge
# See: https://github.com/mautrix/facebook
matrix_mautrix_facebook_enabled: true

# mautrix-signal is a Matrix <-> Signal bridge
matrix_mautrix_signal_enabled: true
matrix_mautrix_signal_relaybot_enabled: true

# mx-puppet-groupme is a Matrix <-> GroupMe bridge
matrix_mx_puppet_groupme_enabled: true
```

Matrix Server:

Ansible Version:

root@matrix:~/stanleysolutionsmatrix# ansible --version
ansible [core 2.12.2]
  config file = /root/stanleysolutionsmatrix/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
  jinja version = 2.10.1
  libyaml = True

Problem description:

When performing an upgrade with the command below, I noticed that there was a suggestion that (while not required) a database upgrade was available. I decided to perform said upgrade with the database upgrade command (below), following which, I was unable to restart the synapse server.

Upgrade Command:

ansible-playbook -i inventory/hosts setup.yml --tags=setup-all --ask-pass

Database Upgrade Command:

ansible-playbook -i inventory/hosts setup.yml --tags=upgrade-postgres --ask-pass

Following the upgrade, the Synapse server would not restart (that is, when trying to start, it would fail). This was evident when reviewing the journal logs with `journalctl -fu matrix-synapse.service` and seeing the traceback below.

Traceback Contained in `journalctl` Logs

```bash
Nov 06 03:06:14 matrix.stanleysolutionsnw.com systemd[1]: Starting Synapse server...
Nov 06 03:06:14 matrix.stanleysolutionsnw.com systemd[1]: Started Synapse server.
Nov 06 03:06:15 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: Starting synapse with args -m synapse.app.homeserver -c /data/homeserver.yaml
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: This server is configured to use 'matrix.org' as its trusted key server via the
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: 'trusted_key_servers' config option. 'matrix.org' is a good choice for a key
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: server since it is long-lived, stable and trusted. However, some admins may
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: wish to use another server for this purpose.
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: To suppress this warning and continue using 'matrix.org', admins should set
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: 'suppress_key_server_warning' to 'true' in homeserver.yaml.
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: --------------------------------------------------------------------------------
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: 2022-11-06 03:06:19,541 - root - 343 - WARNING - main - ***** STARTING SERVER *****
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: 2022-11-06 03:06:19,542 - root - 344 - WARNING - main - Server /usr/local/lib/python3.9/site-packages/synapse/app/homeserver.py version 1.69.0
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: 2022-11-06 03:06:19,586 - synapse.app._base - 205 - ERROR - main - Exception during startup
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: Traceback (most recent call last):
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/app/homeserver.py", line 375, in setup
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: hs.setup()
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/server.py", line 309, in setup
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: self.datastores = Databases(self.DATASTORE_CLASS, self)
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/databases/__init__.py", line 74, in __init__
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: prepare_database(
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/prepare_database.py", line 136, in prepare_database
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: _upgrade_existing_database(
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/prepare_database.py", line 520, in _upgrade_existing_database
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: database_engine.execute_script_file(cur, absolute_path)
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/engines/_base.py", line 145, in execute_script_file
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: cls.executescript(cursor, f.read())
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/engines/postgres.py", line 224, in executescript
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: cursor.execute(script)
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 388, in execute
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: self._do_execute(self.txn.execute, sql, *args)
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 436, in _do_execute
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: return func(sql, *args, **kwargs)
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: psycopg2.errors.InvalidForeignKey: there is no unique constraint matching given keys for referenced table "rooms"
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: **********************************************************************************
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: Error during initialisation:
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: there is no unique constraint matching given keys for referenced table "rooms"
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]:
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: There may be more information in the logs.
Nov 06 03:06:19 matrix.stanleysolutionsnw.com matrix-synapse[3435881]: **********************************************************************************
Nov 06 03:06:21 matrix.stanleysolutionsnw.com systemd[1]: matrix-synapse.service: Main process exited, code=exited, status=1/FAILURE
Nov 06 03:06:21 matrix.stanleysolutionsnw.com systemd[1]: matrix-synapse.service: Failed with result 'exit-code'.
```
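For anyone landing here with a similar failure, the useful part of a log like this is usually the final exception line. A minimal sketch of a filter that pulls only those lines out of the journal output follows; the function name `synapse_startup_errors` is made up for illustration, and the patterns are simply the strings that appear in the traceback in this issue, so other failures may need different ones.

```shell
#!/bin/sh
# Hypothetical helper (not part of the playbook): reads journal text on stdin
# and keeps only the lines that name the startup failure.
synapse_startup_errors() {
    grep -E 'psycopg2\.errors\.[A-Za-z]+:|Error during initialisation|Exception during startup'
}

# Example use on the server (prints the last 200 journal lines, filtered):
#   journalctl -u matrix-synapse.service --no-pager -n 200 | synapse_startup_errors
```

Unlike `journalctl -fu`, this does not follow the log, so it returns immediately and the output can be pasted into a report.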

At this point, I believe that corrective actions need to be taken, manually, in the database. I'm unfamiliar with what these commands should be and am looking for guidance and/or suggestions for additional troubleshooting. Thank you so much for your assistance!

spantaleev commented 1 year ago

You seem to have missed the part in Upgrading PostgreSQL which tells you that:

The old Postgres data directory is backed up automatically, by renaming it to /matrix/postgres/data-auto-upgrade-backup. To rename to a different path, pass some extra flags to the command above, like this: --extra-vars="postgres_auto_upgrade_backup_data_path=/another/disk/matrix-postgres-before-upgrade"

The auto-upgrade-backup directory stays around forever, until you manually decide to delete it.


You've also missed the part where the --tags=upgrade-postgres itself tells you about the same. It also tells you how to recover in the event of failure:

https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/a19f239f06d9942b69ad18c23eb39da2fb1d0890/roles/custom/matrix-postgres/tasks/upgrade_postgres.yml#L112-L113

NOTE: Your Postgres data directory has been moved from /matrix/postgres/data to /matrix/postgres/data-auto-upgrade-backup. In the event of failure, you can move it back and run the playbook with --tags=setup-postgres to restore operation.


You've also missed the part where it told you about recovery one more time:

https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/a19f239f06d9942b69ad18c23eb39da2fb1d0890/roles/custom/matrix-postgres/tasks/upgrade_postgres.yml#L159-L167

.. which would have looked something like this.

Importing Postgres database using the following command: ..... If this crashes, you can stop Postgres (systemctl stop matrix-postgres), delete the new database data (rm -rf /matrix/postgres/data) and restore the automatically-made backup (mv /matrix/postgres/data-auto-upgrade-backup /matrix/postgres/data).


Then, you should figure out why the upgrade failed. Perhaps running --tags=upgrade-postgres with an extra -vvv flag would have been helpful.

If upgrading fails, you should restore as mentioned above multiple times, instead of trying to launch Synapse with a broken database and expect it to magically work.
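As a rough sketch of the `-vvv` suggestion above: wrapping the run so that everything printed is also written to a file means the recovery notes survive even if the terminal scrolls them away. The `run_logged` helper and the log file name are assumptions made for this example; the playbook invocation is the one from the original report with `-vvv` appended.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, show its output live, and keep a copy.
# Note: the function's exit status is that of `tee`, not of the wrapped
# command, so check the saved log for failures rather than relying on $?.
run_logged() {
    log="$1"
    shift
    "$@" 2>&1 | tee "$log"
}

# Example (log file name is arbitrary):
#   run_logged upgrade-postgres.log \
#       ansible-playbook -i inventory/hosts setup.yml --tags=upgrade-postgres --ask-pass -vvv
```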

engineerjoe440 commented 1 year ago

Drat! Yes! I must've missed those. Part of it was probably that my terminal session didn't retain all of the messages. I'm afraid that what I did see probably didn't feel all that clear when I read it, either. I will say that those messages do give the impression that a database backup-restore makes sense, but I would argue that it's still not clear that restoring the database is the best course of action in this case. After all, the database comes up and runs. From the log information available, it looks like Synapse is having trouble with the new database format. My point is that although there is enough information to describe a database restoration, there's a missing connection that presents it as the "next step" to follow when an issue like the one I ran into arises.

Well thank you for your suggestions! Hopefully this issue will act as an "archive" for the next "poor soul" who foolishly runs into this like I did.

Much appreciated, @spantaleev. I'll post back with an update on my progress shortly.

engineerjoe440 commented 1 year ago

Well... Looks like I might still be out of luck. After restoring that backup, things still aren't lining up quite in the way I'd hope.

TASK [matrix-common-after : Fail if service isn't detected to be running] *****************************************************************************************************************************************
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-mailer.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-postgres.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-mautrix-facebook.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-mautrix-signal.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-mautrix-signal-daemon.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-mx-puppet-groupme.service) 
failed: [matrix.stanleysolutionsnw.com] (item=matrix-synapse.service) => changed=false 
  ansible_loop_var: item
  item: matrix-synapse.service
  msg: matrix-synapse.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-synapse.service` and `journalctl -fu matrix-synapse.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `matrix_common_after_systemd_service_start_wait_for_timeout_seconds` variable. See `roles/matrix-common-after/defaults/main.yml` for more details about that.
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-client-element.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-nginx-proxy.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-ssl-lets-encrypt-certificates-renew.timer) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-ssl-nginx-proxy-reload.timer) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-coturn.service) 
skipping: [matrix.stanleysolutionsnw.com] => (item=matrix-coturn-reload.timer) 

PLAY RECAP ********************************************************************************************************************************************************************************************************
matrix.stanleysolutionsnw.com : ok=33   changed=2    unreachable=0    failed=1    skipped=198  rescued=0    ignored=0

Any thoughts on manual corrective actions for the database itself? Or is it time to throw up my arms and say: "I've learned something; now it's time to move on and start over"?

engineerjoe440 commented 1 year ago

I want to reaffirm, I'm almost certain this is user-error (that is, it's my fault), but I'm hopeful we can uncover some way to help improve the project so other "dummies like me" won't run into similar issues.

spantaleev commented 1 year ago

Did you restore like this?

  1. Stop all services: `--tags=stop`. The instructions say to stop only Postgres, which is fair if you're restoring immediately after noticing the failed import, but since you've started Synapse, etc., you'd better stop everything.
  2. Remove the new (corrupt) database: `rm -rf /matrix/postgres/data`
  3. Move the old data back: `mv /matrix/postgres/data-auto-upgrade-backup /matrix/postgres/data`
  4. Re-run the playbook: `--tags=setup-all,start` would be best
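The four steps above can be sketched as a small script. This is only an illustration under a few assumptions: `restore_postgres_data` is a made-up helper name, the guard that refuses to delete anything when the backup directory is missing is an addition not mentioned in the thread, and the `ansible-playbook` lines reuse the inventory path and `--ask-pass` flag from the original report.

```shell
#!/bin/sh
# Hypothetical helper covering steps 2 and 3. Steps 1 and 4 are playbook
# runs and are shown as comments below.
restore_postgres_data() {
    # $1: Postgres base directory (the thread uses /matrix/postgres)
    base="${1:?usage: restore_postgres_data <postgres-base-dir>}"
    data="$base/data"
    backup="$base/data-auto-upgrade-backup"

    # Safety check (an addition for this sketch): never delete the current
    # data directory unless the automatic backup is actually present.
    if [ ! -d "$backup" ]; then
        echo "no backup found at $backup; refusing to touch $data" >&2
        return 1
    fi

    rm -rf "$data"          # step 2: remove the new (corrupt) database
    mv "$backup" "$data"    # step 3: move the old data back into place
}

# Step 1 (from the directory containing the playbook):
#   ansible-playbook -i inventory/hosts setup.yml --tags=stop --ask-pass
# Steps 2-3 (on the server, as root):
#   restore_postgres_data /matrix/postgres
# Step 4:
#   ansible-playbook -i inventory/hosts setup.yml --tags=setup-all,start --ask-pass
```

Running the helper a second time is harmless: once the backup has been moved back, the guard fails and the restored data is left alone.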

I suppose you've just restored and are trying to run services as they were before? And you see that Synapse is failing? Have you tried checking the logs for Synapse like it tells you to?

engineerjoe440 commented 1 year ago

Hi @spantaleev! Forgive my delayed response.

No, I guess I hadn't tried the restore right away; about a week had already passed between realizing there was a problem and getting a chance to dig in more. I'm certain I've missed steps along the way, and I'm even more certain that I haven't done the best job of explaining my situation.

I hope that I'm improperly inferring some animosity in your comments. Your comments seem a bit tense, with a hint of a demeaning attitude, though I hope that is not your intent. I do not mean to accuse you of this, but rather to observe that they can be perceived as such. I hope to leave this thread on a more positive note. I greatly appreciate the time you've spent providing support for a case which offers you no immediate benefit and no promise of future benefit, either. Your work and effort here do not go unnoticed. Thank you! :tada: :heart:

I have decided that attempting recovery is likely more effort than starting over, and it seems that you would likely agree at this point. It doesn't seem that there are any other manual recovery techniques that you might suggest to fix my self-made mess. So at this point, I think I'm going to "bootstrap my way back up." Luckily, I'm the only one who will have to deal with any repercussions.

Thank you again for your efforts here! It's thanks to your work that I've been able to bring so many of my chats into one place without the headaches of doing additional research on my own. Thank you!