spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Switching to Mautrix Megabridge breaks some bridges #3538

Closed RoiArthurB closed 1 month ago

RoiArthurB commented 1 month ago

Describe the bug

Since I moved from classical shared secret auth to the new Appservice Double Puppet, some of my mautrix bridges break. The affected bridge can be either one that was working great before (like Slack) or a newly installed one that was only ever set up with the appservice enabled (like Gmessages).

Finally (and even more surprisingly), some other services don't suffer any issue at all (like WhatsApp, Discord, and others).

To Reproduce

My vars.yml file looks like this:

matrix_architecture: "arm64"

[...]

# A secret used to protect access keys issued by the server.
matrix_synapse_macaroon_secret_key: '[redacted]'
matrix_homeserver_generic_secret_key: "{{ matrix_synapse_macaroon_secret_key }}"

matrix_synapse_ext_password_provider_shared_secret_auth_enabled: false
#matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret: '[redacted]'

[...]

matrix_playbook_reverse_proxy_type: playbook-managed-traefik
# Ensure that public urls use https
matrix_playbook_ssl_enabled: true
# Disable the web-secure (port 443) endpoint, which also disables SSL certificate retrieval
devture_traefik_config_entrypoint_web_secure_enabled: false
devture_traefik_container_web_host_bind_port: '10.10.10.13:81'
# We bind to `127.0.0.1` by default (see above), so trusting `X-Forwarded-*` headers from
# a reverse-proxy running on the local machine is safe enough.
devture_traefik_config_entrypoint_web_forwardedHeaders_insecure: true
# Uncomment and tweak the variable below if the name of your federation entrypoint is different
# than the default value (matrix-federation).
matrix_federation_traefik_entrypoint_name: matrix-federation

# Uncomment and tweak the variable below if you really wish to change the internal port number
# that the federation endpoint uses. Changing it is generally not necessary.
# Usually, changing `matrix_playbook_public_matrix_federation_api_traefik_entrypoint_host_bind_port` below is enough.
matrix_playbook_public_matrix_federation_api_traefik_entrypoint_port: 8448

matrix_playbook_public_matrix_federation_api_traefik_entrypoint_host_bind_port: 0.0.0.0:8448

[...]

matrix_appservice_double_puppet_enabled: true

# @googlechatbot:YOUR_DOMAIN
matrix_mautrix_googlechat_enabled: true
matrix_mautrix_googlechat_federate_rooms: false
matrix_mautrix_googlechat_configuration_extension_yaml: |
  bridge:
    displayname_template: '{full_name} (GC)'

# @gmessagesbot:YOUR_DOMAIN
matrix_mautrix_gmessages_enabled: true
matrix_mautrix_gmessages_configuration_extension_yaml: |
  bridge:
    displayname_template: "{{ '{{or .FullName .PhoneNumber}}' }} (RCS)"

# @whatsappbot:YOUR_DOMAIN
matrix_mautrix_whatsapp_enabled: true
matrix_mautrix_whatsapp_federate_rooms: false
matrix_mautrix_whatsapp_bridge_mute_bridging: false
matrix_mautrix_whatsapp_configuration_extension_yaml: |
  bridge:
    history_sync:
      backfill: false
    permissions:
      '@roiarthurb:roiarthurb.xyz': admin

# @slackbot:YOUR_DOMAIN
matrix_mautrix_slack_enabled: true
#matrix_mautrix_slack_federate_rooms: false

[...]

Expected behavior

Every mautrix service should work right out of the box, as it is supposed to.

Additional context

Logs :

$ sudo journalctl -fu matrix-mautrix-gmessages.service
Sep 18 13:15:04 DietPi systemd[1]: matrix-mautrix-gmessages.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 13:15:04 DietPi systemd[1]: matrix-mautrix-gmessages.service: Failed with result 'exit-code'.
Sep 18 13:15:34 DietPi systemd[1]: matrix-mautrix-gmessages.service: Scheduled restart job, restart counter is at 111.
Sep 18 13:15:34 DietPi systemd[1]: Stopped matrix-mautrix-gmessages.service - Matrix Mautrix gmessages bridge.
Sep 18 13:15:34 DietPi systemd[1]: Starting matrix-mautrix-gmessages.service - Matrix Mautrix gmessages bridge...
Sep 18 13:15:34 DietPi matrix-mautrix-gmessages[826717]: bcfcb5b5a336466014c7ef94f29cef10daefb50213bb5615c804677b84a4da4d
Sep 18 13:15:35 DietPi systemd[1]: Started matrix-mautrix-gmessages.service - Matrix Mautrix gmessages bridge.
Sep 18 13:15:36 DietPi matrix-mautrix-gmessages[826744]: 2024-09-18T06:15:36.551Z FTL Failed to start bridge error="failed to start Matrix connector: the supplied account key is invalid"
Sep 18 13:15:37 DietPi systemd[1]: matrix-mautrix-gmessages.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 13:15:37 DietPi systemd[1]: matrix-mautrix-gmessages.service: Failed with result 'exit-code'.
$ sudo journalctl -fu matrix-mautrix-slack.service
Sep 18 13:27:54 DietPi systemd[1]: Starting matrix-mautrix-slack.service - Matrix Mautrix Slack bridge...
Sep 18 13:27:54 DietPi matrix-mautrix-slack[838608]: 8b13bdf4addf258b50d0cbb9c19a0f08fde2f315d42af1ca26b0231be7cb4579
Sep 18 13:27:55 DietPi systemd[1]: Started matrix-mautrix-slack.service - Matrix Mautrix Slack bridge.
Sep 18 13:27:56 DietPi matrix-mautrix-slack[838651]: 2024-09-18T06:27:56.595Z FTL Failed to start bridge error="failed to start Matrix connector: the supplied account key is invalid"
Sep 18 13:27:57 DietPi systemd[1]: matrix-mautrix-slack.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 13:27:57 DietPi systemd[1]: matrix-mautrix-slack.service: Failed with result 'exit-code'.
TASK [galaxy/systemd_service_manager : Fail if service isn't detected to be running] ********************************
skipping: [REDACTED] => (item=matrix-container-socket-proxy.service) 
skipping: [REDACTED] => (item=matrix-traefik.service) 
skipping: [REDACTED] => (item=matrix-postgres.service) 
skipping: [REDACTED] => (item=matrix-exim-relay.service) 
skipping: [REDACTED] => (item=matrix-coturn.service) 
skipping: [REDACTED] => (item=matrix-synapse.service) 
skipping: [REDACTED] => (item=matrix-sliding-sync.service) 
skipping: [REDACTED] => (item=matrix-beeper-linkedin.service) 
skipping: [REDACTED] => (item=matrix-client-element.service) 
skipping: [REDACTED] => (item=matrix-hookshot.service) 
skipping: [REDACTED] => (item=matrix-mautrix-discord.service) 
failed: [REDACTED] (item=matrix-mautrix-gmessages.service) => changed=false 
  ansible_loop_var: item
  item: matrix-mautrix-gmessages.service
  msg: matrix-mautrix-gmessages.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-mautrix-gmessages.service` and `journalctl -fu matrix-mautrix-gmessages.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `devture_systemd_service_manager_up_verification_delay_seconds` variable. See `/home/roiarthurb/Documents/Dev/ansible/matrix-docker-ansible-deploy/roles/galaxy/systemd_service_manager/defaults/main.yml` for more details about that.
skipping: [REDACTED] => (item=matrix-mautrix-googlechat.service) 
skipping: [REDACTED] => (item=matrix-mautrix-meta-messenger.service) 
failed: [REDACTED] (item=matrix-mautrix-slack.service) => changed=false 
  ansible_loop_var: item
  item: matrix-mautrix-slack.service
  msg: matrix-mautrix-slack.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-mautrix-slack.service` and `journalctl -fu matrix-mautrix-slack.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `devture_systemd_service_manager_up_verification_delay_seconds` variable. See `/home/roiarthurb/Documents/Dev/ansible/matrix-docker-ansible-deploy/roles/galaxy/systemd_service_manager/defaults/main.yml` for more details about that.
skipping: [REDACTED] => (item=matrix-mautrix-telegram.service) 
skipping: [REDACTED] => (item=matrix-mautrix-whatsapp.service) 
skipping: [REDACTED] => (item=matrix-bot-matrix-reminder-bot.service) 
skipping: [REDACTED] => (item=matrix-bot-postmoogle.service) 
skipping: [REDACTED] => (item=matrix-static-files.service) 
skipping: [REDACTED] => (item=matrix-synapse-admin.service) 
skipping: [REDACTED] => (item=matrix-coturn-reload.timer) 
skipping: [REDACTED] => (item=matrix-synapse-auto-compressor.timer) 
skipping: [REDACTED] => (item=matrix-synapse-s3-storage-provider-migrate.timer) 

spantaleev commented 1 month ago

Your vars.yml does not indicate that you're enabling encryption for any of these bridges.

The error message ("the supplied account key is invalid") is somehow related to encryption.

I wonder if our pickle_key configuration is incorrect. You can try overriding encryption.pickle_key via *_extension_yaml variables.


For testing purposes, I just did a new installation of the Gmessages bridge and it starts successfully. Maybe it only suffers problems when it's powered by an existing database (with existing encrypted messages?).

RoiArthurB commented 1 month ago

Hi, you're right, I forgot to show that I enabled mautrix encryption with these parameters:

matrix_bridges_encryption_enabled: true
matrix_bridges_encryption_default: true

So I can understand why it's failing for Slack, but on a brand new bridge (which is the case for Gmessages) it shouldn't be a problem...

I wonder if our pickle_key configuration is incorrect. You can try overriding encryption.pickle_key via *_extension_yaml variables.

I'd be happy to try anything (and I don't mind resetting my bridges), but I'll need you to help me with that 😅

From what I can see in the files, the Slack bridge is the only one whose pickle_key differs from that of every other bridge... :thinking:

./matrix-docker-ansible-deploy/roles/custom/matrix-bridge-mautrix-gmessages/templates/config.yaml.j2:
  360      # Pickle key for encrypting encryption keys in the bridge database.
  361      # If set to generate, a random key will be generated.
  362:     pickle_key: mautrix.bridge.e2ee
  363      # Options for deleting megolm sessions from the bridge.
  364      delete_keys:

./matrix-docker-ansible-deploy/roles/custom/matrix-bridge-mautrix-slack/templates/config.yaml.j2:
  377      # Pickle key for encrypting encryption keys in the bridge database.
  378      # If set to generate, a random key will be generated.
  379:     pickle_key: generate
  380      # Options for deleting megolm sessions from the bridge.
  381      delete_keys:

spantaleev commented 1 month ago

It should be as simple as this:

matrix_mautrix_gmessages_configuration_extension_yaml: |
  bridge:
    displayname_template: "{{ '{{or .FullName .PhoneNumber}}' }} (RCS)"
  encryption:
    pickle_key: some value

Yes, the Slack bridge uses a value of generate, while the Gmessages bridge uses mautrix.bridge.e2ee (not sure where this came from).

It seems like both of these may be problematic. You can try "".


I've enabled encryption for my Gmessages bridge like this:

matrix_mautrix_gmessages_bridge_encryption_allow: true
matrix_mautrix_gmessages_bridge_encryption_default: true

... and it still managed to start successfully with its default pickle_key value of mautrix.bridge.e2ee.

So maybe it's existing installations (that had a different pickle key, historically) that suffer this problem.

RoiArthurB commented 1 month ago

It should be as simple as this:

matrix_mautrix_gmessages_configuration_extension_yaml: |
  bridge:
    displayname_template: "{{ '{{or .FullName .PhoneNumber}}' }} (RCS)"
  encryption:
    pickle_key: some value

I did set some random values, but none of them made it work...

I also tried disabling encryption for this bridge, and disabling the bridge itself (then running setup-gmessages) and re-enabling it, but nothing seems to work for me... I must be doing something wrong somewhere :/


I've enabled encryption for my Gmessages bridge like this:

matrix_mautrix_gmessages_bridge_encryption_allow: true
matrix_mautrix_gmessages_bridge_encryption_default: true

Can you help me fully reset the bridge, to try to fix this issue and rule out conflicts with a previously set key? 😅

RoiArthurB commented 1 month ago

I know it's not supposed to, but do you think that running the playbook the old way might lead to this issue?

I'm running it with this command: make roles && ansible-playbook -i inventory/hosts setup.yml --tags=setup-all,start (or tweaking the tags)

spantaleev commented 1 month ago

To fully reinstall a component:

  1. Disable it (set the *_enabled: variable to false)
  2. Re-run the playbook (just setup-all or what you're quoting above)
  3. Drop the component's database:
    • run /matrix/postgres/bin/cli on the server
    • list databases with \l
    • drop the component's database (e.g. DROP DATABASE some_database_name;)
  4. Consider deleting some leftover files (if any): rm -rf /matrix/some-component-directory
  5. Re-enable the component (set the *_enabled: variable to true)
  6. Re-run the playbook (just setup-all or what you're quoting above). You may even do just install-all, which is quicker
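
As a rough illustration, the steps above can be turned into a concrete checklist for one component. This is only a dry-run sketch: the function name print_reset_plan is made up, it prints the steps and executes nothing, and the database name passed in the example is an assumption that should be confirmed with \l before dropping anything.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the reset steps for one component, executes nothing.
# print_reset_plan is a hypothetical helper, not part of the playbook.
print_reset_plan() {
  local enabled_var="$1"   # the component's *_enabled variable in vars.yml
  local database="$2"      # ASSUMED database name; confirm it with \l first
  local data_dir="$3"      # leftover directory under /matrix
  cat <<EOF
1. vars.yml: set ${enabled_var}: false
2. just setup-all
3. /matrix/postgres/bin/cli, then inside the prompt:
     \\l
     DROP DATABASE ${database};
4. rm -rf ${data_dir}
5. vars.yml: set ${enabled_var}: true
6. just setup-all   (or the quicker: just install-all)
EOF
}

# Example for the Gmessages bridge (the database name is a guess).
print_reset_plan matrix_mautrix_gmessages_enabled \
  matrix_mautrix_gmessages /matrix/mautrix-gmessages
```

Printing the plan instead of running it is deliberate: dropping a database and deleting a directory are irreversible, so each line is meant to be reviewed and run by hand.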

RoiArthurB commented 1 month ago

Hi @spantaleev,

Thanks a lot for all your support. I successfully did a full reset of my faulty bridges and now everything works just fine again. I probably did something wrong one way or another, but this radical solution has been my simplest fix.

Thanks also for all your great work on this project 🙏

xangelix commented 1 month ago

Setting the Gmessages pickle_key like so fixed this issue for me:

matrix_mautrix_gmessages_configuration_extension_yaml: |
  encryption:
    pickle_key: "go.mau.fi/mautrix-gmessages"

The value was pulled from: /matrix/mautrix-gmessages/docker-src/cmd/mautrix-gmessages/legacymigrate.go

Running just install-matrix-bridge-mautrix-gmessages,start was not sufficient; I had to run with setup-all,start.
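
For anyone who wants to look up the legacy key in the same way, a small helper along these lines may do. It is a sketch only: show_pickle_lines is a made-up name, and it assumes the key sits on a line that contains the word "pickle" in some form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the lines of a file that mention a pickle key.
# $1: file to search, e.g. the legacymigrate.go path quoted above.
show_pickle_lines() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  # -n prints line numbers; -i matches "pickle_key", "PickleKey", and so on.
  grep -n -i 'pickle' "$file"
}
```

On the server, it would be called as show_pickle_lines /matrix/mautrix-gmessages/docker-src/cmd/mautrix-gmessages/legacymigrate.go. The same function can be pointed at a role's config.yaml.j2 template to see which pickle_key the playbook renders.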

spantaleev commented 1 month ago

Thanks for figuring it out, @xangelix!

I've added a dedicated variable to the Gmessages role (matrix_mautrix_gmessages_bridge_encryption_pickle_key), which lets you override it more easily (without having to resort to matrix_mautrix_gmessages_configuration_extension_yaml).

Only users of the previous Gmessages bridge should be affected by this and will need to adjust the pickle_key. For new installations, we're using a pickle_key of mautrix.bridge.e2ee.


To summarize, if you've been using the old Gmessages bridge with encryption and you're finding that the new bridge fails for you, consider adding this additional configuration to your vars.yml file:

matrix_mautrix_gmessages_bridge_encryption_pickle_key: go.mau.fi/mautrix-gmessages