spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Error during MatrixClient request GET /_matrix/client/r0/joined_rooms: 401 Unauthorized #2243

Open nopeitsnothing opened 1 year ago

nopeitsnothing commented 1 year ago

Playbook Configuration:

My vars.yml file looks like this:

---
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
matrix_domain: <redacted>

# The Matrix homeserver software to install.
# See `roles/matrix-base/defaults/main.yml` for valid options.
matrix_homeserver_implementation: synapse

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: <redacted>

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: someone@example.com
matrix_ssl_lets_encrypt_support_email: 'contact@<redacted>'

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
matrix_postgres_connection_password: '<redacted>'
matrix_client_element_enabled: false

# Optimizations for small VPS
# An identity server is not a must.
matrix_ma1sd_enabled: false

# Disabling this will prevent email-notifications and other such things from working.
matrix_mailer_enabled: false

# You can also disable this to save more RAM,
# at the expense of audio/video calls being unreliable.
matrix_coturn_enabled: false

# Amount of time to check if everything is okay when starting services
matrix_common_after_systemd_service_start_wait_for_timeout_seconds: 60

# Nginx optimizations
matrix_nginx_proxy_proxy_matrix_client_api_forwarded_location_synapse_admin_api_enabled: true
matrix_nginx_proxy_proxy_matrix_nginx_status_enabled: false
matrix_nginx_proxy_access_log_enabled: false
# This makes Synapse not keep track of who is online/offline.
#
# Keeping track of this and announcing such online-status in federated rooms with
# hundreds of servers inside is insanely heavy (https://github.com/matrix-org/synapse/issues/3971).
#
# If your server does not federate with hundreds of others, enabling this doesn't hurt much.
matrix_synapse_presence_enabled: true
matrix_synapse_log_level: "CRITICAL"
matrix_synapse_storage_sql_log_level: "CRITICAL"
matrix_synapse_root_log_level: "CRITICAL"
matrix_synapse_caches_global_factor: 2.0

matrix_synapse_configuration_extension_yaml: |
  matrix_synapse_retention:
    enabled: true
    default_policy:
      min_lifetime: 1d
      max_lifetime: 2d
    allowed_lifetime_min: 1d
    allowed_lifetime_max: 2d
    purge_jobs:
     - longest_max_lifetime: 1d
       interval: 1d
     - shortest_max_lifetime: 7d
       interval: 1d

  # Resource-constrained homeserver settings
  #
  # When this is enabled, the room "complexity" will be checked before a user
  # joins a new remote room. If it is above the complexity limit, the server will
  # disallow joining, or will instantly leave.
  #
  # Room complexity is an arbitrary measure based on factors such as the number of
  # users in the room.
  #
  limit_remote_rooms:
  # Uncomment to enable room complexity checking.
  #
    enabled: true

  # the limit above which rooms cannot be joined. The default is 1.0.
  #
    complexity: 10

  # override the error which is returned when the room is too complex.
  #
    complexity_error: "This room is too complex."

  # allow server admins to join complex rooms. Default is false.
  #
  # admins_can_join: true

#  postgres optimizations

matrix_postgres_process_extra_arguments: [
  "-c shared_buffers=512MB",
  "-c effective_cache_size=1536MB",
  "-c effective_io_concurrency=200",
  "-c random_page_cost=1.1",
  "-c min_wal_size=1GB",
  "-c maintenance_work_mem=128MB",
  "-c checkpoint_completion_target=0.9",
  "-c wal_buffers=16MB",
  "-c default_statistics_target=100",
  "-c work_mem=1310kB",
  "-c max_wal_size=4GB",
  ]

# mjolnir

matrix_bot_mjolnir_enabled: true
matrix_bot_mjolnir_access_token: "syt_redacted"
matrix_bot_mjolnir_management_room: "<redacted>:matrix.org"
matrix_bot_mjolnir_configuration_extension_yaml: |
  # Your custom YAML configuration goes here.
  # This configuration extends the default starting configuration (`matrix_bot_mjolnir_configuration_yaml`).
  #
  # You can override individual variables from the default configuration, or introduce new ones.
  #
  # If you need something more special, you can take full control by
  # completely redefining `matrix_bot_mjolnir_configuration_yaml`.

  # homeserverUrl: http://matrix-synapse:8008
  # rawHomeServerUrl: http://matrix-synapse:8008

  # Misc options for command handling and commands
  commands:
  # If true, Mjolnir will respond to commands like !help and !ban instead of
  # requiring a prefix. This is useful if Mjolnir is the only bot running in
  # your management room.
  #
  # Note that Mjolnir can be pinged by display name instead of having to use
  # the !mjolnir prefix. For example, "my_moderator_bot: ban @spammer:example.org"
  # will ban a user.
    allowNoPrefix: true
  #
  # automatic redact on certain ban keywords
  automaticallyRedactForReasons:
    - "spam"
    - "scam"
    - "gore"
    - "ban evasion"
    - "illicit"
    - "hate speech"
    - "illegal"

  # show invite requests in management room
  recordIgnoredInvites: true

  # Options for exposing web APIs.
  # web:
    # Whether to enable web APIs.
    # enabled: true

    # The port to expose the webserver on. Defaults to 8080.
    # port: 8080

    # The address to listen for requests on. Defaults to only the current
    # computer.
    # address: localhost

    # Alternative setting to open to the entire web. Be careful,
    # as this will increase your security perimeter:
    #
    #  address: "0.0.0.0"

    # A web API designed to intercept Matrix API
    # POST /_matrix/client/r0/rooms/{roomId}/report/{eventId}
    # and display readable abuse reports in the moderation room.
    #
    # If you wish to take advantage of this feature, you will need
    # to configure a reverse proxy, see e.g. test/nginx.conf
    # abuseReporting:
    # Whether to enable this feature.
    #   enabled: true

    # Whether or not to actively poll synapse for abuse reports, to be used
    # instead of intercepting client calls to synapse's abuse endpoint, when that
    # isn't possible/practical.
    # pollReports: false

    # Whether or not new reports, received either by webapi or polling,
    # should be printed to our managementRoom.
    # displayReports: false

Matrix Server:

Ansible:

$ ansible --version
ansible [core 2.13.5]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/.local/lib/python3.9/site-packages/ansible
  ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/ansible/.local/bin/ansible
  python version = 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110]
  jinja version = 3.1.2
  libyaml = True

Problem description: Running the playbook with almost no additional customizations, I attempted to self-build the Postgres Docker image. That turned out to be a terrible idea: things are now failing in ways I can't make sense of.

Additional context: I attempted to self-build the Postgres image and borked everything. I have since tried everything, including removing Docker itself to nuke the networks, so I could reset everything and start from scratch. Again, it turns out you can't do that, and it's a bad time.

Journalctl

$ sudo journalctl -fu matrix-synapse.service
-- Journal begins at Mon 2022-11-07 13:11:49 GMT. --
Nov 08 11:55:00 [server] matrix-synapse[548642]: server since it is long-lived, stable and trusted. However, some admins may
Nov 08 11:55:00 [server] matrix-synapse[548642]: wish to use another server for this purpose.
Nov 08 11:55:00 [server] matrix-synapse[548642]: To suppress this warning and continue using 'matrix.org', admins should set
Nov 08 11:55:00 [server] matrix-synapse[548642]: 'suppress_key_server_warning' to 'true' in homeserver.yaml.
Nov 08 11:55:00 [server] matrix-synapse[548642]: --------------------------------------------------------------------------------
Nov 08 12:42:23 [server] systemd[1]: Stopping Synapse server...
Nov 08 12:42:23 [server] matrix-synapse[563902]: matrix-synapse
Nov 08 12:42:23 [server] systemd[1]: matrix-synapse.service: Main process exited, code=exited, status=137/n/a
Nov 08 12:42:23 [server] systemd[1]: matrix-synapse.service: Failed with result 'exit-code'.
Nov 08 12:42:23 [server] systemd[1]: Stopped Synapse server.

$ sudo journalctl -fu matrix-bot-mjolnir.service
-- Journal begins at Mon 2022-11-07 13:11:49 GMT. --
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]: Tue, 08 Nov 2022 12:41:54 GMT [INFO] [index] Starting bot...
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]: Tue, 08 Nov 2022 12:41:54 GMT [ERROR] [MatrixHttpClient (REQ-1)] [Error: Error during MatrixClient request GET /_matrix/client/r0/joined_rooms: 401 Unauthorized -- {"errcode":"M_UNKNOWN_TOKEN","error":"Invalid access token passed.","soft_logout":false}]
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]: Failed to setup mjolnir from the config /data: Error: Error during MatrixClient request GET /_matrix/client/r0/joined_rooms: 401 Unauthorized -- {"errcode":"M_UNKNOWN_TOKEN","error":"Invalid access token passed.","soft_logout":false}
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]: node:internal/process/promises:279
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]:             triggerUncaughtException(err, true /* fromPromise */);
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]:             ^
Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]: [Error: Error during MatrixClient request GET /_matrix/client/r0/joined_rooms: 401 Unauthorized -- {"errcode":"M_UNKNOWN_TOKEN","error":"Invalid access token passed.","soft_logout":false}]
Nov 08 12:41:54 [server] systemd[1]: matrix-bot-mjolnir.service: Main process exited, code=exited, status=1/FAILURE
Nov 08 12:41:54 [server] systemd[1]: matrix-bot-mjolnir.service: Failed with result 'exit-code'.
Nov 08 12:42:19 [server] systemd[1]: Stopped Matrix Mjolnir bot.

Nov 08 12:41:54 [server] matrix-bot-mjolnir[563716]: [Error: Error during MatrixClient request GET /_matrix/client/r0/joined_rooms: 401 Unauthorized -- {"errcode":"M_UNKNOWN_TOKEN","error":"Invalid access token passed.","soft_logout":false}]

Sounds very straightforward, but trust me, I have done the obvious of generating the token and putting it into the config. Doesn't work:

$ ansible-playbook -i inventory/hosts setup.yml --tags=setup-all,start -K
...
TASK [custom/matrix-common-after : Fail if service isn't detected to be running] ***********************************************************************************************************
skipping: [matrix.<DOMAIN>] => (item=matrix-postgres.service)
failed: [matrix.<DOMAIN>] (item=matrix-bot-mjolnir.service) => changed=false
  ansible_loop_var: item
  item: matrix-bot-mjolnir.service
  msg: matrix-bot-mjolnir.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-bot-mjolnir.service` and `journalctl -fu matrix-bot-mjolnir.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `matrix_common_after_systemd_service_start_wait_for_timeout_seconds` variable. See `roles/custom/matrix-common-after/defaults/main.yml` for more details about that.
skipping: [matrix.<DOMAIN>] => (item=matrix-synapse.service)
skipping: [matrix.<DOMAIN>] => (item=matrix-nginx-proxy.service)
skipping: [matrix.<DOMAIN>] => (item=matrix-ssl-lets-encrypt-certificates-renew.timer)
skipping: [matrix.<DOMAIN>] => (item=matrix-ssl-nginx-proxy-reload.timer)

PLAY RECAP *********************************************************************************************************************************************************************************
matrix.<DOMAIN> : ok=252  changed=6    unreachable=0    failed=1    skipped=2045 rescued=0    ignored=0

aaronraimist commented 1 year ago

I have done the obvious of generating the token and putting it into the config

How are you obtaining the token?

Unrelated: the config option for retention is just retention, not matrix_synapse_retention. However, I would suggest not enabling the experimental message retention feature. It has known bugs that can cause database corruption (https://github.com/matrix-org/synapse/issues/13476), and it also saves hardly any disk space.
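
For reference, the key rename described above could look roughly like the following inside vars.yml. This is only a sketch: the values are copied from the original post, the purge_jobs list is left out for brevity, and it shows the spelling of the key only. It is not a recommendation to enable the feature, given the caveat above.

```yaml
matrix_synapse_configuration_extension_yaml: |
  # Synapse expects the top-level key `retention` here.
  # `matrix_synapse_retention` is not a Synapse option name.
  retention:
    enabled: true
    default_policy:
      min_lifetime: 1d
      max_lifetime: 2d
    allowed_lifetime_min: 1d
    allowed_lifetime_max: 2d
```

To follow the suggestion of not using retention at all, the whole retention block would simply be removed from the extension YAML.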

nopeitsnothing commented 1 year ago

Unrelated: the config option for retention is just retention, not matrix_synapse_retention. However, I would suggest not enabling the experimental message retention feature. It has known bugs that can cause database corruption (matrix-org/synapse#13476), and it also saves hardly any disk space.

That was my exact goal, minimizing the database and lowering the retention threshold. Seems that must be what borked everything.

How are you obtaining the token?

I am obtaining the token by registering a new user on Synapse (using the playbook: --extra-vars='username=mjolnir.bot password=<redacted> admin=yes' --tags=register-user), then using Element web to get a token, then synadm with said token. The registered user is an admin in Postgres.

I should also add: the way it works is now by forcing me to use the second level domain (matrix.org) instead of the normal matrix.matrix.org. Not only that, but I can also log in fine through a client using the latter domain. There's no way to message others. Someone in chat suggested it has no delegation. I believe a previous database cannot be imported either, since trying that results in catastrophe as well.

Edit: I also registered a new user through the playbook and then tried using curl to get a token to plug into the config, but that didn't work either.

aaronraimist commented 1 year ago

then using Element web to get a token

That's fine. Just make sure you don't log out. The error says the token is invalid which most likely means you logged out which invalidates the token.

synadm with said token

What are you doing with synadm?

I should also add: the way it works is now by forcing me to use the second level domain (matrix.org) instead of the normal matrix.matrix.org. Not only that, but I can also log in fine through a client using the latter domain

What is forcing you to use the second level domain? Element? Mjolnir?

I believe a previous database cannot be imported either, since trying that results in catastrophe as well

Are you migrating from a Synapse server installed outside the playbook? If so, you need to import that database first, before you enable Mjolnir or try to register any users. What is the catastrophe?

nopeitsnothing commented 1 year ago

That's fine. Just make sure you don't log out. The error says the token is invalid which most likely means you logged out which invalidates the token.

I was staying logged in while I used the admin token.

What are you doing with synadm?

I'm using synadm to create a secondary user to test that the database works. It doesn't. The user is created but there is no delegation. The user cannot message others, not even on the homeserver.

What is forcing you to use the second level domain? Element? Mjolnir?

I'm not sure.

Are you migrating from a Synapse server installed outside the playbook? If so, you need to import that database first, before you enable Mjolnir or try to register any users. What is the catastrophe?

I'm not migrating, I'm attempting to install Synapse on the server itself. I tried importing the old database, but it doesn't work.

Now, certbot is telling me I hit the limit and have to wait until tomorrow night to retry.

I have the old certificate, can I just plug that into the playbook instead of generating a new one?

Retrying:

TASK [custom/matrix-nginx-proxy : Attempt initial SSL certificate retrieval with standalone authenticator (directly)] **********************************************************************
fatal: [matrix.redacted.org]: FAILED! => changed=true
  cmd: /usr/bin/env docker run --rm --name=matrix-certbot --user=998:1001 --cap-drop=ALL -p 80:8080 --mount type=bind,src=/matrix/ssl/config,dst=/etc/letsencrypt --mount type=bind,src=/matrix/ssl/log,dst=/var/log/letsencrypt docker.io/certbot/certbot:amd64-v1.31.0 certonly --non-interactive --work-dir=/tmp --http-01-port 8080   --key-type rsa --standalone --preferred-challenges http --agree-tos --email=contact@redacted.org -d matrix.redacted.org
  delta: '0:00:03.390827'
  end: '2022-11-10 03:11:55.177469'
  msg: non-zero return code
  rc: 1
  start: '2022-11-10 03:11:51.786642'
  stderr: |-
    Saving debug log to /var/log/letsencrypt/letsencrypt.log
    An unexpected error occurred:
    Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: matrix.redacted.org, retry after 2022-11-10T15:22:09Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/
    Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.
  stderr_lines: <omitted>
  stdout: Requesting a certificate for matrix.redacted.org
  stdout_lines: <omitted>
...ignoring

TASK [custom/matrix-nginx-proxy : Attempt initial SSL certificate retrieval with standalone authenticator (via proxy)] *********************************************************************
fatal: [matrix.redacted.org]: FAILED! => changed=true
  cmd: /usr/bin/env docker run --rm --name=matrix-certbot --user=998:1001 --cap-drop=ALL -p 127.0.0.1:2402:8080 --network=matrix --mount type=bind,src=/matrix/ssl/config,dst=/etc/letsencrypt --mount type=bind,src=/matrix/ssl/log,dst=/var/log/letsencrypt docker.io/certbot/certbot:amd64-v1.31.0 certonly --non-interactive --work-dir=/tmp --http-01-port 8080   --key-type rsa --standalone --preferred-challenges http --agree-tos --email=contact@redacted.org -d matrix.redacted.org
  delta: '0:00:03.682674'
  end: '2022-11-10 03:11:59.210333'
  msg: non-zero return code
  rc: 1
  start: '2022-11-10 03:11:55.527659'
  stderr: |-
    Saving debug log to /var/log/letsencrypt/letsencrypt.log
    An unexpected error occurred:
    Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: matrix.redacted.org, retry after 2022-11-10T15:22:09Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/
    Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.
  stderr_lines: <omitted>
  stdout: Requesting a certificate for matrix.redacted.org
  stdout_lines: <omitted>
...ignoring

TASK [custom/matrix-nginx-proxy : Fail if all SSL certificate retrieval attempts failed] ***************************************************************************************************
fatal: [matrix.redacted.org]: FAILED! => changed=false
  msg: |-
    Failed to obtain a certificate directly (by listening on port 80)
    and also failed to obtain by relying on the server at port 80 to proxy the request.
    See above for details.
    You may wish to set up proxying of /.well-known/acme-challenge to 2402 or,
    more easily, stop the server on port 80 while this playbook runs.

PLAY RECAP *********************************************************************************************************************************************************************************
matrix.redacted.org : ok=211  changed=6    unreachable=0    failed=1    skipped=1930 rescued=0    ignored=2
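
On the question above about reusing the old certificate instead of requesting a new one: the vars.yml comments in the original post point to docs/configuring-playbook-ssl-certificates.md for alternatives to the default Let's Encrypt retrieval. A minimal sketch of what that might look like follows. The variable name, its value, and the file layout in the comments are assumptions recalled from that document for this era of the playbook, and should be checked against the copy in your checkout before use.

```yaml
# Assumption: tells the playbook not to run certbot and to use certificate
# files that you place on the server yourself.
matrix_ssl_retrieval_method: manually-managed

# Assumption: with this setting the playbook is expected to look for files
# under the SSL config directory (/matrix/ssl/config in the logs above):
#   /matrix/ssl/config/live/<domain>/fullchain.pem
#   /matrix/ssl/config/live/<domain>/privkey.pem
#   /matrix/ssl/config/live/<domain>/chain.pem
```

If that holds, copying the previously issued files into that layout would avoid another certificate order until the duplicate-certificate limit shown in the log resets.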