spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Dimension starts but doesn't run successfully #1849

Open zorlaski opened 2 years ago

zorlaski commented 2 years ago

Describe the bug

Enabling Dimension in the playbook succeeds with no errors, and I am able to load the test page at dimension..com. However, attempting to add sticker packs, widgets, etc. fails consistently. I have followed all the recommended settings in the Dimension guide, and I still see the following requests repeated in the matrix-dimension logs:

[MatrixHttpClient (REQ-183)] GET http://matrix-nginx-proxy:12080/_matrix/client/r0/sync
May 23 09:59:09 <hostname> matrix-dimension[1915874]: Mon, 23 May 2022 13:59:09 GMT [DEBUG] [MatrixClientLite] Received sync. Next token: <TOKEN>
May 23 09:59:09 <hostname> matrix-dimension[1915874]: Mon, 23 May 2022 13:59:09 GMT [DEBUG] [MatrixClientLite] Performing sync with token <TOKEN>

Every time I start Dimension using the standard Ansible playbook, the following error occurs midway through startup:

May 23 09:33:02 <hostname> matrix-dimension[1914168]: Mon, 23 May 2022 13:33:02 GMT [DEBUG] [MatrixHttpClient (REQ-1)] GET http://matrix-nginx-proxy:12080/_matrix/client/r0/account/whoami
May 23 09:33:02 <hostname> matrix-dimension[1914168]: Mon, 23 May 2022 13:33:02 GMT [ERROR] [MatrixHttpClient (REQ-1)] Error: connect ECONNREFUSED <IP Address>:12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]:     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   errno: -111,
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   code: 'ECONNREFUSED',
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   syscall: 'connect',
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   address: '<IP Address>',
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   port: 12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]: }
May 23 09:33:02 <hostname> matrix-dimension[1914168]: Error: connect ECONNREFUSED <IP Address>:12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]:     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   errno: -111,
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   code: 'ECONNREFUSED',
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   syscall: 'connect',
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   address: '<IP Address>',
May 23 09:33:02 <hostname> matrix-dimension[1914168]:   port: 12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]: }

Running the t2bot connection test widget also fails at the homeserver step, with this message: "Error contacting homeserver. This usually means your federation setup is incorrect, or your homeserver is offline. Consult your homeserver's documentation for how to set up federation."

To Reproduce

My vars.yml file looks like this:

# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ("matrix.<matrix-domain>").
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com
matrix_domain: <domain>

# The Matrix homeserver software to install.
# See `roles/matrix-base/defaults/main.yml` for valid options.
matrix_homeserver_implementation: synapse

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: '<secret>'

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: someone@example.com
matrix_ssl_lets_encrypt_support_email: '<email>'

#########
##JITSI##
#########

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
matrix_postgres_connection_password: '<secret>'
matrix_jitsi_enabled: true

# Run `bash inventory/scripts/jitsi-generate-passwords.sh` to generate these passwords,
# or define your own strong passwords manually.
matrix_jitsi_jicofo_auth_password: <secret>
matrix_jitsi_jvb_auth_password: <secret>
matrix_jitsi_jibri_recorder_password: <secret>
matrix_jitsi_jibri_xmpp_password: <secret>
matrix_jitsi_jvb_container_extra_arguments:
  - '--env "DOCKER_HOST_ADDRESS=<local IP>"'
matrix_jitsi_web_custom_config_extension: |
  config.enableLayerSuspension = true;

  config.disableAudioLevels = true;

  // Limit the number of video feeds forwarded to each client
  config.channelLastN = 4;

###########
##GRAFANA##
###########
matrix_prometheus_enabled: true

matrix_prometheus_node_exporter_enabled: true

matrix_grafana_enabled: true
matrix_grafana_anonymous_access: false

# This has no relation to your Matrix user id. It can be any username you'd like.
# Changing the username subsequently won't work.
matrix_grafana_default_admin_user: <admin user>

# Changing the password subsequently won't work.
matrix_grafana_default_admin_password: <password>

#########
# NGINX #
#########
matrix_nginx_proxy_access_log_enabled: false

#############
# DIMENSION #
#############

matrix_dimension_enabled: true
matrix_dimension_admins:
  - "@dimension:matrix.<domain>"
  - "@<admin user>:matrix.<domain>"
matrix_dimension_access_token: "<dimension's access token>"

############
# POSTGRES #
############
matrix_postgres_process_extra_arguments: [
  "-c 'max_connections=200'",
  "-c 'shared_buffers=512MB'",
  "-c 'effective_cache_size=1536MB'",
  "-c 'maintenance_work_mem=128MB'",
  "-c 'checkpoint_completion_target=0.9'",
  "-c 'wal_buffers=16MB'",
  "-c 'default_statistics_target=100'",
  "-c 'random_page_cost=1.1'",
  "-c 'effective_io_concurrency=200'",
  "-c 'work_mem=2621kB'",
  "-c 'min_wal_size=1GB'",
  "-c 'max_wal_size=4GB'",
  "-c 'max_worker_processes=2'",
  "-c 'max_parallel_workers_per_gather=1'",
  "-c 'max_parallel_workers=2'",
  "-c 'max_parallel_maintenance_workers=1'",
]

#########
# OTHER #
#########
matrix_synapse_admin_enabled: true
matrix_registration_enabled: true

# Generate a strong secret using: `pwgen -s 64 1`.
matrix_registration_admin_secret: "<admin secret>"

matrix_synapse_configuration_extension_yaml: |
  retention:
    enabled: true
    purge_jobs:
      - longest_max_lifetime: 3d
        shortest_max_lifetime: 1d
        interval: 4h
    default_policy:
      min_lifetime: 1d
      max_lifetime: 60h
    allowed_lifetime_max: 3d

Expected behavior

Additional context

I run the following command to start the playbook: sudo ansible-playbook -i </path/to/playbook>/inventory/hosts setup.yml --tags=start -e ansible_python_interpreter=/usr/bin/python3

Madchristian commented 2 years ago

same here

hnk commented 2 years ago

I have had the same issue for a few months now.

All services do start successfully, but the ansible command ansible-playbook -i inventory/hosts setup.yml --tags=start fails with the above error.

Debugging a bit, it seems that Dimension does start up, but is faster than matrix-nginx-proxy and begins issuing requests when matrix-nginx-proxy is not ready yet to receive them.

This causes the container to fail, and the service gets restarted after 30 seconds. That in turn causes the playbook to fail, since matrix_common_after_systemd_service_start_wait_for_timeout_seconds defaults to 15 seconds.

To verify my suspicion, I set matrix_common_after_systemd_service_start_wait_for_timeout_seconds to 45 seconds and ran the command ansible-playbook -i inventory/hosts setup.yml --tags=start again.

This stopped the error from occurring.
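
As a sketch, the override described in this comment would be a single line in vars.yml. The variable name and the value 45 are taken from the comment itself. Its position in the file is arbitrary, and the name may not apply to other playbook versions.

```yaml
# Give slow-starting services (here: Dimension) more time before the
# playbook verifies that they are up. The default mentioned above is 15 seconds.
matrix_common_after_systemd_service_start_wait_for_timeout_seconds: 45
```

After changing the value, re-running the same `--tags=start` command shows whether the longer wait is enough on a given host.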

I am not certain how this could be addressed, but wanted to give some more info on this.

spantaleev commented 2 years ago

We could introduce an intentional delay to matrix-dimension.service (the systemd service starting Dimension).

We could also open an issue in the Dimension repository and ask to change Dimension so that it doesn't hard-fail when the homeserver is temporarily unavailable. Not sure how maintained Dimension is nowadays (I suspect it's not), so we'll probably be out of luck reporting issues there.

janonym1 commented 1 year ago

I also found that setting the variable (now called devture_systemd_service_manager_up_verification_delay_seconds) to 60 seconds solves the error on my slower machine, while 45 was still too slow. However, on my production host, which is way beefier, the default 15 seconds seems to be enough.
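
A minimal sketch of that setting in vars.yml, assuming a playbook version that uses the renamed variable quoted in this comment. The value 60 is the one reported to work on the slower machine. Faster hosts may be fine with the default.

```yaml
# Wait longer before verifying that the systemd services came up.
# 60 seconds is the value reported for a slower machine in this thread;
# the default mentioned in the thread is 15 seconds.
devture_systemd_service_manager_up_verification_delay_seconds: 60
```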