spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Cannot connect to matrix.org or other homeservers #2487

Closed gitayam closed 1 year ago

gitayam commented 1 year ago

SOLVED:

update 0:

None of the automated methods managed to obtain new certificates. When I removed the previous keys and ran setup-ssl, I received this error:

fatal: [matrix.domain.com]: FAILED! => changed=true
  cmd: /usr/bin/env docker run --rm --name=matrix-certbot --user=997:1001 --cap-drop=ALL -p 80:8080 --mount type=bind,src=/datadrive/matrix/ssl/config,dst=/etc/letsencrypt --mount type=bind,src=/datadrive/matrix/ssl/log,dst=/var/log/letsencrypt docker.io/certbot/certbot:amd64-v2.0.0 certonly --non-interactive --work-dir=/tmp --http-01-port 8080   --key-type ecdsa --standalone --preferred-challenges http --agree-tos --email=matrix@domain.com -d domain.com
  delta: '0:00:04.671752'
  end: '2023-02-15 07:13:30.652474'
  msg: non-zero return code
  rc: 1
  start: '2023-02-15 07:13:25.980722'
  stderr: |-
    Saving debug log to /var/log/letsencrypt/letsencrypt.log
    archive directory exists for domain.com-0001
    Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.
  stderr_lines: <omitted>
  stdout: Requesting a certificate for domain.com
  stdout_lines: <omitted>
...ignoring

UPDATE1: I ended up having to obtain the certificate manually and opted for the DNS challenge method. I placed it into the SSL path and it is now all green on https://federationtester.matrix.org/#domain.com.

self-check also shows all working now...

HOWEVER still not working with federated servers

UPDATE 2: #SOLVED

With these checks working, and after ensuring that all the ports were open on both the cloud provider and the software firewall, I ran setup-all and start, and within 5 minutes the server was federated again!

My vars.yml file looks like this:


---
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ('matrix.<matrix-domain>').
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com

matrix_domain: domain.tld

##
#load balancing : https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-playbook-synapse.md
#matrix_synapse_workers_enabled: true
matrix_base_data_path: "/datadrive/matrix"

## Server stuff
## auto join
## source : https://github.com/matrix-org/synapse/blob/e7b78dcc4a6bf8fdb71782640932da8dff7cc5ed/docs/sample_config.yaml#L1264-L1274
## line 1260
enable_set_displayname: true
# Homeserver admin contacts as per MSC 1929 https://github.com/matrix-org/matrix-spec-proposals/pull/1929
matrix_homeserver_admin_contacts:
  - matrix_id: "@username_here:domain.tld"
    email_address: username_here@domain.tld
    role: admin
  - email_address: security@domain.tld
    role: security

matrix_homeserver_support_url: "https://domain.tld/support"
matrix_synapse_http_listener_resource_names: ["client","federation"]
matrix_federation_public_port: 443
matrix_synapse_federation_port_enabled: false
matrix_synapse_tls_federation_listener_enabled: false
##https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-playbook-matrix-registration.md
matrix_registration_enabled: true
# Generate a strong secret using: `pwgen -s 64 1`.
matrix_registration_admin_secret: "password_here"
# The Matrix homeserver software to install.
# See:
#  - `roles/custom/matrix-base/defaults/main.yml` for valid options
# - the `docs/configuring-playbook-IMPLEMENTATION_NAME.md` documentation page, if one is available for your implementation choice
matrix_homeserver_implementation: synapse

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: 'password_here'

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: someone@example.com
matrix_ssl_lets_encrypt_support_email: 'matrix@domain.tld'
#matrix_ssl_retrieval_method: none

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
devture_postgres_connection_password: 'password_here'
### JITSI
matrix_jitsi_enabled: true
# Run `bash inventory/scripts/jitsi-generate-passwords.sh` to generate these passwords,
# or define your own strong passwords manually.
matrix_jitsi_jicofo_auth_password: 'password_here'
matrix_jitsi_jvb_auth_password: 'password_here'
matrix_jitsi_jibri_recorder_password: 'password_here'
matrix_jitsi_jibri_xmpp_password: 'password_here'
#matrix_jitsi_enable_auth: true
#matrix_jitsi_enable_guests: true
matrix_jitsi_web_custom_config_extension: |
  config.enableLayerSuspension = true;

  config.disableAudioLevels = true;

  // Limit the number of video feeds forwarded to each client
  config.channelLastN = 4;

matrix_jitsi_web_config_resolution_width_ideal_and_max: 1080
matrix_jitsi_web_config_resolution_height_ideal_and_max: 720
###Etherpad
matrix_etherpad_enabled: true
# Variables configuring the etherpad
matrix_etherpad_title: Etherpad
matrix_etherpad_abiword: null
matrix_etherpad_soffice: /usr/bin/soffice
# Uncomment below if you'd like to install Etherpad on the Dimension domain (not recommended)
#matrix_etherpad_mode: dimension

#Uncomment below to enable the admin web UI
matrix_etherpad_admin_username: admin
matrix_etherpad_admin_password: password_here

### MSynapse
matrix_synapse_admin_enabled: true

# Enable generation of `/.well-known/matrix/support`.
# This needs to be enabled explicitly for now, because MSC 1929 is not yet accepted.
matrix_nginx_proxy_base_domain_serving_enabled: true
matrix_well_known_matrix_support_enabled: true

### BOTS

matrix_bot_matrix_reminder_bot_enabled: true

# Uncomment and adjust this part if you'd like to use a username different than the default
#matrix_bot_matrix_reminder_bot_matrix_user_id_localpart: reminder-bot

# Generate a strong password here. Consider generating it with `pwgen -s 64 1`
matrix_bot_matrix_reminder_bot_matrix_user_password: password_here
# Adjust this to your timezone
matrix_bot_matrix_reminder_bot_reminders_timezone: America/New_York

### Signal Bot
matrix_mautrix_signal_enabled: true
matrix_mautrix_signal_relaybot_enabled: true
matrix_mautrix_signal_login_shared_secret: 'password_here'
matrix_mautrix_signal_bridge_permissions: {"@username_here:domain.tld": "admin", "*": "user"}
# Enable bridge relay bot functionality
matrix_mautrix_signal_bridge_encryption_allow: true
###
# Discord bot
matrix_mautrix_discord_enabled: true
###
# Honoroit Bot

# Uncomment and adjust this part if you'd like to use a username different than the default
#matrix_bot_honoroit_login: honoroit_bot
# Generate a strong password here. Consider generating it with `pwgen -s 64 1`
#matrix_bot_honoroit_password: password_here
# Adjust this to your room ID
#matrix_bot_honoroit_roomid: "!iIkQXgxGLQOkOhBDpn:domain.tld"
###
# Maubot
matrix_bot_maubot_enabled: false
matrix_bot_maubot_admins:
  - username_here: password_here

matrix_synapse_ext_password_provider_shared_secret_auth_enabled: true
matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret: password_here

### Logging

matrix_synapse_log_level: "INFO"
matrix_synapse_storage_sql_log_level: "INFO"
matrix_synapse_root_log_level: "INFO"
### Retention Policy
#https://matrix-org.github.io/synapse/develop/message_retention_policies.html
#default_policy:
 # min_lifetime: 1d
 # max_lifetime: 1y
###OIDC
#roles/custom/matrix-synapse/templates/synapse/homeserver.yaml.j2
#matrix_synapse_configuration_extension_yaml: |

matrix_bot_chatgpt_enabled: true
# Obtain a new API key from https://platform.openai.com/account/api-keys
matrix_bot_chatgpt_openai_api_key: 'password_here'

# This is the default username
matrix_bot_chatgpt_matrix_bot_username_localpart: 'username_heregpt'

# Matrix access token (from bot user above)
# see: https://webapps.stackexchange.com/questions/131056/how-to-get-an-access-token-for-element-riot-matrix
matrix_bot_chatgpt_matrix_access_token: 'password_here'

devture_postgres_process_extra_arguments: [
  "-c max_connections=100",
  "-c shared_buffers=2GB",
  "-c effective_cache_size=6GB",
  "-c maintenance_work_mem=512MB",
  "-c checkpoint_completion_target=0.9",
  "-c wal_buffers=16MB",
  "-c default_statistics_target=100",
  "-c random_page_cost=1.1",
  "-c effective_io_concurrency=200",
  "-c work_mem=5242kB",
  "-c min_wal_size=1GB",
  "-c max_wal_size=4GB",
  "-c max_worker_processes=4",
  "-c max_parallel_workers_per_gather=2",
  "-c max_parallel_workers=4",
  "-c max_parallel_maintenance_workers=2",
]

Matrix Server:

Ansible: ansible [core 2.14.1]

Problem description:


After yesterday's update I am unable to connect to federated servers, including users and rooms on matrix.org. I am, however, able to reach users and rooms on my own homeserver.

On the client, messages to federated servers show as sent but not read.

Client (please complete the following information):

Additional context edit 1:

Test with https://federationtester.matrix.org

Connection Errors: Get "https://IP:8448/_matrix/key/v2/server": dial tcp IP:8448: connect: connection refused
DNS results: No SRV records found

However, it works when using matrix.domain.com

edit 2:

78288224b6c1   matrixdotorg/synapse:v1.77.0   "/start.py run -m sy…"   About a minute ago   Up About a minute (healthy)   8008-8009/tcp, 8448/tcp   matrix-synapse
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8448
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8448
ghost commented 1 year ago

Likely linked to https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/2480

Test with https://federationtester.matrix.org

gitayam commented 1 year ago

Likely linked to #2480

Test with https://federationtester.matrix.org

MatchingServerName error, all others success.
DNS results: server name/.well-known result contains explicit port number: no SRV lookup done

For (http.host contains ".well-known/") or (http.host eq "_matrix/") I have Cloudflare SSL disabled.

Connection Errors: Get "https://IP:8448/_matrix/key/v2/server": dial tcp IP:8448: connect: connection refused
DNS results: No SRV records found

Hosts
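
For readers following along: the "explicit port number: no SRV lookup done" result reflects how a remote server decides where to connect. If the `m.server` value served at /.well-known/matrix/server carries a port, that port is used directly. Otherwise an SRV lookup is tried, and port 8448 is the fallback. A minimal Python sketch of that decision follows. The function name `parse_m_server` is hypothetical and IPv6 literals are not handled.

```python
import json
from typing import Optional, Tuple

# Fallback federation port when neither .well-known nor SRV supplies one.
DEFAULT_FEDERATION_PORT = 8448


def parse_m_server(well_known_body: str) -> Tuple[str, Optional[int]]:
    """Split the m.server value of /.well-known/matrix/server into (host, port).

    Returns port=None when no explicit port is given, in which case a
    resolver would try SRV records and then DEFAULT_FEDERATION_PORT.
    """
    value = json.loads(well_known_body)["m.server"]
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        # Explicit port: connect there directly, no SRV lookup is done.
        return host, int(port)
    return value, None
```

With a body of `{"m.server": "matrix.domain.com:443"}` this yields the host and port 443, so nothing would ever be tried on 8448 for delegated traffic. Direct connections to the bare IP on 8448, as in the error above, are a separate path.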

spantaleev commented 1 year ago

What is enable_set_displayname? There's no such variable in the playbook.


#2480 is an issue with the new Traefik setup.

I don't see you using a matrix_playbook_reverse_proxy_type variable (so it should default to a playbook-managed-nginx value), which means you should be using matrix-nginx-proxy, not Traefik.


Get "https://ip:8448/_matrix/key/v2/server": dial tcp IP:8448: connect: connection refused

Perhaps you haven't opened port 8448 in your server's firewall? See docs/prerequisites.md for other ports you may wish to open.
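
A "connection refused" on 8448 can be reproduced without the federation tester. The sketch below is a plain TCP connect check in Python. The function name `port_is_reachable` is hypothetical, and it only tells you whether something accepts connections on the port, not whether TLS or Synapse behind it is healthy.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Calling `port_is_reachable("matrix.domain.com", 8448)` from a machine outside the server's network is the more useful test, since a firewall may allow local traffic while blocking external traffic.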

gitayam commented 1 year ago

What is enable_set_displayname? There's no such variable in the playbook.

Good call, it was left over from OIDC, I believe. Removed now.

#2480 is an issue with the new Traefik setup.

Correct, the other user referenced a similar issue but with a different setup.

I don't see you using a matrix_playbook_reverse_proxy_type variable (so it should default to a playbook-managed-nginx value), which means you should be using matrix-nginx-proxy, not Traefik.

Get "https://ip:8448/_matrix/key/v2/server": dial tcp IP:8448: connect: connection refused

Perhaps you haven't opened port 8448 in your server's firewall? See docs/prerequisites.md for other ports you may wish to open.

I have opened it on the cloud network and in iptables:

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8448
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8448
spantaleev commented 1 year ago

As far as I know, you can't put Matrix behind Cloudflare's proxy. Using Cloudflare DNS is fine, but the proxy is not.

Also, besides the on-server firewall, there may be another firewall on your server provider's side. If there is, that one would also need these ports open.

gitayam commented 1 year ago

As far as I know, you can't put Matrix behind Cloudflare's proxy. Using Cloudflare DNS is fine, but the proxy is not.

The proxy is off, and Cloudflare SSL is off for /_matrix, as is caching.

Also, besides the on-server firewall, there may be another firewall on your server provider's side. If there is, that one would also need these ports open.

I have opened them on the provider side as well.

78288224b6c1   matrixdotorg/synapse:v1.77.0   "/start.py run -m sy…"   About a minute ago   Up About a minute (healthy)   8008-8009/tcp, 8448/tcp   matrix-synapse

thanks for the help on this!

gitayam commented 1 year ago

update: ran the self check

ansible-playbook -i inventory/hosts setup.yml --tags=self-check

and these are the results:

TASK [custom/matrix-nginx-proxy : Check .well-known on the identity hostname] ************************************************************************************************
fatal: [matrix.domain.com]: FAILED! => changed=false
  content: ''
  elapsed: 0
  msg: 'Status code was -1 and not [200]: Request failed: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>'
  redirected: false
  status: -1
  url: https://domain.com/.well-known/matrix/client
...ignoring

TASK [custom/matrix-nginx-proxy : Fail if .well-known not working on the identity hostname] **********************************************************************************
fatal: [matrix.domain.com]: FAILED! => changed=false
  msg: 'Failed checking that the well-known file for Client Discovery is configured at `domain.com` (checked endpoint: `https://domain.com/.well-known/matrix/client`). Is port 443 open in your firewall? Full error: {''content'': '''', ''redirected'': False, ''url'': ''https://domain.com/.well-known/matrix/client'', ''status'': -1, ''elapsed'': 0, ''changed'': False, ''failed'': True, ''msg'': ''Status code was -1 and not [200]: Request failed: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>''}'
spantaleev commented 1 year ago

Try this manually with curl or in a browser. This self-check thing doesn't tell us much.

Your SSL certificate for that domain may indeed have expired for some reason. If you confirm this, you can renew it manually with /matrix/ssl/bin/lets-encrypt-certificates-renew. Check the logs for this in /matrix/ssl/... If a renewal happened, you should restart services (--tags=start) to ensure the new SSL certificates are picked up.
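
Besides curl or a browser, the expiry can be confirmed with a few lines of Python using only the standard library. This is a rough sketch, not part of the playbook. The names `days_until_expiry` and `check_certificate` are hypothetical. It connects with normal verification, so an expired certificate shows up as a verification failure rather than as a date.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` until a certificate's notAfter string (negative if expired)."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def check_certificate(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect with default verification and summarise the certificate state.

    Network errors such as timeouts or DNS failures are not caught here.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # An expired or mismatched certificate fails during the handshake.
        return f"verification failed: {exc.verify_message}"
    remaining = days_until_expiry(cert["notAfter"], datetime.now(timezone.utc))
    return f"valid, expires in {remaining:.1f} days"
```

Running `check_certificate("domain.com")` and `check_certificate("matrix.domain.com")` separately helps, because the base domain and the matrix subdomain can be served with different certificates.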

gitayam commented 1 year ago

Try this manually with curl or in a browser. This self-check thing doesn't tell us much.

curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
gitayam commented 1 year ago

Try this manually with curl or in a browser. This self-check thing doesn't tell us much.

Your SSL certificate for that domain may indeed have expired for some reason. If you confirm this, you can renew it manually with /matrix/ssl/bin/lets-encrypt-certificates-renew. Check the logs for this in /matrix/ssl/... If a renewal happened, you should restart services (--tags=start) to ensure the new SSL certificates are picked up.

"No renewals were attempted."

"Certificate not yet due for renewal"

edit1:

From browser I see :

Issued to: Common Name (CN): domain.com
Issued by: Common Name (CN): R3, Organization (O): Let's Encrypt
Issued on: Sunday, November 13, 2022 at 8:45:15 PM
Expires on: Saturday, February 11, 2023 at 8:45:14 PM

This is in fact when I started having issues, though I only fully noticed with the latest update.

gitayam commented 1 year ago

I attempted to set up SRV delegation instead of well-known:

matrix_well_known_matrix_server_enabled: false
matrix_ssl_domains_to_obtain_certificates_for:
  - '{{ hostname_matrix }}'
  - '{{ hostname_riot }}'
  - '{{ hostname_identity }}'
# Adjust paths below to point to your certificate.
#
# NOTE: these are in-container paths. `/datadrive/matrix/ssl` on the host is mounted into the container
# at the same path (`/datadrive/matrix/ssl`) by default, so if that's the path you need, it would be seamless.
matrix_nginx_proxy_proxy_matrix_federation_api_ssl_certificate: /datadrive/matrix/ssl/config/live/domain.com/fullchain.pem
matrix_nginx_proxy_proxy_matrix_federation_api_ssl_certificate_key: /datadrive/matrix/ssl/config/live/domain.com/privkey.pem

# from https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/08635666df680bb9571624aad0439602fd8ec34c/docs/howto-server-delegation.md#server-delegation-via-a-dns-s>

I've added the SRV record to DNS and am receiving this error. The error is also there when I manually set the SSL path.

TASK [custom/matrix-nginx-proxy : Determine unnecessary Let's Encrypt renewal configs] ***************************************************************************************
fatal: [matrix.domain.com]: FAILED! =>
  msg: |-
    The conditional check 'item.path | basename | replace('.conf', '') not in matrix_ssl_domains_to_obtain_certificates_for' failed. The error was: error while evaluating conditional (item.path | basename | replace('.conf', '') not in matrix_ssl_domains_to_obtain_certificates_for): ['{{ hostname_matrix }}', '{{ hostname_riot }}', '{{ hostname_identity }}']: 'hostname_matrix' is undefined. 'hostname_matrix' is undefined. ['{{ hostname_matrix }}', '{{ hostname_riot }}', '{{ hostname_identity }}']: 'hostname_matrix' is undefined. 'hostname_matrix' is undefined

    The error appears to be in '/home/local/matrix-docker-ansible-deploy/roles/custom/matrix-nginx-proxy/tasks/ssl/purge_ssl_lets_encrypt_orphaned_configs.yml': line 17, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:

        - name: Determine unnecessary Let's Encrypt renewal configs
          ^ here
spantaleev commented 1 year ago

hostname_matrix is something very very old, deprecated in a43bcd81f in 2019. Not sure where you got that from, but.. it's outdated.

SRV is difficult.. It requires that you obtain certificates for the base domain manually and make them available to the playbook. I'm not sure if you want to go that (painful) way.

gitayam commented 1 year ago

hostname_matrix is something very very old, deprecated in a43bcd8 in 2019. Not sure where you got that from, but.. it's outdated.

https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/08635666df680bb9571624aad0439602fd8ec34c/docs/howto-server-delegation.md#server-delegation-via-a-dns-srv-record-advanced

Yeah, reading through it, it seems like a lot more work.

One thing I just noticed is

matrix_federation_public_port: 443
matrix_synapse_federation_port_enabled: false
matrix_synapse_tls_federation_listener_enabled: false

I am updating these and trying with 8448 and will update

gitayam commented 1 year ago

Ok so that did get me somewhere:

"Version": {
    "error": "Get \"matrix://domain.com/_matrix/federation/v1/version\": x509: certificate is valid for matrix.domain.com, not domain.com"
  },
  "FederationOK": false
}

still expired but now all the other checks are working.
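
The x509 error above says the certificate being served lists matrix.domain.com but not domain.com, so a connection made under the base domain name fails name verification. A simplified Python sketch of that check follows. The function names are hypothetical, and real TLS libraries apply stricter rules than this leftmost-label wildcard match.

```python
from typing import List


def name_matches(pattern: str, hostname: str) -> bool:
    """Match one certificate DNS name against a hostname."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # "*.example.com" covers "a.example.com", but not "example.com"
        # and not "a.b.example.com".
        head, _, tail = hostname.partition(".")
        return bool(head) and tail == pattern[2:]
    return pattern == hostname


def certificate_covers(hostname: str, dns_names: List[str]) -> bool:
    """True if any DNS name on the certificate matches the hostname."""
    return any(name_matches(name, hostname) for name in dns_names)
```

Here `certificate_covers("domain.com", ["matrix.domain.com"])` is false, which is the situation the federation tester reported. Note that a wildcard for `*.domain.com` would not cover the base domain either.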

gitayam commented 1 year ago

I removed the following from vars.yml:

matrix_federation_public_port: 443
matrix_synapse_federation_port_enabled: false
matrix_synapse_tls_federation_listener_enabled: false

I removed the entire directory of SSL keys and ran setup-ssl, which failed. I then ran certbot manually and placed the result in the default /matrix/ssl/..... path. Ran the checks: good to go. With these checks working, and after ensuring that all the ports were open on both the cloud provider and the software firewall, I ran setup-all and start, and within 5 minutes the server was federated again!