spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

No containers start after ansible playbook deployed #1020

Closed tomlawesome closed 3 years ago

tomlawesome commented 3 years ago

Could be a mega noob question here -- but no services have started and there's no obvious location with a docker-compose.yml or anything from which to launch the services?

Does this suggest a borked deployment?

aaronraimist commented 3 years ago

This playbook doesn't use docker compose, see https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/62c0587b6aaed84c43a0277c7e4303aa6108edfc/docs/faq.md#why-dont-you-use-docker-compose

What step are you on? Starting services is covered here https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/installing.md#starting-the-services

tomlawesome commented 3 years ago

> This playbook doesn't use docker compose, see https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/62c0587b6aaed84c43a0277c7e4303aa6108edfc/docs/faq.md#why-dont-you-use-docker-compose

> What step are you on? Starting services is covered here https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/installing.md#starting-the-services

Thank you, I had somehow missed the second command, having just spent quite a while setting up various aspects of the playbook!

TASK [matrix-common-after : Fail if service isn't detected to be running] ****************************************************************************
skipping: [matrix.tomlawson.io] => (item=matrix-mailer.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-postgres.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-redis) 
skipping: [matrix.tomlawson.io] => (item=matrix-mautrix-signal.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-mautrix-signal-daemon.service) 
failed: [matrix.tomlawson.io] (item=matrix-mautrix-telegram.service) => changed=false 
  ansible_loop_var: item
  item: matrix-mautrix-telegram.service
  msg: matrix-mautrix-telegram.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-mautrix-telegram.service` and `journalctl -fu matrix-mautrix-telegram.service` on the server to investigate.
skipping: [matrix.tomlawson.io] => (item=matrix-mx-puppet-discord.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-mx-puppet-instagram.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-worker-generic_worker-18111.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-worker-federation_sender-0.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-worker-pusher-0.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-worker-appservice-0.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-worker-media_repository-18551.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-worker-frontend_proxy-18771.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse-admin.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-prometheus-node-exporter.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-prometheus.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-grafana.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-client-element.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-jitsi-web.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-jitsi-prosody.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-jitsi-jicofo.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-jitsi-jvb.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-ma1sd.service) 
failed: [matrix.tomlawson.io] (item=matrix-nginx-proxy.service) => changed=false 
  ansible_loop_var: item
  item: matrix-nginx-proxy.service
  msg: matrix-nginx-proxy.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-nginx-proxy.service` and `journalctl -fu matrix-nginx-proxy.service` on the server to investigate.
skipping: [matrix.tomlawson.io] => (item=matrix-coturn.service) 

It's now stuck here

aaronraimist commented 3 years ago

You can run journalctl -fu matrix-nginx-proxy.service to see what the logs from those are. Usually if this is your first time setting up the server I would have recommended that you just start with the default services and later enable workers and all of the other bridges and services that you want. Those extra things likely need the base services to be configured first.

tomlawesome commented 3 years ago

> You can run journalctl -fu matrix-nginx-proxy.service to see what the logs from those are. Usually if this is your first time setting up the server I would have recommended that you just start with the default services and later enable workers and all of the other bridges and services that you want. Those extra things likely need the base services to be configured first.

Thanks — do I need to uninstall everything and run the installer again with them removed to do that?

aaronraimist commented 3 years ago

No, you can likely get it to work with how everything is right now. You'll just have to dig into the logs to see why some things are failing to start, and you may have to configure some things or temporarily disable those services to get things working.

tomlawesome commented 3 years ago

> No, you can likely get it to work with how everything is right now. You'll just have to dig into the logs to see why some things are failing to start, and you may have to configure some things or temporarily disable those services to get things working.

Slightly confused by the log output: https://pb.tomlawson.io/?562a330ba0a7dda9#4X2vcE2Ug5H1HZsJcFoybyaJ2k6dbzLRuQmK1BPrrbsG

Should the nginx.conf be created manually? Feels like it should've been auto-generated through Ansible?

Also, the folder '/matrix/ssl/config/live/element.example.com/' doesn't exist? (domain obvs changed for example)

aaronraimist commented 3 years ago

I can't view that. No matter what IP I connect from, Cloudflare blocks it. You can just paste it into a GitHub comment or Gist if it is very long.

Btw you don't really have to bother with redacting the domains, you already posted the domain above, plus those logs are hosted on the same domain.

aaronraimist commented 3 years ago

and yes nginx is all configured for you, assuming you haven't explicitly disabled it.

tomlawesome commented 3 years ago

> I can't view that. No matter what IP I connect from, Cloudflare blocks it. You can just paste it into a GitHub comment or Gist if it is very long.

Weird. Must have set things too tight on CF somehow then. Odd though.. as I see it wherever I change VPN to, 4G etc as do a couple of friends :/ Are you outside Europe?

> Btw you don't really have to bother with redacting the domains, you already posted the domain above, plus those logs are hosted on the same domain.

I'm never sure why I bother myself tbh.. noted.

> and yes nginx is all configured for you, assuming you haven't explicitly disabled it.

SSL is turned off as I'm already running traefik elsewhere.

I've re-run the installer with the 'extras' turned off to hopefully simplify things a bit. But getting the same output:

Apr 22 21:30:27 matrix systemd[1]: matrix-nginx-proxy.service: Main process exited, code=exited, status=1/FAILURE
Apr 22 21:30:27 matrix systemd[1]: matrix-nginx-proxy.service: Failed with result 'exit-code'.
Apr 22 21:30:57 matrix systemd[1]: matrix-nginx-proxy.service: Service RestartSec=30s expired, scheduling restart.
Apr 22 21:30:57 matrix systemd[1]: matrix-nginx-proxy.service: Scheduled restart job, restart counter is at 347.
Apr 22 21:30:57 matrix systemd[1]: Stopped Matrix nginx-proxy server.
Apr 22 21:30:57 matrix systemd[1]: Starting Matrix nginx-proxy server...
Apr 22 21:30:57 matrix systemd[1]: Started Matrix nginx-proxy server.
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: /docker-entrypoint.sh: Configuration complete; ready for start up
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: 2021/04/22 20:30:59 [emerg] 1#1: cannot load certificate "/matrix/ssl/config/live/element.tomlawson.io/fullchain.pem": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/matrix/ssl/config/live/element.tomlawson.io/fullchain.pem','r') error:2006D080:BIO routines:BIO_new_file:no such file)
Apr 22 21:30:59 matrix matrix-nginx-proxy[8390]: nginx: [emerg] cannot load certificate "/matrix/ssl/config/live/element.tomlawson.io/fullchain.pem": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/matrix/ssl/config/live/element.tomlawson.io/fullchain.pem','r') error:2006D080:BIO routines:BIO_new_file:no such file)
Apr 22 21:31:00 matrix systemd[1]: matrix-nginx-proxy.service: Main process exited, code=exited, status=1/FAILURE
Apr 22 21:31:00 matrix systemd[1]: matrix-nginx-proxy.service: Failed with result 'exit-code'.

aaronraimist commented 3 years ago

Which of these methods are you trying to follow? https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-playbook-own-webserver.md

If you want to use method 1 then you need to disable nginx. If you want to use method 2 then you need to see that guide for other things you need to set so that nginx will start with no certificates.
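
As a rough illustration of how the two methods differ in vars.yml: the sketch below uses variable names that appear elsewhere in this thread, plus `matrix_nginx_proxy_enabled`, which is assumed here to be the switch the linked guide uses for disabling the bundled nginx. Treat it as a starting point and check the guide for the exact settings.

```yaml
# Method 1 (sketch): run no nginx from the playbook and let your own
# webserver do everything. The variable name is an assumption; verify it
# against the linked guide before relying on it.
matrix_nginx_proxy_enabled: false

# Method 2 (sketch): keep the playbook's nginx, but have it serve plain HTTP
# behind your own reverse proxy so that it can start without certificates.
# Use these instead of the Method 1 line above if you go this route.
# matrix_ssl_retrieval_method: none
# matrix_nginx_proxy_https_enabled: false
```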

tomlawesome commented 3 years ago

Was trying for method 3, have changed the binds to 0.0.0.0 instead and it seemed to like that.

failed: [matrix.tomlawson.io] (item=matrix-ma1sd.service) => changed=false 
  ansible_loop_var: item
  item: matrix-ma1sd.service
  msg: matrix-ma1sd.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-ma1sd.service` and `journalctl -fu matrix-ma1sd.service` on the server to investigate.

Getting this error on running the start playbook. The odd thing is that the journal output and the status check via systemctl suggest it's running, but had exited?

tom@matrix:~$ sudo journalctl -fu matrix-ma1sd.service
[sudo] password for tom: 
-- Logs begin at Thu 2021-04-22 20:08:40 BST. --
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.profile.ProfileManager -   - SynapseSqlProfileProvider
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.notification.NotificationManager - Found handler raw for medium email
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.notification.NotificationManager - --- Notification handler ---
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.notification.NotificationManager -         Handler for email: raw
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.invitation.InvitationManager - Loaded saved invites
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.invitation.InvitationManager - Setting up invitation mapping refresh timer
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.directory.DirectoryManager - Directory providers:
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.kamax.mxisd.directory.DirectoryManager -   - io.kamax.mxisd.backend.sql.synapse.SynapseSqlDirectoryProvider
Apr 22 21:49:51 matrix matrix-ma1sd[20626]: [main] INFO io.undertow - starting server: Undertow - 2.0.27.Final
Apr 22 21:49:52 matrix matrix-ma1sd[20626]: [main] INFO App - ma1sd started

● matrix-ma1sd.service - Matrix ma1sd Identity server
   Loaded: loaded (/etc/systemd/system/matrix-ma1sd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-04-22 21:49:47 BST; 16s ago
  Process: 20598 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker kill matrix-ma1sd 2>/dev/null (code=exited, status=1/FAILURE)
  Process: 20612 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker rm matrix-ma1sd 2>/dev/null (code=exited, status=1/FAILURE)
 Main PID: 20626 (docker)
    Tasks: 12 (limit: 4915)
   Memory: 34.1M
   CGroup: /system.slice/matrix-ma1sd.service
           └─20626 docker run --rm --name matrix-ma1sd --log-driver=none --user=998:1001 --cap-drop=ALL --read-only --tmpfs=/tmp:rw,exec,nosuid,size=1

And it's not that it's starting/stopping either. It seems to be consistently running:

● matrix-ma1sd.service - Matrix ma1sd Identity server
   Loaded: loaded (/etc/systemd/system/matrix-ma1sd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-04-22 21:49:47 BST; 4min 2s ago
  Process: 20598 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker kill matrix-ma1sd 2>/dev/null (code=exited, status=1/FAILURE)
  Process: 20612 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker rm matrix-ma1sd 2>/dev/null (code=exited, status=1/FAILURE)
 Main PID: 20626 (docker)
    Tasks: 12 (limit: 4915)
   Memory: 34.2M
   CGroup: /system.slice/matrix-ma1sd.service
           └─20626 docker run --rm --name matrix-ma1sd --log-driver=none --user=998:1001 --cap-drop=ALL --read-only --tmpfs=/tmp:rw,exec,nosuid,size=1

tomlawesome commented 3 years ago

It may help someone else to know that the problem in the comment above was resolved by altering the start script to wait for 45s, which allowed more time for all the services to come online:

/matrix-docker-ansible-deploy/roles/matrix-common-after/tasks/start.yml
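
A minimal sketch of what such a change could look like, assuming the wait is implemented with Ansible's `wait_for` module. The task name is taken from the playbook output below; the remaining keys and the 45-second value are illustrative, and the actual task in the playbook may be written differently.

```yaml
# Hypothetical version of the wait task in start.yml with a longer timeout.
- name: Wait a bit, so that services can start (or fail)
  ansible.builtin.wait_for:
    # With only `timeout` set, wait_for simply pauses for that many seconds.
    timeout: 45
  # Run the pause on the control machine rather than on the server (assumption).
  delegate_to: 127.0.0.1
  become: false
```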

TASK [matrix-common-after : Wait a bit, so that services can start (or fail)] ************************************************************************
ok: [matrix.tomlawson.io]

TASK [matrix-common-after : Populate service facts] **************************************************************************************************
ok: [matrix.tomlawson.io]

TASK [matrix-common-after : Fail if service isn't detected to be running] ****************************************************************************
skipping: [matrix.tomlawson.io] => (item=matrix-mailer.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-postgres.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-client-element.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-ma1sd.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-nginx-proxy.service) 

TASK [matrix-common-after : Fetch systemd information] ***********************************************************************************************
skipping: [matrix.tomlawson.io] => (item=matrix-mailer.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-postgres.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-synapse.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-client-element.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-ma1sd.service) 
skipping: [matrix.tomlawson.io] => (item=matrix-nginx-proxy.service) 

TASK [matrix-common-after : Fail if service isn't detected to be running] ****************************************************************************
skipping: [matrix.tomlawson.io] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': 'matrix-mailer.service', 'ansible_loop_var': 'item'}) 
skipping: [matrix.tomlawson.io] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': 'matrix-postgres.service', 'ansible_loop_var': 'item'}) 
skipping: [matrix.tomlawson.io] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': 'matrix-synapse.service', 'ansible_loop_var': 'item'}) 
skipping: [matrix.tomlawson.io] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': 'matrix-client-element.service', 'ansible_loop_var': 'item'}) 
skipping: [matrix.tomlawson.io] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': 'matrix-ma1sd.service', 'ansible_loop_var': 'item'}) 
skipping: [matrix.tomlawson.io] => (item={'changed': False, 'skipped': True, 'skip_reason': 'Conditional result was False', 'item': 'matrix-nginx-proxy.service', 'ansible_loop_var': 'item'}) 

CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS                   PORTS                                                  NAMES
7a3b2e431824   matrixdotorg/synapse:v1.32.2   "/start.py run -m sy…"   2 minutes ago   Up 2 minutes (healthy)   8008-8009/tcp, 8448/tcp                                matrix-synapse
8793d2833fbe   ma1uta/ma1sd:2.4.0-amd64       "/start.sh"              2 minutes ago   Up 2 minutes             8090/tcp                                               matrix-ma1sd
ee2c7206aaf2   nginx:1.19.10-alpine           "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes             80/tcp, 0.0.0.0:80->8080/tcp, 0.0.0.0:8449->8448/tcp   matrix-nginx-proxy
22daa1809fa6   vectorim/element-web:v1.7.25   "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes             80/tcp                                                 matrix-client-element
c786eabe8b54   postgres:13.2-alpine           "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes             5432/tcp                                               matrix-postgres
1cf14040865d   devture/exim-relay:4.94-r0     "exim -bdf -q15m"        2 minutes ago   Up 2 minutes             8025/tcp                                               matrix-mailer

aaronraimist commented 3 years ago

What kind of specs does your server have?

tomlawesome commented 3 years ago

> What kind of specs does your server have?

It's a VM hosted on Proxmox with 8 cores, 10GB RAM, 20GB disk on an SSD. Main server is Dell PowerEdge T620 w/ E5-2697 x 2 and 384GB RAM and it's not under crazy load. It's not a shared server or anything -- it's in my house.

The two services that 'failed' with a shorter wait time were Synapse and ma1sd.

Both now say they're active and running, but there is an error/exit in the systemctl status still:

● matrix-synapse.service - Synapse server
   Loaded: loaded (/etc/systemd/system/matrix-synapse.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-04-22 22:04:43 BST; 10min ago
  Process: 24644 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker kill matrix-synapse 2>/dev/null (code=exited, status=1/FAILURE)
  Process: 24670 ExecStartPre=/usr/bin/env sh -c /usr/bin/env docker rm matrix-synapse 2>/dev/null (code=exited, status=1/FAILURE)
 Main PID: 24683 (docker)
    Tasks: 13 (limit: 4915)
   Memory: 32.3M
   CGroup: /system.slice/matrix-synapse.service
           └─24683 docker run --rm --name matrix-synapse --log-driver=none --user=998:1001 --env=UID=998 --env=GID=1001 --cap-drop=ALL --read-only --tmpfs=/tmp:rw,noexec,nosuid,size=2500m --network=mat

Not having much luck here... I am using Firefox xD

[image: elementfail]

[image: matrixfail]

So close! :)

aaronraimist commented 3 years ago

Yeah that should definitely be powerful enough. Not sure why they wouldn't be able to start in time.

Those two errors are fine. The script tries to stop Synapse if it is running before starting it again. Those errors just mean Synapse wasn't running when the script tried to kill it.

The "can't run Element" error usually means you either have add-ons or Firefox settings that are blocking things that Element needs, such as IndexedDB, localStorage, etc. You should be able to open the browser console to see what it is missing.

tomlawesome commented 3 years ago

> Yeah that should definitely be powerful enough. Not sure why they wouldn't be able to start in time.

Yeah, strange! My install is -- perhaps -- non-standard. I install Debian 10 without the 'common' packages etc. to get a minimal install and then run a script that pulls a package list from my GitHub of required dependencies for docker, some utils etc., which works perfectly well on ~5 other VMs (not heavy use, just for segregation of services really). It then installs docker + docker-compose. Basically just an init script to make the minimal Debian install ready to run docker.

> Those two errors are fine. The script tries to stop Synapse if it is running before starting it again. Those errors just mean Synapse wasn't running when the script tried to kill it.

Ah, I did wonder as I noticed in the docs that you mount /dev/null over some files as a method to remove them

> The "can't run Element" error usually means you either have add-ons or Firefox settings that are blocking things that Element needs, such as IndexedDB, localStorage, etc. You should be able to open the browser console to see what it is missing.

Thanks -- you were totally right, uBlock Origin was blocking a couple of things.

Should I see anything when I navigate to matrix.tomlawson.io? I just get "no such resource" returned. The reverse proxy (traefik) forwards fine, and I have a PathPrefix rule in place for the /.well-known/ setup as follows; the middleware just forwards the real IP and adds secure headers. Removing the middlewares + matrix-base router (so it's only looking for the path prefix) doesn't seem to do anything either, nor does pointing it at port 8448:

http:
  routers:
    matrix-base:
      rule: Host(`matrix.tomlawson.io`)
      service: matrix-base
      middlewares:
        - chain-cfp-public@file  
    matrix-wellknown:
      rule: Host(`matrix.tomlawson.io`) && PathPrefix(`/.well-known/matrix/`)
      service: matrix-base
      middlewares:
        - chain-cfp-public@file

  services:
    matrix-base:
      loadBalancer:
        servers:
          - url: 'http://192.168.11.21:8449'

My vars.yml actually has the port set to 8448 as mentioned in Method 2

matrix_domain: tomlawson.io

matrix_ssl_retrieval_method: none
matrix_nginx_proxy_https_enabled: false
matrix_nginx_proxy_container_http_host_bind_port: '0.0.0.0:80'
matrix_nginx_proxy_container_federation_host_bind_port: '0.0.0.0:8448'
matrix_coturn_enabled: false

Containers appear to be listening correctly, except nginx which is still listening on port 8449. I can't find anything in the nginx conf/.j2 files that looks like it would change this though?

CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS                    PORTS                                                  NAMES
beaacf303998   matrixdotorg/synapse:v1.32.2   "/start.py run -m sy…"   22 minutes ago   Up 22 minutes (healthy)   8008-8009/tcp, 8448/tcp                                matrix-synapse
d5c4802f58ff   ma1uta/ma1sd:2.4.0-amd64       "/start.sh"              22 minutes ago   Up 22 minutes             8090/tcp                                               matrix-ma1sd
2afb393c932c   nginx:1.19.10-alpine           "/docker-entrypoint.…"   23 minutes ago   Up 23 minutes             80/tcp, 0.0.0.0:80->8080/tcp, 0.0.0.0:8449->8448/tcp   matrix-nginx-proxy
80180da73aa3   vectorim/element-web:v1.7.25   "/docker-entrypoint.…"   23 minutes ago   Up 23 minutes             80/tcp                                                 matrix-client-element
b1d4fdced418   postgres:13.2-alpine           "docker-entrypoint.s…"   23 minutes ago   Up 23 minutes             5432/tcp                                               matrix-postgres
ee5504b8b7ca   devture/exim-relay:4.94-r0     "exim -bdf -q15m"        23 minutes ago   Up 23 minutes             8025/tcp                                               matrix-mailer

For clarity, this is the well known SRV entry I have (as per docs, I have not created the other one):

[image: matrix2]

[image: element3]

tomlawesome commented 3 years ago

I notice that the matrix-base service expects to be available on http://matrix-synapse:8008 (via docker DNS) but don't see anywhere that proxies the domain on port 80/443 to 8008?

Small amount of progress, an error found by running self-check. I can't tell from the output though whether it's failing because it's a 404 or it's giving a 404 because the content security policy is blocking things:

TASK [matrix-synapse : Check Matrix Client API] *******************************************************************************************************************
fatal: [matrix.tomlawson.io]: FAILED! => changed=false 
  alt_svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400
  cf_cache_status: DYNAMIC
  cf_ray: 644679deda1d071a-LHR
  cf_request_id: 099fea7f460000071a07317000000001
  connection: close
  content_security_policy: 'frame-ancestors *.tomlawson.io;block-all-mixed-content;default-src *.tomlawson.io;script-src *.tomlawson.io ''unsafe-eval'' ''unsafe-inline'';style-src *  ''unsafe-inline'' *.tomlawson.io;frame-src *.tomlawson.io;img-src ''self'' *.tomlawson.io *.googleapis.com *.gravatar.com data: ''unsafe-eval'';font-src *.gstatic.com  *.tomlawson.io data: font/woff;connect-src ''self'' *.tomlawson.io;manifest-src ''self'';base-uri ''self'';form-action ''self'' *.tomlawson.io;media-src *.tomlawson.io;'
  content_type: text/html; charset=utf-8
  date: Fri, 23 Apr 2021 10:40:51 GMT
  elapsed: 0
  expect_ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
  feature_policy: camera 'none'; geolocation 'none'; microphone 'none'; payment 'none'; usb 'none'; vr 'none';
  msg: 'Status code was 404 and not [200]: HTTP Error 404: Not Found'
  nel: '{"report_to":"cf-nel","max_age":604800}'
  redirected: false
  referrer_policy: strict-origin-when-cross-origin
  report_to: '{"group":"cf-nel","endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=8ghh33dPjFZbzbna6scNDmG%2BnzQZEavm8CGxR%2BXVh4RkvFLuYOG0mjYTqSPIoIBpNgdecQtH2yaHcwEfU68Wjkv6xdp%2FGLFQIt0jahEOOCP7cx9m"}],"max_age":604800}'
  server: cloudflare
  set_cookie: __cfduid=d10f926a0af7e324ddda7c86067cce6f51619174451; expires=Sun, 23-May-21 10:40:51 GMT; path=/; domain=.tomlawson.io; HttpOnly; SameSite=Lax; Secure
  status: 404
  strict_transport_security: max-age=63072000; includeSubDomains; preload
  transfer_encoding: chunked
  url: https://matrix.tomlawson.io/_matrix/client/versions
  vary: Origin
  x_content_type_options: nosniff
  x_frame_options: allow-from https:prox.tomlawson.io
  x_robots_tag: none,noarchive,nosnippet,notranslate,noimageindex,
  x_xss_protection: 1; mode=block
...ignoring

TASK [matrix-synapse : Fail if Matrix Client API not working] *****************************************************************************************************
fatal: [matrix.tomlawson.io]: FAILED! => changed=false 
  msg: 'Failed checking Matrix Client API is up at `matrix.tomlawson.io` (checked endpoint: `https://matrix.tomlawson.io/_matrix/client/versions`). Is Synapse running? Is port 443 open in your firewall? Full error: {''redirected'': False, ''url'': ''https://matrix.tomlawson.io/_matrix/client/versions'', ''status'': 404, ''date'': ''Fri, 23 Apr 2021 10:40:51 GMT'', ''content_type'': ''text/html; charset=utf-8'', ''transfer_encoding'': ''chunked'', ''connection'': ''close'', ''set_cookie'': ''__cfduid=d10f926a0af7e324ddda7c86067cce6f51619174451; expires=Sun, 23-May-21 10:40:51 GMT; path=/; domain=.tomlawson.io; HttpOnly; SameSite=Lax; Secure'', ''content_security_policy'': "frame-ancestors *.tomlawson.io;block-all-mixed-content;default-src *.tomlawson.io;script-src *.tomlawson.io ''unsafe-eval'' ''unsafe-inline'';style-src *  ''unsafe-inline'' *.tomlawson.io;frame-src *.tomlawson.io;img-src ''self'' *.tomlawson.io *.googleapis.com *.gravatar.com data: ''unsafe-eval'';font-src *.gstatic.com  *.tomlawson.io
    data: font/woff;connect-src ''self'' *.tomlawson.io;manifest-src ''self'';base-uri ''self'';form-action ''self'' *.tomlawson.io;media-src *.tomlawson.io;", ''feature_policy'': "camera ''none''; geolocation ''none''; microphone ''none''; payment ''none''; usb ''none''; vr ''none'';", ''referrer_policy'': ''strict-origin-when-cross-origin'', ''strict_transport_security'': ''max-age=63072000; includeSubDomains; preload'', ''vary'': ''Origin'', ''x_content_type_options'': ''nosniff'', ''x_frame_options'': ''allow-from https:prox.tomlawson.io'', ''x_robots_tag'': ''none,noarchive,nosnippet,notranslate,noimageindex,'', ''x_xss_protection'': ''1; mode=block'', ''cf_cache_status'': ''DYNAMIC'', ''cf_request_id'': ''099fea7f460000071a07317000000001'', ''expect_ct'': ''max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"'', ''report_to'': ''{"group":"cf-nel","endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report?s=8ghh33dPjFZbzbna6scNDmG%2BnzQZEavm8CGxR%2BXVh4RkvFLuYOG0mjYTqSPIoIBpNgdecQtH2yaHcwEfU68Wjkv6xdp%2FGLFQIt0jahEOOCP7cx9m"}],"max_age":604800}'',
    ''nel'': ''{"report_to":"cf-nel","max_age":604800}'', ''server'': ''cloudflare'', ''cf_ray'': ''644679deda1d071a-LHR'', ''alt_svc'': ''h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400'', ''elapsed'': 0, ''changed'': False, ''failed'': True, ''msg'': ''Status code was 404 and not [200]: HTTP Error 404: Not Found''}'

aaronraimist commented 3 years ago

That's about as far as I can help with. I know nothing about Traefik.

All I can tell you is https://matrix.tomlawson.io/ doesn't seem to be serving any Matrix client APIs (like https://matrix.tomlawson.io/_matrix/client/versions) but it is serving Matrix federation APIs (like https://matrix.tomlawson.io/_matrix/federation/v1/version). The client APIs are actually 404, you can see for yourself by clicking the link.

tomlawesome commented 3 years ago

No worries, do really appreciate it. Help always appreciated but never expected!

Do you happen to know what port the client should be available at locally on LAN? I think the issue is either that the client side is not being served, or that I've just not pointed the client side to the right port. I am assuming that the client side does not use the same port as the federated bit?

E.g. I can't find a http://192.168.0.2:PORT/_matrix/client/versions that loads anything.

aaronraimist commented 3 years ago

The local port should be the port in matrix_nginx_proxy_container_http_host_bind_port so 80 based on https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/1020#issuecomment-825519824
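
For the Traefik config posted earlier, that would mean pointing the service at port 80 rather than 8449. A sketch, reusing the IP address from that snippet (adjust it to wherever the Matrix VM actually listens); this is an untested guess at the Traefik side, not a verified configuration:

```yaml
http:
  services:
    matrix-base:
      loadBalancer:
        servers:
          # Client API: the host port set in
          # matrix_nginx_proxy_container_http_host_bind_port ('0.0.0.0:80').
          - url: 'http://192.168.11.21:80'
```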

tomlawesome commented 3 years ago

> The local port should be the port in matrix_nginx_proxy_container_http_host_bind_port so 80 based on #1020 (comment)

I think I got it working? https://matrix.tomlawson.io/_matrix/client/versions

aaronraimist commented 3 years ago

Yeah that looks good. The self check, at least for the client API, should pass now. The self check expects that you use 8448 for federation but if you want to run everything through 443 that's fine.

tomlawesome commented 3 years ago

Yea, it does (the self-check) fail on the federation API though, as does the federation tester https://federationtester.matrix.org/api/report?server_name=matrix.tomlawson.io

As a point of interest, I kept an eye on the containers as they started, and it looks like two of them wait for the other four containers, which takes ~30s, and then they boot. Nothing fails:

969255aafb53   matrixdotorg/synapse:v1.32.2   "/start.py run -m sy…"   18 seconds ago   Up 16 seconds (health: start
44e1346f919f   ma1uta/ma1sd:2.4.0-amd64       "/start.sh"              19 seconds ago   Up 16 seconds               
6d0ef4dfa5df   nginx:1.19.10-alpine           "/docker-entrypoint.…"   53 seconds ago   Up 48 seconds               
5377f48dadc3   vectorim/element-web:v1.7.25   "/docker-entrypoint.…"   55 seconds ago   Up 52 seconds               
994acf418b14   postgres:13.2-alpine           "docker-entrypoint.s…"   56 seconds ago   Up 54 seconds               
1092bb446630   devture/exim-relay:4.94-r0     "exim -bdf -q15m"        57 seconds ago   Up 55 seconds 

The gift that keeps on giving: identity issue

aaronraimist commented 3 years ago

To pass the federation tester you'll need to get your .well-known files setup on tomlawson.io and you'll type tomlawson.io into the federation tester rather than matrix.tomlawson.io. https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-well-known.md

tomlawesome commented 3 years ago

To pass the federation tester you'll need to get your .well-known files setup on tomlawson.io and you'll type tomlawson.io into the federation tester rather than matrix.tomlawson.io. https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-well-known.md

Ah. That'll be why then!

Do you know if the latter section of the server delegation docs will apply when behind a reverse proxy that handles all the TLS?

ensure that you are serving the Matrix Federation API (tcp/8448) with a certificate for <your-domain> (not matrix.<your-domain>!). Getting this certificate to the matrix.<your-domain> server may be complicated.

The playbook's automatic SSL obtaining/renewal flow will likely not work and you'll need to copy certificates around manually. See below.

aaronraimist commented 3 years ago

I would strongly recommend you ignore that whole "Server Delegation via a DNS SRV record (advanced)" section. It is possible to use SRV instead of .well-known but I would not recommend it.

tomlawesome commented 3 years ago

I would strongly recommend you ignore that whole "Server Delegation via a DNS SRV record (advanced)" section. It is possible to use SRV instead of .well-known but I would not recommend it.

Ok, it just seemed preferable as the other way isn't (unfortunately) easy for me either. The base domain points at WordPress hosting that's not on my own server, so I'm not sure how I'll get https://tomlawson.io/.well-known/etc/ with the file(s) there. Will look into whether I can redirect the /sub/path via Cloudflare to my own server and try to serve it from there.

tomlawesome commented 3 years ago

Ok, managed to get them on the root domain. Luckily the host does allow non-wordpress generated files. Had to manually set the variable below and re-run the installer to get it to generate the .well-known files. Maybe because I've got SSL turned off? But there was nothing in there before.

matrix_well_known_matrix_server_enabled: true

I copied the contents of both files to their respective places: https://tomlawson.io/.well-known/matrix/server https://tomlawson.io/.well-known/matrix/client

I still seem to be failing the matrix federation tester with:

Connection Errors
Get "https://104.26.2.228:8448/_matrix/key/v2/server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Get "https://104.26.3.228:8448/_matrix/key/v2/server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Get "https://172.67.73.71:8448/_matrix/key/v2/server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Get "https://[2606:4700:20::681a:2e4]:8448/_matrix/key/v2/server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Get "https://[2606:4700:20::681a:3e4]:8448/_matrix/key/v2/server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Get "https://[2606:4700:20::ac43:4947]:8448/_matrix/key/v2/server": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

But https://matrix.tomlawson.io/_matrix/key/v2/server does output. Is this a potential issue with not having 8448 open directly on my firewall?

aaronraimist commented 3 years ago

The .well-known that the playbook generates assumes you are using 8448 but in this case you are using 443 for federation so you need to change the server one from

{
        "m.server": "matrix.tomlawson.io:8448"
}

to

{
        "m.server": "matrix.tomlawson.io:443"
}
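
As a rough illustration of the change above, the sketch below builds the `/.well-known/matrix/server` body for a chosen port and reads back which host and port a remote server would be pointed at. The function names are hypothetical, the parsing only handles plain hostnames (not IPv6 literals), and falling back to 8448 when no port is given is a simplification that skips the SRV lookup step.

```python
import json


def server_well_known(matrix_host, port=443):
    """Build the body of /.well-known/matrix/server for the given host and port."""
    return json.dumps({"m.server": f"{matrix_host}:{port}"}, indent=8)


def federation_target(body, default_port=8448):
    """Split the m.server value into (host, port).

    Simplification: if no port is present, assume 8448 and skip SRV lookups.
    """
    value = json.loads(body)["m.server"]
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, default_port
    return host, int(port)
```

For example, `federation_target(server_well_known("matrix.tomlawson.io"))` reports port 443, whereas a file generated with the playbook's default would report 8448, which is the port the federation tester was timing out on.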

tomlawesome commented 3 years ago

Doh! Makes sense. Unfortunately though it does still give me the same error?

aaronraimist commented 3 years ago

The federation tester seems not to like that it is behind Cloudflare

aaronraimist commented 3 years ago

Cloudflare blocks the user agent that the federation tester uses

tomlawesome commented 3 years ago

Cloudflare blocks the user agent that the federation tester uses

Interesting. Is this user agent specific to the tester? Am wondering if it's being picked up by the bot protection, and if it will also affect actual federation?

aaronraimist commented 3 years ago

Yeah I'm guessing Cloudflare is trying to reduce bots. The user agent it uses is the standard golang Go-http-client.

You may be able to federate with Synapse servers. Everything looks correct to me, it'll just depend on what Cloudflare blocks. Dendrite is also written in go, it shares some of the same libraries as the federation tester, so it is possible you won't be able to federate with Dendrite servers. There aren't a ton of Dendrite servers out there right now but more and more are popping up.
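
One way to see whether the user agent is what triggers the block is to request the same URL with different `User-Agent` headers and compare the status codes. The sketch below is a hedged example using the Python standard library; the helper names are hypothetical, and the exact string `Go-http-client/1.1` is an assumption about Go's default over HTTP/1.1 (the thread only names `Go-http-client`).

```python
import urllib.error
import urllib.request

# Assumption: Go's net/http sends "Go-http-client/1.1" by default over HTTP/1.1;
# the thread only names the "Go-http-client" prefix.
GO_USER_AGENT = "Go-http-client/1.1"


def build_request(url, user_agent=GO_USER_AGENT):
    """Create a GET request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent=GO_USER_AGENT, timeout=10):
    """Return the HTTP status seen with this User-Agent, or None if unreachable."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # a 403 here would be consistent with the agent being blocked
    except (urllib.error.URLError, OSError):
        return None
```

Comparing `status_for(url)` with `status_for(url, "Mozilla/5.0")` for `https://matrix.tomlawson.io/_matrix/key/v2/server` would show whether the response differs by user agent. It cannot show which Cloudflare rule is responsible.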

tomlawesome commented 3 years ago

Go-http-client

The amusing thing about this is that the federation tester is hosted / running DNS through Cloudflare too, lol.

Amusing because if I create an IP rule for the user agent, I have to give Cloudflare their own IP(s).

aaronraimist commented 3 years ago

😄 Yeah much of the matrix.org infrastructure is but my understanding is they have turned off a lot of those protection features and it is only turned on if there is an active attack

tomlawesome commented 3 years ago

😄 Yeah much of the matrix.org infrastructure is but my understanding is they have turned off a lot of those protection features and it is only turned on if there is an active attack

Sadly, adding the IPs as allowed doesn't seem to stop them blocking the user agent. Not really sure how to proceed at this point, as all the API endpoints are seemingly working, but Element can't see the identity server so I can't register/login.

I can't really shift away from Cloudflare; I'm just too deep into using their setup for everything else. At least for the fed tester part.

aaronraimist commented 3 years ago

Hmm why can’t element find the identity server? You could just disable the identity server, it’s not super important.

tomlawesome commented 3 years ago

Hmm why can’t element find the identity server? You could just disable the identity server, it’s not super important.

I'm really not sure -- the well-known parts appear to be working. From what I understand all the elements are working? I have:
https://tomlawson.io/.well-known/matrix/client
https://tomlawson.io/.well-known/matrix/server
https://matrix.tomlawson.io/_matrix/federation/v1/version
https://matrix.tomlawson.io/_matrix/key/v2/server
https://matrix.tomlawson.io/_matrix/client/versions

Identity DNS SRV record: image

Do you notice any missing end-points/parts that should be required? Running through the docs I can't see that anything's missing..

tomlawesome commented 3 years ago

Updating for those who may read this in future. There are two more endpoints you need to make available:
https://matrix.tomlawson.io/_matrix/client/r0/login
https://matrix.tomlawson.io/_matrix/identity/api/v1