spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Weird problems with bots/bridges and federation #1861

Open ethindp opened 2 years ago

ethindp commented 2 years ago

So I recently switched my server to AlmaLinux and am running into some very peculiar problems. First, I can't seem to join certain rooms (e.g. the mautrix-facebook group). Second, my bridges are failing to resolve hostnames, e.g. the WhatsApp bridge can't resolve web.whatsapp.com, and the Facebook bridge can't connect to b-graph.facebook.com. When I tried joining the mautrix-facebook group I got different errors: "no known servers", "failed to get room alias", "too many requests", etc. I have absolutely no idea what's going wrong. I assume that the playbook configured the firewall via firewalld (and not, say, iptables), but I've opened the ports in firewalld anyway. My default zone is "drop" to comply with CIS, but everything should be set up correctly. Did I mess something up?

ethindp commented 2 years ago

What's even stranger is that my existing federations to matrix.org are working fine. But for some reason, bridges are having weird problems that make absolutely zero sense.

spantaleev commented 2 years ago

Which AlmaLinux version? 8 or 9?

Perhaps some of https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/300 (about CentOS 8) would be helpful.

The playbook does not configure the firewall at all. It relies on Docker automatically configuring iptables. Perhaps Docker cannot do it properly in your case because firewalld is getting in the way somehow. See our issue about CentOS 8 and let us know if it helps you somehow.

> My default zone is "drop" to comply with CIS

CIS? Perhaps changing the default means Docker's expectations about firewall setup failed. Try restoring to the default and see if it helps.

ethindp commented 2 years ago

@spantaleev So I believe I'm now just running into errors that are trivially resolvable. One question, though: during my latest CIS audit, a couple of auditing rules failed:

Normally to remediate this problem I'd run something like:

  1. To fix sticky bits: df --local -P | awk '{if (NR!=1) print $6}' | xargs -I '{}' find '{}' -xdev -type d \( -perm -0002 -a ! -perm -1000 \) 2>/dev/null | xargs chmod a+t
  2. To ensure no world-writable files exist: find / -xdev -type f -perm -002 -exec chmod o-w {} \;
  3. To find all files that have no owning group or user:
    1. For groups: df --local -P | awk '{if (NR!=1) print $6}' | sudo xargs -I '{}' find '{}' -xdev -nogroup
    2. For users: df --local -P | awk '{if (NR!=1) print $6}' | sudo xargs -I '{}' find '{}' -xdev -nouser

The problem with me running these commands for remediation is that the vast majority of files that openscap found during its initial assessment are created and managed by docker (e.g. /var/lib/docker/overlay2/14ed996bfcc9b4f9b19045a6fca618cc1e9fffa134237d8b4dc7193251e45c26/diff/run/postgresql/). So remediating some of this, e.g., not owned by a group or user, would be time-consuming. But I'm also wondering if it would even be worth it. Would it even work? Would it break docker in some way? Or the containers? If I shouldn't remediate these or if I should just leave them alone, I can try to tinker with the rules to tell openscap and lynis to ignore anything under /var/lib/docker. What should I do here?
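For manual checks, the "ignore anything under /var/lib/docker" idea can be approximated with find's -prune. The following is only a minimal sketch: list_world_writable is a hypothetical helper name, it reports files without changing them, and whether excluding Docker's storage directory is acceptable for a given CIS profile is an assumption that this thread does not settle.

```shell
#!/bin/sh
# Hypothetical helper: list world-writable regular files under ROOT,
# skipping everything below EXCLUDE. Report-only; nothing is modified.
list_world_writable() {
    root=$1
    exclude=$2
    # -xdev keeps the scan on one filesystem, as in the commands above.
    # "-path EXCLUDE -prune" stops find from descending into that subtree;
    # the branch after -o prints the remaining matches.
    find "$root" -xdev -path "$exclude" -prune -o -type f -perm -0002 -print 2>/dev/null
}
```

Calling it as list_world_writable / /var/lib/docker would scan the root filesystem while leaving Docker's overlay directories out of the report. The same -prune pattern should work with -nouser or -nogroup in place of the -type f -perm -0002 tests.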

ethindp commented 2 years ago

@spantaleev Okay so I'm actually hitting a rather odd error (I've even enabled IP forwarding). For some reason, hydrogen is failing to build:

fatal: [matrix.the-gdn.net]: FAILED! => changed=false
  msg: 'Error building localhost/vectorim/hydrogen-web - code: 1, message: The command ''/bin/sh -c yarn install  && yarn build'' returned a non-zero code: 1, logs: [''Step 1/7 : FROM docker.io/node:alpine as builder'', ''\n'', '' ---> 9f58095cfeb6\n'', ''Step 2/7 : RUN apk add --no-cache git python3 build-base'', ''\n'', '' ---> Using cache\n'', '' ---> 5242190ebf1a\n'', ''Step 3/7 : COPY . /app'', ''\n'', '' ---> 8541009e5ff5\n'', ''Step 4/7 : WORKDIR /app'', ''\n'', '' ---> Running in 8c2740c82d65\n'', ''Removing intermediate container 8c2740c82d65\n'', '' ---> 7dda814a0189\n'', ''Step 5/7 : RUN yarn install  && yarn build'', ''\n'', '' ---> Running in 524af1af683d\n'', ''yarn install v1.22.19\n'', ''[1/4] Resolving packages...\n'', ''[2/4] Fetching packages...\n'', ''\x1b[91merror An unexpected error occurred: "https://registry.yarnpkg.com/another-json/-/another-json-0.2.0.tgz: getaddrinfo EAI_AGAIN registry.yarnpkg.com".\n\x1b[0m'', ''info If you think this is a bug, please open a bug report with the information provided in "/app/yarn-error.log".\n'', ''info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.\n'', ''Removing intermediate container 524af1af683d\n'']'

Definitely unsure about this one; at first I thought IP forwarding was causing the problem, but it's not. And I didn't get this error when I initially migrated the system either, which is odd.
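The EAI_AGAIN in that log is getaddrinfo's "temporary failure in name resolution" code, so the build container could not get a DNS answer for registry.yarnpkg.com. That matches the hostname failures the bridges reported. As a rough sketch, the failing hostnames can be pulled out of a saved build log like this. The helper name eai_again_hosts is hypothetical and not part of the playbook.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print each hostname
# that failed with EAI_AGAIN, one per line, without duplicates.
eai_again_hosts() {
    # grep -o keeps only the matching fragment; the hostname is its third field.
    grep -o 'getaddrinfo EAI_AGAIN [A-Za-z0-9.-]*' | awk '{print $3}' | sort -u
}
```

Each hostname it prints could then be looked up from a throwaway container, for example with docker run --rm docker.io/library/alpine nslookup registry.yarnpkg.com. This assumes the image can be pulled, or is already present, on the affected host. If the lookup fails in the container but works on the host itself, that would point at container networking (for instance the firewall zone setup) and not at the upstream resolver.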