matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Not whitelisting federation with self breaks federation #4857

Closed spantaleev closed 2 years ago

spantaleev commented 5 years ago

Description

This is somewhat of a continuation of #4856.

I've got 2 servers with federation enabled. Each server only whitelists the other in federation_domain_whitelist. I'm only expecting them to federate with one another.

When I create a room and invite a user from the other server (inviting by Matrix ID), Synapse first tries to retrieve some signature keys.

As described in #4856, it first attempts to do so using perspectives. This will fail if federation with matrix.org is not enabled.

The server then attempts to federate with itself for some reason. If our own domain is not listed in federation_domain_whitelist, we get a FederationDeniedError and federation effectively does not work. The invite reaches the other server, but in a broken state: it can be neither accepted nor rejected. The only way to fix that on the other server is to delete the invite from local_invites and restart Synapse (maybe this is a separate bug that should be reported and worked on?).
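The failure described above can be modelled with a minimal sketch. This is a simplified illustration of a whitelist check, not Synapse's actual code: the `FederationDeniedError` class here is a local stand-in named after the error in the report, and `check_destination` is a hypothetical helper.

```python
from typing import Iterable, Optional


class FederationDeniedError(Exception):
    """Stand-in for the error named in the report; not Synapse's own class."""

    def __init__(self, destination: str) -> None:
        super().__init__(f"Federation denied with {destination}")
        self.destination = destination


def check_destination(destination: str, whitelist: Optional[Iterable[str]]) -> None:
    """Raise FederationDeniedError if a whitelist is set and excludes destination.

    A whitelist of None is treated as "no restriction" (an assumption of this sketch).
    """
    if whitelist is not None and destination not in set(whitelist):
        raise FederationDeniedError(destination)


if __name__ == "__main__":
    own_name = "one.matrix.mydomain.com"
    whitelist = ["two.matrix.mydomain.com"]

    # The peer is whitelisted, so this call returns without raising.
    check_destination("two.matrix.mydomain.com", whitelist)

    # The server's own name is not whitelisted, so a request to self is denied.
    try:
        check_destination(own_name, whitelist)
    except FederationDeniedError as exc:
        print(f"denied: {exc.destination}")
```

Under this model, any code path that sends a federation request to the server's own name is rejected unless that name is also in the whitelist.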

I'm not sure what the reason is for needing to federate with self. If there is a valid reason for doing so, perhaps:

Version information

richvdh commented 5 years ago

If you've got logs from both sides demonstrating the problem, that would help.

spantaleev commented 5 years ago

I've spent some time reproducing this.

Server setup

I've set up two servers (one.matrix.mydomain.com and two.matrix.mydomain.com).

Synapse's server_name matches the server's hostname. Both servers have valid TLS certificates via Let's Encrypt.

Each server hosts a /.well-known/matrix/server file, explicitly pointing federation to port 8448 (e.g. https://one.matrix.mydomain.com/.well-known/matrix/server with content like this: {"m.server": "one.matrix.mydomain.com:8448"}).
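For readers unfamiliar with the delegation file, here is a rough sketch of how its content can be split into a host and a port. The function name `parse_m_server` is hypothetical, and falling back to port 8448 when no port is given is a simplification made for this sketch (it ignores any further lookup steps a real homeserver may perform).

```python
import json
from typing import Tuple

# Port used in this report for federation; used here as the fallback.
DEFAULT_FEDERATION_PORT = 8448


def parse_m_server(body: str) -> Tuple[str, int]:
    """Split the "m.server" value of a .well-known response into (host, port)."""
    value = json.loads(body)["m.server"]
    # Split on the last colon so that a trailing ":port" is separated from the host.
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    # No explicit numeric port: keep the value as the host and use the fallback.
    return value, DEFAULT_FEDERATION_PORT


if __name__ == "__main__":
    print(parse_m_server('{"m.server": "one.matrix.mydomain.com:8448"}'))
```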

I'm creating a single user (called slavi) on each of these servers.

Server installation

I'm setting up the servers using matrix-docker-ansible-deploy, with configuration like this:

inventory/host_vars/one.matrix.mydomain.com/vars.yml:

matrix_domain: "one.matrix.mydomain.com"
matrix_server_fqn_matrix: "{{ matrix_domain }}"

matrix_riot_web_enabled: false
matrix_coturn_enabled: false
matrix_mailer_enabled: false
matrix_mxisd_enabled: false

matrix_ssl_lets_encrypt_support_email: MY_EMAIL_ADDRESS_HERE

matrix_synapse_macaroon_secret_key: "something"

matrix_synapse_federation_domain_whitelist:
  - 'two.matrix.mydomain.com'

inventory/host_vars/two.matrix.mydomain.com/vars.yml:

matrix_domain: "two.matrix.mydomain.com"
matrix_server_fqn_matrix: "{{ matrix_domain }}"

matrix_riot_web_enabled: false
matrix_coturn_enabled: false
matrix_mailer_enabled: false
matrix_mxisd_enabled: false

matrix_ssl_lets_encrypt_support_email: MY_EMAIL_ADDRESS_HERE

matrix_synapse_macaroon_secret_key: "something"

matrix_synapse_federation_domain_whitelist:
  - 'one.matrix.mydomain.com'

The server setup is done with ansible-playbook -i inventory/hosts setup.yml --tags=setup-all,start.

The resulting homeserver.yaml is something like this (example from one.matrix.mydomain.com):

....
federation_domain_whitelist: ["two.matrix.mydomain.com"]
...

The user is created on each server using: ansible-playbook -i inventory/hosts setup.yml --extra-vars='username=slavi password=some_password admin=yes' --tags=register-user

Testing

I'm then logging in via riot-web (https://riot.im/app/) to each server.

From one.matrix.mydomain.com, I'm starting a direct chat with @slavi:two.matrix.mydomain.com. An error like this is shown:

Failure to create room

Server may be unavailable, overloaded, or you hit a bug.

Logs

I'm attaching the full logs from both servers:

Related problems

The following problems are observed on the remote / 2nd server (two.matrix.mydomain.com). I've done these actions after downloading the logs, so logging information is not available.

On the 2nd server (two.matrix.mydomain.com) I can see the invitation in the "Invites" section in riot-web.

Clicking Accept says:

Failed to join room

You are not invited to this room.

Clicking Decline turns the middle section of riot-web white. Looking at the network tab, I can see that the /leave request receives a 200 response with a {} payload.

However, the following error is generated on the server:

2019-03-18 08:42:00,234 - synapse.http.matrixfederationclient - 304 - INFO - POST-171 - {GET-O-9} [one.matrix.mydomain.com] Sending request: GET matrix://one.matrix.mydomain.com/_matrix/federation/v1/make_leave/%21EBhHjkpSWmcYhcwWLq%3Aone.matrix.mydomain.com/%40slavi%3Atwo.matrix.mydomain.com; timeout 20.000000s
2019-03-18 08:42:00,241 - synapse.metrics - 372 - INFO -  - Collecting gc 0
2019-03-18 08:42:00,289 - synapse.http.matrixfederationclient - 336 - INFO - POST-171 - {GET-O-9} [one.matrix.mydomain.com] Got response headers: 403 Forbidden
2019-03-18 08:42:00,290 - synapse.http.matrixfederationclient - 420 - WARNING - POST-171 - {GET-O-9} [one.matrix.mydomain.com] Request failed: GET matrix://one.matrix.mydomain.com/_matrix/federation/v1/make_leave/%21EBhHjkpSWmcYhcwWLq%3Aone.matrix.mydomain.com/%40slavi%3Atwo.matrix.mydomain.com: HttpResponseException("403: b'Forbidden'",)
2019-03-18 08:42:00,292 - synapse.handlers.room_member - 1022 - WARNING - POST-171 - Failed to reject invite: 403: @slavi:two.matrix.mydomain.com not in room !EBhHjkpSWmcYhcwWLq:one.matrix.mydomain.com.

As a result, the invite lingers forever (until it is deleted manually from the local_invites table and Synapse is restarted).

Totaly-Crazy commented 5 years ago

I can confirm that I have the same problem. The setup is a bit different, but the logs and results are almost the same. The only difference is that it doesn't try to connect to matrix.org.

This is intended as a standalone chat setup without any federation to other servers.

Clients connect via a reverse proxy. Federation is directly connected.

IP addresses, URLs, etc. have been changed.

Setup: Both servers are in the same subnet, no firewalls etc. involved. Certs from Comodo. Internal DNS server. .well-known files are in place.

Server1: server1.subdomain.domain.net, 192.168.100.11
Server2: server2.subdomain.domain.net, 192.168.100.12

federation_domain_whitelist: is set to allow the other server.

federation and client ports are separated. TLS is active.

No trusted id servers.

perspectives: is also configured for the other server.

Can provide logs etc if needed.

cavabanga26 commented 5 years ago

Did you manage to solve the problem?

richvdh commented 5 years ago

I think this would be solved by fixing #4024

Feliix42 commented 4 years ago

For the past few days, I've been trying to set up federation between two Synapse instances and failed spectacularly due to weird bugs: I could join rooms on the other server via the room directory, but was unable to invite users from the other server to a room.

This would fail with an opaque riot-web error (which does not matter for this issue) but send a broken invitation to the other homeserver anyway. This invitation can be neither accepted nor declined. Accepting it resulted in Riot telling me that I "have not been invited to this room", which under the hood was triggered by the original HS returning a 403 error to the joining user's make_join request.

This was solved by whitelisting federation with self for both home servers (and resending the invitations), something that I only tried because I found this issue after hours of debugging.
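As a rough illustration of this workaround, the sketch below builds the list of domains each server would put in its federation_domain_whitelist: its peers plus its own server name. The helper `whitelist_with_self` is hypothetical, and the domain names are the placeholders used earlier in this thread.

```python
from typing import List


def whitelist_with_self(server_name: str, peers: List[str]) -> List[str]:
    """Return the peers plus the server's own name, without duplicates, in order."""
    result: List[str] = []
    for domain in [*peers, server_name]:
        if domain not in result:
            result.append(domain)
    return result


if __name__ == "__main__":
    servers = {
        "one.matrix.mydomain.com": ["two.matrix.mydomain.com"],
        "two.matrix.mydomain.com": ["one.matrix.mydomain.com"],
    }
    for name, peers in servers.items():
        # Each server ends up whitelisting both the other server and itself.
        print(name, "->", whitelist_with_self(name, peers))
```

The resulting entries would then be written to federation_domain_whitelist in each server's homeserver.yaml.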

So for the time being could we maybe add a line or two to the configuration file or the docs regarding this behavior so future users will not run into this while this issue is still open?

richvdh commented 4 years ago

> So for the time being could we maybe add a line or two to the configuration file or the docs regarding this behavior so future users will not run into this while this issue is still open?

sure, a PR would be welcome.

MadLittleMods commented 2 years ago

> I think this would be solved by fixing #4024

#4024 was solved by https://github.com/matrix-org/synapse/pull/11129

I think we can close this now