matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.82k stars 2.13k forks source link

Federation tester says OK, yet cannot federate (403: Forbidden) #16571

Open kurkii opened 1 year ago

kurkii commented 1 year ago

Description

After migrating postgres database from the server to the same server, synapse cannot federate. The homeserver cannot sync any messages

Postgres logs show:

LOG: couldnt recieve data from client: Connection reset by peer
LOG: unexpected EOF on client connection with an open transaction

NGINX logs show:

[error]: *26081 no live upstreams while connecting to upstream, client: x.x.x.x; server: matrix.domain.net

Steps to reproduce

Homeserver

my homeserver

Synapse Version

1.95.0

Installation Method

Docker (matrixdotorg/synapse)

Database

Postgres version 15.0, one single server

Workers

Single process

Platform

Debian 12 on a VPS VM, AMD EPYC CPU with 4GB of memory. the package is installed through apt

Configuration

I have implemented sliding sync

Relevant log output

2023-10-29 10:53:00,377 - synapse.http.matrixfederationclient - 718 - INFO - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4-$KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak - {GET-O-3} [libera.chat] Got response headers: 403 Forbidden
2023-10-29 10:53:00,377 - synapse.http.matrixfederationclient - 808 - WARNING - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4-$KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak - {GET-O-3} [libera.chat] Request failed: GET matrix-federation://libera.chat/_matrix/federation/v1/state_ids/%21ugBhqCIFYAcsUTyxIW%3Alibera.chat?event_id=%24KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak: HttpResponseException('403: Forbidden')
2023-10-29 10:53:00,378 - synapse.handlers.federation_event - 1176 - WARNING - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4 - Error attempting to resolve state at missing prev_events: 403: Forbidden

Anything else that would be useful to know?

using NGINX as reverse proxy

DMRobertson commented 1 year ago

Have you changed your signing keys or domain name as part of your migration?

kurkii commented 1 year ago

Have you changed your signing keys or domain name as part of your migration?

thanks for the reply, no i havent changed my domain or anything with homeserver.yaml or my nginx configs. the signing key is the same as well. also, the installation method in my post is incorrect, its installed through the package manager, not with Docker

kurkii commented 1 year ago

i left synapse running overnight, now i noticed that sometime in the middle our hs users could send messages and sync

it seems that randomly it started processing requests:

2023-10-30 20:11:41,756 - synapse.util.caches.response_cache - 268 - INFO - PUT-75247 - [fed_txn_handler]: using incomplete cached result for [('t2bot.io', '1697929611944')]
2023-10-30 20:11:41,765 - synapse.federation.transport.server.federation - 105 - INFO - PUT-75197 - Received txn 1698658176478 from mozilla.org. (PDUs: 1, EDUs: 0)

but then in the morning it stopped working again: 2023-10-31 06:05:05,648 - twisted - 275 - INFO - sentinel - Received SIGTERM, shutting down. seems like the OS killed it? postgres also worked fine, then started complaining of those unexpected EOFs and got killed as well.

and after that it keeps giving the same 403/401 forbidden errors.

DMRobertson commented 1 year ago

but then in the morning it stopped working again: 2023-10-31 06:05:05,648 - twisted - 275 - INFO - sentinel - Received SIGTERM, shutting down. seems like the OS killed it?

Agreed. I would guess something is using too much memory and the OOMKiller kicked in.

Relevant log output

2023-10-29 10:53:00,377 - synapse.http.matrixfederationclient - 718 - INFO - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4-$KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak - {GET-O-3} [libera.chat] Got response headers: 403 Forbidden
2023-10-29 10:53:00,377 - synapse.http.matrixfederationclient - 808 - WARNING - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4-$KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak - {GET-O-3} [libera.chat] Request failed: GET matrix-federation://libera.chat/_matrix/federation/v1/state_ids/%21ugBhqCIFYAcsUTyxIW%3Alibera.chat?event_id=%24KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak: HttpResponseException('403: Forbidden')
2023-10-29 10:53:00,378 - synapse.handlers.federation_event - 1176 - WARNING - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4 - Error attempting to resolve state at missing prev_events: 403: Forbidden

Are there examples of this for other room IDs and other remote servers, or is the problem bespoke to this room?

kurkii commented 1 year ago

Agreed. I would guess something is using too much memory and the OOMKiller kicked in.

its weird, ive never had issues with memory. mind you this is the same server, same specs. so its weird to me that this is happening after messing around with postgresql

as for memory, i didnt do much indepth logging but there seems to be atleast 1gb free memory. the thing that i noticed though is that postgres uses a whole lot of CPU, which it didnt before. its basically ramping my cpu to 100% usage. now i googled around and could the usage maybe be due to synapse failing and perhaps constantly reading from it? just a theory

Are there examples of this for other room IDs and other remote servers, or is the problem bespoke to this room?

this happens to all rooms and servers, theres many more examples, all with the same old "Forbidden" error:

2023-10-30 06:38:08,403 - synapse.http.matrixfederationclient - 718 - INFO - sync_partial_state_room-3-$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ-$6xZq0R_aolOlK5VZ_bbxz55J2gQ_ciTSQpBpB72jR-g - {GET-O-391} [cat.casa] Got response headers: 403 Forbidden
2023-10-30 06:38:08,404 - synapse.http.matrixfederationclient - 808 - WARNING - sync_partial_state_room-3-$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ-$6xZq0R_aolOlK5VZ_bbxz55J2gQ_ciTSQpBpB72jR-g - {GET-O-391} [cat.casa] Request failed: GET matrix-federation://cat.casa/_matrix/federation/v1/state_ids/%21kAPxkFvRAkeHUWjbgw%3Amatrix.org?event_id=%246xZq0R_aolOlK5VZ_bbxz55J2gQ_ciTSQpBpB72jR-g: HttpResponseException('403: Forbidden')
2023-10-30 06:38:08,404 - synapse.handlers.federation_event - 1176 - WARNING - sync_partial_state_room-3-$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ - Error attempting to resolve state at missing prev_events: 403: Forbidden
2023-10-30 06:38:08,404 - synapse.handlers.federation - 1997 - INFO - sync_partial_state_room-3 - Failed to get state for !kAPxkFvRAkeHUWjbgw:matrix.org at <FrozenEventV3 event_id=$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ, type=m.room.member, state_key=@user:myhs.net outlier=False> from cat.casa because ERROR 403: We can't get valid state history.

in the midst of all these i also noticed these dns errors

2023-10-30 08:48:44,772 - synapse.federation.sender.per_destination_queue - 411 - WARNING - federation_transaction_transmission_loop-1612 - TX [matrix.fruits.pw] Failed to send transaction: Failed to send request: TimeoutError: User timeout caused connection failure.
2023-10-30 08:48:44,805 - synapse.http.site - 377 - INFO - PUT-9537 - Connection from client lost before response was sent
2023-10-30 08:48:44,963 - synapse.http.federation.matrix_federation_agent - 365 - INFO - federation_transaction_transmission_loop-2111 - Failed to connect to waifuhunter.club:8448: User timeout caused connection failure.
2023-10-30 08:48:44,963 - synapse.http.matrixfederationclient - 761 - INFO - federation_transaction_transmission_loop-2111 - {PUT-O-2665} [waifuhunter.club] Request failed: PUT matrix-federation://waifuhunter.club/_matrix/federation/v1/send/1698646695626: TimeoutError('')
2023-10-30 08:48:45,083 - synapse.http.client - 194 - INFO - sentinel - Blocked 100.73.115.94 from DNS resolution to sintunnel.xyz
2023-10-30 08:48:45,083 - synapse.http.federation.matrix_federation_agent - 365 - INFO - federation_transaction_transmission_loop-630 - Failed to connect to sintunnel.xyz:8448: DNS lookup failed: no results for hostname lookup: sintunnel.xyz.
2023-10-30 08:48:45,083 - synapse.http.matrixfederationclient - 761 - INFO - federation_transaction_transmission_loop-630 - {PUT-O-2935} [sintunnel.xyz] Request failed: PUT matrix-federation://sintunnel.xyz/_matrix/federation/v1/send/1698646695892: DNSLookupError('no results for hostname lookup: sintunnel.xyz')
2023-10-30 09:00:16,962 - synapse.http.federation.matrix_federation_agent - 365 - INFO - federation_transaction_transmission_loop-676 - Failed to connect to einsaman.no:8448: DNS lookup failed: no results for hostname lookup: einsaman.no.
2023-10-30 09:00:16,962 - synapse.http.matrixfederationclient - 761 - INFO - federation_transaction_transmission_loop-676 - {PUT-O-3031} [einsaman.no] Request failed: PUT matrix-federation://einsaman.no/_matrix/federation/v1/send/1698646695976: DNSLookupError('no results for hostname lookup: einsaman.no')
2023-10-30 09:00:16,963 - synapse.util.retryutils - 276 - INFO - federation_transaction_transmission_loop-676 - Connection to einsaman.no was unsuccessful (<class 'synapse.api.errors.RequestSendFailed'>(Failed to send request: DNSLookupError: DNS lookup failed: no results for hostname lookup: einsaman.no.)); backoff now 604800000