Open kurkii opened 1 year ago
Have you changed your signing keys or domain name as part of your migration?
Have you changed your signing keys or domain name as part of your migration?
thanks for the reply, no i havent changed my domain or anything with homeserver.yaml or my nginx configs. the signing key is the same as well. also, the installation method in my post is incorrect, its installed through the package manager, not with Docker
i left synapse running overnight, now i noticed that sometime in the middle our hs users could send messages and sync
it seems that randomly it started processing requests:
2023-10-30 20:11:41,756 - synapse.util.caches.response_cache - 268 - INFO - PUT-75247 - [fed_txn_handler]: using incomplete cached result for [('t2bot.io', '1697929611944')]
2023-10-30 20:11:41,765 - synapse.federation.transport.server.federation - 105 - INFO - PUT-75197 - Received txn 1698658176478 from mozilla.org. (PDUs: 1, EDUs: 0)
but then in the morning it stopped working again:
2023-10-31 06:05:05,648 - twisted - 275 - INFO - sentinel - Received SIGTERM, shutting down.
seems like the OS killed it?
postgres also worked fine, then started complaining of those unexpected EOF
s and got killed as well.
and after that it keeps giving the same 403/401 forbidden errors.
but then in the morning it stopped working again:
2023-10-31 06:05:05,648 - twisted - 275 - INFO - sentinel - Received SIGTERM, shutting down.
seems like the OS killed it?
Agreed. I would guess something is using too much memory and the OOMKiller kicked in.
Relevant log output
2023-10-29 10:53:00,377 - synapse.http.matrixfederationclient - 718 - INFO - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4-$KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak - {GET-O-3} [libera.chat] Got response headers: 403 Forbidden 2023-10-29 10:53:00,377 - synapse.http.matrixfederationclient - 808 - WARNING - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4-$KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak - {GET-O-3} [libera.chat] Request failed: GET matrix-federation://libera.chat/_matrix/federation/v1/state_ids/%21ugBhqCIFYAcsUTyxIW%3Alibera.chat?event_id=%24KeGLFuN9E72fgishOMUsU6rh9J3wAbaTmfWIDAxqcak: HttpResponseException('403: Forbidden') 2023-10-29 10:53:00,378 - synapse.handlers.federation_event - 1176 - WARNING - sync_partial_state_room-2-$Rx-JViAFTq10Y9nO6Hv17DWglGcfL2Qk__FTWmJ7Wb4 - Error attempting to resolve state at missing prev_events: 403: Forbidden
Are there examples of this for other room IDs and other remote servers, or is the problem bespoke to this room?
Agreed. I would guess something is using too much memory and the OOMKiller kicked in.
its weird, ive never had issues with memory. mind you this is the same server, same specs. so its weird to me that this is happening after messing around with postgresql
as for memory, i didnt do much indepth logging but there seems to be atleast 1gb free memory. the thing that i noticed though is that postgres uses a whole lot of CPU, which it didnt before. its basically ramping my cpu to 100% usage. now i googled around and could the usage maybe be due to synapse failing and perhaps constantly reading from it? just a theory
Are there examples of this for other room IDs and other remote servers, or is the problem bespoke to this room?
this happens to all rooms and servers, theres many more examples, all with the same old "Forbidden" error:
2023-10-30 06:38:08,403 - synapse.http.matrixfederationclient - 718 - INFO - sync_partial_state_room-3-$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ-$6xZq0R_aolOlK5VZ_bbxz55J2gQ_ciTSQpBpB72jR-g - {GET-O-391} [cat.casa] Got response headers: 403 Forbidden
2023-10-30 06:38:08,404 - synapse.http.matrixfederationclient - 808 - WARNING - sync_partial_state_room-3-$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ-$6xZq0R_aolOlK5VZ_bbxz55J2gQ_ciTSQpBpB72jR-g - {GET-O-391} [cat.casa] Request failed: GET matrix-federation://cat.casa/_matrix/federation/v1/state_ids/%21kAPxkFvRAkeHUWjbgw%3Amatrix.org?event_id=%246xZq0R_aolOlK5VZ_bbxz55J2gQ_ciTSQpBpB72jR-g: HttpResponseException('403: Forbidden')
2023-10-30 06:38:08,404 - synapse.handlers.federation_event - 1176 - WARNING - sync_partial_state_room-3-$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ - Error attempting to resolve state at missing prev_events: 403: Forbidden
2023-10-30 06:38:08,404 - synapse.handlers.federation - 1997 - INFO - sync_partial_state_room-3 - Failed to get state for !kAPxkFvRAkeHUWjbgw:matrix.org at <FrozenEventV3 event_id=$UWLr8kWnHMdq8O2nF1JDTI9pHXr9QCJ-yymms2pjvEQ, type=m.room.member, state_key=@user:myhs.net outlier=False> from cat.casa because ERROR 403: We can't get valid state history.
in the midst of all these i also noticed these dns errors
2023-10-30 08:48:44,772 - synapse.federation.sender.per_destination_queue - 411 - WARNING - federation_transaction_transmission_loop-1612 - TX [matrix.fruits.pw] Failed to send transaction: Failed to send request: TimeoutError: User timeout caused connection failure.
2023-10-30 08:48:44,805 - synapse.http.site - 377 - INFO - PUT-9537 - Connection from client lost before response was sent
2023-10-30 08:48:44,963 - synapse.http.federation.matrix_federation_agent - 365 - INFO - federation_transaction_transmission_loop-2111 - Failed to connect to waifuhunter.club:8448: User timeout caused connection failure.
2023-10-30 08:48:44,963 - synapse.http.matrixfederationclient - 761 - INFO - federation_transaction_transmission_loop-2111 - {PUT-O-2665} [waifuhunter.club] Request failed: PUT matrix-federation://waifuhunter.club/_matrix/federation/v1/send/1698646695626: TimeoutError('')
2023-10-30 08:48:45,083 - synapse.http.client - 194 - INFO - sentinel - Blocked 100.73.115.94 from DNS resolution to sintunnel.xyz
2023-10-30 08:48:45,083 - synapse.http.federation.matrix_federation_agent - 365 - INFO - federation_transaction_transmission_loop-630 - Failed to connect to sintunnel.xyz:8448: DNS lookup failed: no results for hostname lookup: sintunnel.xyz.
2023-10-30 08:48:45,083 - synapse.http.matrixfederationclient - 761 - INFO - federation_transaction_transmission_loop-630 - {PUT-O-2935} [sintunnel.xyz] Request failed: PUT matrix-federation://sintunnel.xyz/_matrix/federation/v1/send/1698646695892: DNSLookupError('no results for hostname lookup: sintunnel.xyz')
2023-10-30 09:00:16,962 - synapse.http.federation.matrix_federation_agent - 365 - INFO - federation_transaction_transmission_loop-676 - Failed to connect to einsaman.no:8448: DNS lookup failed: no results for hostname lookup: einsaman.no.
2023-10-30 09:00:16,962 - synapse.http.matrixfederationclient - 761 - INFO - federation_transaction_transmission_loop-676 - {PUT-O-3031} [einsaman.no] Request failed: PUT matrix-federation://einsaman.no/_matrix/federation/v1/send/1698646695976: DNSLookupError('no results for hostname lookup: einsaman.no')
2023-10-30 09:00:16,963 - synapse.util.retryutils - 276 - INFO - federation_transaction_transmission_loop-676 - Connection to einsaman.no was unsuccessful (<class 'synapse.api.errors.RequestSendFailed'>(Failed to send request: DNSLookupError: DNS lookup failed: no results for hostname lookup: einsaman.no.)); backoff now 604800000
Description
After migrating postgres database from the server to the same server, synapse cannot federate. The homeserver cannot sync any messages
Postgres logs show:
NGINX logs show:
Steps to reproduce
Homeserver
my homeserver
Synapse Version
1.95.0
Installation Method
Docker (matrixdotorg/synapse)
Database
Postgres version 15.0, one single server
Workers
Single process
Platform
Debian 12 on a VPS VM, AMD EPYC CPU with 4GB of memory. the package is installed through apt
Configuration
I have implemented sliding sync
Relevant log output
Anything else that would be useful to know?
using NGINX as reverse proxy