Closed danjenkins closed 4 years ago
Another loop's log
2020-04-28 12:45:10,451 - synapse.http.federation.matrix_federation_agent - 242 - INFO - federation_transaction_transmission_loop-478 - Connecting to chat.python.nz:8448
2020-04-28 12:45:10,457 - synapse.http.federation.matrix_federation_agent - 242 - INFO - federation_transaction_transmission_loop-895 - Connecting to matrix.familie-vogdt.de:443
2020-04-28 12:45:10,459 - synapse.http.federation.matrix_federation_agent - 242 - INFO - federation_transaction_transmission_loop-893 - Connecting to rrra.bid:8448
2020-04-28 12:45:10,460 - synapse.http.federation.matrix_federation_agent - 242 - INFO - federation_transaction_transmission_loop-761 - Connecting to matrix.nilux.be:8448
2020-04-28 12:45:10,492 - synapse.http.matrixfederationclient - 434 - INFO - federation_transaction_transmission_loop-568 - Failed to send request: <Message id=11566 rCode=2 maxSize=0 flags=answer,recDes,recAv queries=[Query(b'_matrix._tcp.synapse.pantarhei.duckdns.org', 33, 1)]>
2020-04-28 12:45:10,493 - synapse.http.matrixfederationclient - 491 - WARNING - federation_transaction_transmission_loop-568 - {PUT-O-545} [synapse.pantarhei.duckdns.org] Request failed: PUT matrix://synapse.pantarhei.duckdns.org/_matrix/federation/v1/send/1588077218366: DNSServerError(<Message id=11566 rCode=2 maxSize=0 flags=answer,recDes,recAv queries=[Query(b'_matrix._tcp.synapse.pantarhei.duckdns.org', 33, 1)]>)
2020-04-28 12:45:10,495 - synapse.http.matrixfederationclient - 434 - INFO - federation_transaction_transmission_loop-479 - Failed to send request: <Message id=39429 rCode=2 maxSize=0 flags=answer,recDes,recAv queries=[Query(b'_matrix._tcp.matrix.catila.duckdns.org', 33, 1)]>
2020-04-28 12:45:10,510 - synapse.http.matrixfederationclient - 491 - WARNING - federation_transaction_transmission_loop-479 - {PUT-O-490} [matrix.catila.duckdns.org] Request failed: PUT matrix://matrix.catila.duckdns.org/_matrix/federation/v1/send/1588077218311: DNSServerError(<Message id=39429 rCode=2 maxSize=0 flags=answer,recDes,recAv queries=[Query(b'_matrix._tcp.matrix.catila.duckdns.org', 33, 1)]>)
2020-04-28 12:45:10,510 - twisted - 192 - INFO - - Main loop terminated.
Humph doesnt look like an issue with that version, rolled back to a previous version
Hmm, not sure what's going on but it's probably not the federation spam. That happens whenever you're in a room with users from a homeserver that no longer exists (someone shut down their homeserver without leaving all public rooms first).
Can you kubectl describe
the pod and see why it's restarting? Did the process crash or did the health check fail?
Does look like the liveness probe failed!
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 100s (x38 over 7h51m) kubelet, pool-8a97vo5tx-3nbno Container synapse failed liveness probe, will be restarted
Warning Unhealthy 100s (x123 over 7h51m) kubelet, pool-8a97vo5tx-3nbno Liveness probe failed: Get http://10.244.0.161:8008/_matrix/static/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 99s (x128 over 7h51m) kubelet, pool-8a97vo5tx-3nbno Readiness probe failed: Get http://10.244.0.161:8008/_matrix/static/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Normal Started 94s (x39 over 7h54m) kubelet, pool-8a97vo5tx-3nbno Started container synapse
Normal Pulled 94s (x39 over 7h54m) kubelet, pool-8a97vo5tx-3nbno Container image "matrixdotorg/synapse:v1.10.0" already present on machine
Normal Created 94s (x39 over 7h54m) kubelet, pool-8a97vo5tx-3nbno Created container synapse
Warning Unhealthy 91s (x24 over 7h54m) kubelet, pool-8a97vo5tx-3nbno Liveness probe failed: Get http://10.244.0.161:8008/_matrix/static/: dial tcp 10.244.0.161:8008: connect: connection refused
Warning Unhealthy 90s (x47 over 7h54m) kubelet, pool-8a97vo5tx-3nbno Readiness probe failed: Get http://10.244.0.161:8008/_matrix/static/: dial tcp 10.244.0.161:8008: connect: connection refused
Port: 8008/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 28 Apr 2020 21:50:56 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 28 Apr 2020 21:47:55 +0100
Finished: Tue, 28 Apr 2020 21:50:56 +0100
Ready: True
Restart Count: 38
Liveness: http-get http://:http/_matrix/static/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/_matrix/static/ delay=0s timeout=1s period=10s #success=1 #failure=3
Thanks, that's good to know. Let's try this:
kubectl edit
the deployment and remove the liveness/readiness probes for a momentkubectl port-forward matrix-synapse-blahblahblah 8008
If you can hit the endpoints manually, leave the probes disabled and see if you can use the homeserver as normal. We can keep looking into why the probes are failing but that should give you a temporary fix.
If you can't reach the endpoints or the homeserver still doesn't work with the probes disabled, then we can take a closer look at your logs and see if it's an issue with the chart or with Synapse.
Thanks!
thanks @dacruz21 ! I need to up the docker image again so I'll just comment it out in the deployment file for now (still using my edited fork at the moment)
So currently I can hit it. Will report back behaviours from tomorrow !
So it seems the container hasnt restarted itself, and I seem to be able to chat to people fine for a long period of time....
Hey @danjenkins, I've bumped the default timeout for the Synapse health probes. Can you update to 2.1.0 and let me know if you're still getting this issue? Feel free to raise/lower them further in synapse.probes
, 5 seconds is probably more generous than you want to be, but if you're on some old hardware, then maybe it's not generous enough :)
@dacruz21 Circling back on the timeout issue, 5 seconds seems to not be generous enough. It's much better but I am averaging 5 restarts in 24 hours due to healthcheck failure.
I upped the logging to INFO and it looks like it restarts during a federation loop
Anyone else had problems with 1.12.4 before I look into it further and submit an upstream issue?