matrix-org / sygnal

Sygnal: reference Push Gateway for Matrix
Apache License 2.0
155 stars 140 forks source link

Apple pushes don't recover after a proxy failure #304

Open MatMaul opened 2 years ago

MatMaul commented 2 years ago

We use sygnal with a HTTP proxy. Sometimes our HTTP proxy restarts and is unavailable for a short period of time, returning a 503 response.

In this case we can clearly see the related errors in the logs, for both APNS and GCM.

However APNS never recovers after that, and no errors are logged after. GCM manages to recover well.

Here is the logs around the time the proxy restarts.

sygnal_crash.txt

richvdh commented 2 years ago

@MatMaul can you paste some keywords (and the stacktrace?) from the log into the issue, so that they will come up with a keyword search?

MatMaul commented 2 years ago

Good idea. Here is the stacktrace related to the 503, for APNS side:

May 18 00:00:07 tchap-sygnal sygnal-0[24431]: sygnal.exceptions.ProxyConnectError: An error occurred while connecting: Unexpected status on CONNECT: b'503'.
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: 2022-05-18 00:00:07,703 [24431] twisted 147 - CRITICAL -
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: Traceback (most recent call last):
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/asyncioreactor.py", line 138, in _readOrWrite
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     why = method()
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/tcp.py", line 248, in doRead
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     return self._dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     rval = self.protocol.dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/endpoints.py", line 147, in dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     return self._wrappedProtocol.dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/helper/proxy/connectproxyclient_twisted.py", line 207, in dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     return self.http_setup_client.dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/protocols/basic.py", line 549, in dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     why = self.lineReceived(line)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/web/http.py", line 635, in lineReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     self.handleStatus(version, status, message)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/helper/proxy/connectproxyclient_twisted.py", line 239, in handleStatus
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     raise ProxyConnectError("Unexpected status on CONNECT: %s" % status)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: sygnal.exceptions.ProxyConnectError: An error occurred while connecting: Unexpected status on CONNECT: b'503'.
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: 2022-05-18 00:00:07,704 [24431] sygnal.gcmpushkin 394 - WARNING - [a92c89b5-8ca7-4deb-b336-46c0022a4eb2] Temporary failure, will retry in 10 seconds
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: Traceback (most recent call last):
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 196, in _perform_http_request
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     bodyProducer=body_producer,
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/asyncioreactor.py", line 138, in _readOrWrite
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     why = method()
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/tcp.py", line 248, in doRead
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     return self._dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     rval = self.protocol.dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/endpoints.py", line 147, in dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     return self._wrappedProtocol.dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/helper/proxy/connectproxyclient_twisted.py", line 207, in dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     return self.http_setup_client.dataReceived(data)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/protocols/basic.py", line 549, in dataReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     why = self.lineReceived(line)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/web/http.py", line 635, in lineReceived
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     self.handleStatus(version, status, message)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/helper/proxy/connectproxyclient_twisted.py", line 239, in handleStatus
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     raise ProxyConnectError("Unexpected status on CONNECT: %s" % status)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: sygnal.exceptions.ProxyConnectError: An error occurred while connecting: Unexpected status on CONNECT: b'503'.
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: The above exception was the direct cause of the following exception:
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: Traceback (most recent call last):
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 379, in _dispatch_notification_unlimited
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     n, log, body, headers, pushkeys, span
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 220, in _request_dispatch
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     response, response_text = await self._perform_http_request(body, headers)
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 202, in _perform_http_request
May 18 00:00:07 tchap-sygnal sygnal-0[24431]:     ) from exception
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: sygnal.exceptions.TemporaryNotificationDispatchException: GCM request failure
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: 2022-05-18 00:00:07,704 [24431] twisted 147 - INFO - Stopping factory <sygnal.helper.proxy.connectproxyclient_twisted.HTTPProxiedClientFactory object at 0x7f680f5aa0b8>
May 18 00:00:07 tchap-sygnal sygnal-0[24431]: 2022-05-18 00:00:07,712 [24431] aioapns 356 - ERROR - Could not connect to server: An error occurred while connecting: Error from HTTP Proxy whilst attempting CONNECT: 503 (Service

And here is the one for GCM (that manages to recover afterwards):

May 17 23:59:46 tchap-sygnal sygnal-0[24431]: 2022-05-17 23:59:46,092 [24431] sygnal.gcmpushkin 394 - WARNING - [38369a3f-17b0-4851-998c-391805fadca9] Temporary failure, will retry in 10 seconds
May 17 23:59:46 tchap-sygnal sygnal-0[24431]: Traceback (most recent call last):
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 196, in _perform_http_request
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     bodyProducer=body_producer,
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/asyncioreactor.py", line 138, in _readOrWrite
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     why = method()
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/tcp.py", line 248, in doRead
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     return self._dataReceived(data)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     rval = self.protocol.dataReceived(data)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/internet/endpoints.py", line 147, in dataReceived
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     return self._wrappedProtocol.dataReceived(data)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/helper/proxy/connectproxyclient_twisted.py", line 207, in dataReceived
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     return self.http_setup_client.dataReceived(data)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/protocols/basic.py", line 549, in dataReceived
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     why = self.lineReceived(line)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/twisted/web/http.py", line 635, in lineReceived
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     self.handleStatus(version, status, message)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/helper/proxy/connectproxyclient_twisted.py", line 239, in handleStatus
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     raise ProxyConnectError("Unexpected status on CONNECT: %s" % status)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]: sygnal.exceptions.ProxyConnectError: An error occurred while connecting: Unexpected status on CONNECT: b'503'.
May 17 23:59:46 tchap-sygnal sygnal-0[24431]: The above exception was the direct cause of the following exception:
May 17 23:59:46 tchap-sygnal sygnal-0[24431]: Traceback (most recent call last):
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 379, in _dispatch_notification_unlimited
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     n, log, body, headers, pushkeys, span
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 220, in _request_dispatch
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     response, response_text = await self._perform_http_request(body, headers)
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:   File "/opt/sygnal/env/lib/python3.7/site-packages/sygnal/gcmpushkin.py", line 202, in _perform_http_request
May 17 23:59:46 tchap-sygnal sygnal-0[24431]:     ) from exception
May 17 23:59:46 tchap-sygnal sygnal-0[24431]: sygnal.exceptions.TemporaryNotificationDispatchException: GCM request failure
MatMaul commented 2 years ago

Not happy news: while the proxy restarts every night (we can see the errors from my previous message in the logs every night), the bug is not triggered each time, meaning sometimes it manages to recover and keep sending APNS after the proxy is back. Probably meaning it's a quite hairy edge case to find.

DMRobertson commented 7 months ago

I have just noticed the following aioapns bug: https://github.com/Fatal1ty/aioapns/issues/42. It sounds very similar to the symptoms described here. That bug was fixed in aioapns v3---so I am very tempted to think that this might magically work by upgrading to the latest Sygnal release, which requires at least this version: https://github.com/matrix-org/sygnal/blob/5f55f4255b95bf7c8c467d873c37609edc980a3a/pyproject.toml#L77

If we can get some confirmation of this that would be ideal.