Closed aaugustin closed 5 years ago
Like other situations discussed in #579, you lose the TCP connection.
transfer_data_task
enters this exception handling block:
except (ConnectionError, EOFError) as exc:
# Reading data with self.reader.readexactly may raise:
# - most subclasses of ConnectionError if the TCP connection
# breaks, is reset, or is aborted;
# - IncompleteReadError, a subclass of EOFError, if fewer
# bytes are available than requested.
self.transfer_data_exc = exc
self.fail_connection(1006)
which records the IncompleteReadError
in transfer_data_exc
.
Then, somewhere, a future completes with a ConnectionClosed
exception generated by this method:
def connection_closed_exc(self) -> ConnectionClosed:
exception: ConnectionClosed
if self.close_code == 1000 or self.close_code == 1001:
exception = ConnectionClosedOK(self.close_code, self.close_reason)
else:
exception = ConnectionClosedError(self.close_code, self.close_reason)
# Chain to the exception that terminated data transfer, if any.
exception.__cause__ = self.transfer_data_exc
return exception
This gives a logical exception report (the poor wording of ConnectionClosed notwithstanding):
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes
The above exception was the direct cause of the following exception:
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason
This leaves the question of which Future
completes with this exception and is never retrieved.
I suspect the keepalive ping:
def abort_keepalive_pings(self) -> None:
"""
Raise ConnectionClosed in pending keepalive pings.
They'll never receive a pong once the connection is closed.
"""
assert self.state is State.CLOSED
exc = self.connection_closed_exc()
for ping in self.pings.values():
ping.set_exception(exc) # <-- looks like our culprit
Enabling the debug mode of asyncio would tell us if this is indeed where the Future gets created, but I understand you can't do that in production due to the overhead.
If this is indeed the issue, it should be easy to reproduce by:
This leaves the question of how to deal with pending pings when the connection drops...
@lgrahl If you're able to patch your production with this line:
https://github.com/aaugustin/websockets/pull/638/files#diff-5e2d99fc26f0a3387bd0ff4241906a9bR1242
and confirms that it makes the problem go away, that would be really nice.
Given how much guesswork was involved, I'm not that confident that I nailed the root cause.
It looks like this exception sometimes fires into a void. From our production servers:
Originally posted by @lgrahl in https://github.com/aaugustin/websockets/issues/579#issuecomment-481214664