python-websockets / websockets

Library for building WebSocket servers and clients in Python
https://websockets.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
5.21k stars 515 forks source link

IncompleteReadError sometimes fires into a void #637

Closed aaugustin closed 5 years ago

aaugustin commented 5 years ago

It looks like this exception sometimes fires into a void. From our production servers:

Time:      2019-04-04 11:25:55
Level:     ERROR
Channel:   asyncio
File:      /usr/lib/python3.7/collections/__init__.py:414
Module:    collections
Function:  _replace

Message:

Future exception was never retrieved
future: <Future finished exception=ConnectionClosed('WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason')>
Traceback (most recent call last):
  File "/var/venv/saltyrtc-server/lib/python3.7/site-packages/websockets/protocol.py", line 674, in transfer_data
    message = yield from self.read_message()
  File "/var/venv/saltyrtc-server/lib/python3.7/site-packages/websockets/protocol.py", line 742, in read_message
    frame = yield from self.read_data_frame(max_size=self.max_size)
  File "/var/venv/saltyrtc-server/lib/python3.7/site-packages/websockets/protocol.py", line 815, in read_data_frame
    frame = yield from self.read_frame(max_size)
  File "/var/venv/saltyrtc-server/lib/python3.7/site-packages/websockets/protocol.py", line 884, in read_frame
    extensions=self.extensions,
  File "/var/venv/saltyrtc-server/lib/python3.7/site-packages/websockets/framing.py", line 99, in read
    data = yield from reader(2)
  File "/usr/lib/python3.7/asyncio/streams.py", line 677, in readexactly
    raise IncompleteReadError(incomplete, n)
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

Originally posted by @lgrahl in https://github.com/aaugustin/websockets/issues/579#issuecomment-481214664

aaugustin commented 5 years ago

Like other situations discussed in #579, you lose the TCP connection.

transfer_data_task enters this exception handling block:

        except (ConnectionError, EOFError) as exc:
            # Reading data with self.reader.readexactly may raise:
            # - most subclasses of ConnectionError if the TCP connection
            #   breaks, is reset, or is aborted;
            # - IncompleteReadError, a subclass of EOFError, if fewer
            #   bytes are available than requested.
            self.transfer_data_exc = exc
            self.fail_connection(1006)

which records the IncompleteReadError in transfer_data_exc.

Then, somewhere, a future completes with a ConnectionClosed exception generated by this method:

    def connection_closed_exc(self) -> ConnectionClosed:
        exception: ConnectionClosed
        if self.close_code == 1000 or self.close_code == 1001:
            exception = ConnectionClosedOK(self.close_code, self.close_reason)
        else:
            exception = ConnectionClosedError(self.close_code, self.close_reason)
        # Chain to the exception that terminated data transfer, if any.
        exception.__cause__ = self.transfer_data_exc
        return exception

This gives a logical exception report (the poor wording of ConnectionClosed notwithstanding):

asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

This leaves the question of which Future completes with this exception and is never retrieved.

I suspect the keepalive ping:

    def abort_keepalive_pings(self) -> None:
        """
        Raise ConnectionClosed in pending keepalive pings.

        They'll never receive a pong once the connection is closed.

        """
        assert self.state is State.CLOSED
        exc = self.connection_closed_exc()

        for ping in self.pings.values():
            ping.set_exception(exc)  # <-- looks like our culprit

Enabling the debug mode of asyncio would tell us if this is indeed where the Future gets created, but I understand you can't do that in production due to the overhead.

If this is indeed the issue, it should be easy to reproduce by:

This leaves the question of how to deal with pending pings when the connection drops...

aaugustin commented 5 years ago

@lgrahl If you're able to patch your production with this line:

https://github.com/aaugustin/websockets/pull/638/files#diff-5e2d99fc26f0a3387bd0ff4241906a9bR1242

and confirms that it makes the problem go away, that would be really nice.

Given how much guesswork was involved, I'm not that confident that I nailed the root cause.