python-websockets / websockets

Library for building WebSocket servers and clients in Python
https://websockets.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
5.23k stars 519 forks source link

EOFError from connection handshake failure between websockets==14.0 client and websockets==8.1 server #1548

Closed DrPyser closed 5 days ago

DrPyser commented 1 week ago

Some of our automated tests using websockets started failing since the upgrade of test client code to websockets 14.0.

The code in question is attempting connections to a server that might not be ready to handle connections yet, and previously we expected exceptions ConnectionClosed or InvalidMessage from websockets.connect, but now we're getting EOFError (response parsing errors) instead.

ref. (the relevant test code) https://github.com/wazo-platform/wazo-websocketd/blob/de1531769a81bb90a99395371ad8b1da6a745dd5/integration_tests/suite/helpers/wait_strategy.py#L38-L45 https://github.com/wazo-platform/wazo-websocketd/blob/de1531769a81bb90a99395371ad8b1da6a745dd5/integration_tests/suite/helpers/websocketd.py#L41

(the failing test run) https://jenkins.wazo.community/job/integration-tests-nolock/742//console

From the failing test logs:

21:59:28  suite/helpers/wait_strategy.py:31: in wait
21:59:28      loop.run_until_complete(fut)
21:59:28  /usr/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
21:59:28      return future.result()
21:59:28  suite/helpers/wait_strategy.py:40: in await_for_connection
21:59:28      await client.connect_and_wait_for_init(token)
21:59:28  suite/helpers/websocketd.py:44: in connect_and_wait_for_init
21:59:28      await self.connect(token_id, version)
21:59:28  suite/helpers/websocketd.py:41: in connect
21:59:28      self._websocket = await websockets.connect(url)
21:59:28  ../../wazo-websocketd-integration-tests-venv/lib/python3.9/site-packages/websockets/asyncio/client.py:443: in __await_impl__
21:59:28      await self.connection.handshake(*self.handshake_args)
21:59:28  ../../wazo-websocketd-integration-tests-venv/lib/python3.9/site-packages/websockets/asyncio/client.py:104: in handshake
21:59:28      raise self.protocol.handshake_exc
21:59:28  ../../wazo-websocketd-integration-tests-venv/lib/python3.9/site-packages/websockets/client.py:315: in parse
21:59:28      response = yield from Response.parse(
21:59:28  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21:59:28  
21:59:28  cls = <class 'websockets.http11.Response'>
21:59:28  read_line = <bound method StreamReader.read_line of <websockets.streams.StreamReader object at 0x7f1220492c10>>
21:59:28  read_exact = <bound method StreamReader.read_exact of <websockets.streams.StreamReader object at 0x7f1220492c10>>
21:59:28  read_to_eof = <bound method StreamReader.read_to_eof of <websockets.streams.StreamReader object at 0x7f1220492c10>>
21:59:28  
21:59:28      @classmethod
21:59:28      def parse(
21:59:28          cls,
21:59:28          read_line: Callable[[int], Generator[None, None, bytes]],
21:59:28          read_exact: Callable[[int], Generator[None, None, bytes]],
21:59:28          read_to_eof: Callable[[int], Generator[None, None, bytes]],
21:59:28      ) -> Generator[None, None, Response]:
21:59:28          """
21:59:28          Parse a WebSocket handshake response.
21:59:28      
21:59:28          This is a generator-based coroutine.
21:59:28      
21:59:28          The reason phrase and headers are expected to contain only ASCII
21:59:28          characters. Other characters are represented with surrogate escapes.
21:59:28      
21:59:28          Args:
21:59:28              read_line: Generator-based coroutine that reads a LF-terminated
21:59:28                  line or raises an exception if there isn't enough data.
21:59:28              read_exact: Generator-based coroutine that reads the requested
21:59:28                  bytes or raises an exception if there isn't enough data.
21:59:28              read_to_eof: Generator-based coroutine that reads until the end
21:59:28                  of the stream.
21:59:28      
21:59:28          Raises:
21:59:28              EOFError: If the connection is closed without a full HTTP response.
21:59:28              SecurityError: If the response exceeds a security limit.
21:59:28              LookupError: If the response isn't well formatted.
21:59:28              ValueError: If the response isn't well formatted.
21:59:28      
21:59:28          """
21:59:28          # https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.2
21:59:28      
21:59:28          try:
21:59:28              status_line = yield from parse_line(read_line)
21:59:28          except EOFError as exc:
21:59:28  >           raise EOFError("connection closed while reading HTTP status line") from exc
21:59:28  E           EOFError: connection closed while reading HTTP status line
21:59:28  
aaugustin commented 1 week ago

Short term mitigation: check out the changelog for an explanation of what's happening and how to get the previous behavior back; the legacy implementation will be maintained for five years.

That being said, the new implementation of connect is documented to raise InvalidHandshake (which InvalidMessage inherits) but not EOFError.

Options include:

aaugustin commented 1 week ago

ClientConnection.handshake() raises raise self.protocol.handshake_exc. As a consequence, connect() can raise any exception saved in handshake_exc.

There are two places where handshake_exc is assigned. One of them is in a except InvalidHandshake as exc: block so it's fine. The other one is in a except Exception as exc: and it's the problem here. It catches exception raised by yield from Response.parse(...) and there's many of them (EOFError, SecurityError, LookupError, ValueError). It makes sense to wrap them in an InvalidHandshake exception.

aaugustin commented 1 week ago

In this scenario, the legacy implementation did:

        try:
            status_code, reason, headers = await read_response(self.reader)
        except Exception as exc:
            raise InvalidMessage("did not receive a valid HTTP response") from exc

The new implementation doesn't use InvalidMessage anywhere; that exception was deprecated together with the legacy implementation. So we have a choice between:

aaugustin commented 1 week ago

I'm leaning towards bringing back InvalidMessage.

aaugustin commented 1 week ago

So -

DrPyser commented 1 week ago

@aaugustin Thank you for your quick response and detailed breakdown.

DrPyser commented 1 week ago

I think in our case we're fine with catching EOFError in the relevant code, instead of binding ourselves to the legacy implementation. We'll be on the lookout if you revert back the exception handling.

aaugustin commented 1 week ago

Yes that works too!

aaugustin commented 1 week ago

I started a PR. It requires adjusting tests and adding a changelog.