nats-io / nats.py

Python3 client for NATS
https://nats-io.github.io/nats.py/
Apache License 2.0
902 stars 193 forks source link

Client got network error after a while #598

Open kitscribe opened 3 months ago

kitscribe commented 3 months ago

Observed behavior

Network errors occur after some time after the application starts

2024-08-09 13:23:45.032 TimeoutError
2024-08-09 13:23:45.032     raise exceptions.TimeoutError() from exc
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 502, in wait_for
2024-08-09 13:23:45.032            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:45.032     r, w = await asyncio.wait_for(
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 121, in connect
2024-08-09 13:23:45.032     await self._transport.connect(
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1330, in _select_next_server
2024-08-09 13:23:45.032 Traceback (most recent call last):
2024-08-09 13:23:45.032 
2024-08-09 13:23:45.032 The above exception was the direct cause of the following exception:
2024-08-09 13:23:45.032 
2024-08-09 13:23:45.032 asyncio.exceptions.CancelledError
2024-08-09 13:23:45.032            ^^^^^^^^^
2024-08-09 13:23:45.032     return await fut
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 638, in sock_connect
2024-08-09 13:23:45.032     await self.sock_connect(sock, address)
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 974, in _connect_sock
2024-08-09 13:23:45.032            ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:45.032     sock = await self._connect_sock(
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1070, in create_connection
2024-08-09 13:23:45.032                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:45.032     transport, _ = await loop.create_connection(
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
2024-08-09 13:23:45.032            ^^^^^^^^^^^^
2024-08-09 13:23:45.032     return fut.result()
2024-08-09 13:23:45.032   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 500, in wait_for
2024-08-09 13:23:45.032 Traceback (most recent call last):
2024-08-09 13:23:45.032 nats: encountered error
2024-08-09 13:23:41.027 TimeoutError
2024-08-09 13:23:41.027     raise exceptions.TimeoutError() from exc
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 502, in wait_for
2024-08-09 13:23:41.027            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:41.027     r, w = await asyncio.wait_for(
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 121, in connect
2024-08-09 13:23:41.027     await self._transport.connect(
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1330, in _select_next_server
2024-08-09 13:23:41.027 Traceback (most recent call last):
2024-08-09 13:23:41.027 
2024-08-09 13:23:41.027 The above exception was the direct cause of the following exception:
2024-08-09 13:23:41.027 
2024-08-09 13:23:41.027 asyncio.exceptions.CancelledError
2024-08-09 13:23:41.027            ^^^^^^^^^
2024-08-09 13:23:41.027     return await fut
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 638, in sock_connect
2024-08-09 13:23:41.027     await self.sock_connect(sock, address)
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 974, in _connect_sock
2024-08-09 13:23:41.027            ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:41.027     sock = await self._connect_sock(
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1070, in create_connection
2024-08-09 13:23:41.027                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:41.027     transport, _ = await loop.create_connection(
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
2024-08-09 13:23:41.027            ^^^^^^^^^^^^
2024-08-09 13:23:41.027     return fut.result()
2024-08-09 13:23:41.027   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 500, in wait_for
2024-08-09 13:23:41.027 Traceback (most recent call last):
2024-08-09 13:23:41.027 nats: encountered error
2024-08-09 13:23:37.019 TimeoutError
2024-08-09 13:23:37.019     raise exceptions.TimeoutError() from exc
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 502, in wait_for
2024-08-09 13:23:37.019            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:37.019     r, w = await asyncio.wait_for(
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 121, in connect
2024-08-09 13:23:37.019     await self._transport.connect(
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1330, in _select_next_server
2024-08-09 13:23:37.019 Traceback (most recent call last):
2024-08-09 13:23:37.019 
2024-08-09 13:23:37.019 The above exception was the direct cause of the following exception:
2024-08-09 13:23:37.019 
2024-08-09 13:23:37.019 asyncio.exceptions.CancelledError
2024-08-09 13:23:37.019            ^^^^^^^^^
2024-08-09 13:23:37.019     return await fut
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 638, in sock_connect
2024-08-09 13:23:37.019     await self.sock_connect(sock, address)
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 974, in _connect_sock
2024-08-09 13:23:37.019            ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:37.019     sock = await self._connect_sock(
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1070, in create_connection
2024-08-09 13:23:37.019                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:37.019     transport, _ = await loop.create_connection(
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
2024-08-09 13:23:37.019            ^^^^^^^^^^^^
2024-08-09 13:23:37.019     return fut.result()
2024-08-09 13:23:37.019   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 500, in wait_for
2024-08-09 13:23:37.019 Traceback (most recent call last):
2024-08-09 13:23:37.019 nats: encountered error
2024-08-09 13:23:33.013 TimeoutError
2024-08-09 13:23:33.013     raise exceptions.TimeoutError() from exc
2024-08-09 13:23:33.013   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 502, in wait_for
2024-08-09 13:23:33.013            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:33.013     r, w = await asyncio.wait_for(
2024-08-09 13:23:33.013   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 121, in connect
2024-08-09 13:23:33.013     await self._transport.connect(
2024-08-09 13:23:33.013   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1330, in _select_next_server
2024-08-09 13:23:33.013 Traceback (most recent call last):
2024-08-09 13:23:33.013 
2024-08-09 13:23:33.013 The above exception was the direct cause of the following exception:
2024-08-09 13:23:33.013 
2024-08-09 13:23:33.013 asyncio.exceptions.CancelledError
2024-08-09 13:23:33.013            ^^^^^^^^^
2024-08-09 13:23:33.013     return await fut
2024-08-09 13:23:33.013   File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 638, in sock_connect
2024-08-09 13:23:33.013     await self.sock_connect(sock, address)
2024-08-09 13:23:33.013   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 974, in _connect_sock
2024-08-09 13:23:33.013            ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:33.013     sock = await self._connect_sock(
2024-08-09 13:23:33.012   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1070, in create_connection
2024-08-09 13:23:33.012                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:33.012     transport, _ = await loop.create_connection(
2024-08-09 13:23:33.012   File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
2024-08-09 13:23:33.012            ^^^^^^^^^^^^
2024-08-09 13:23:33.012     return fut.result()
2024-08-09 13:23:33.012   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 500, in wait_for
2024-08-09 13:23:33.012 Traceback (most recent call last):
2024-08-09 13:23:33.012 nats: encountered error
2024-08-09 13:23:29.006 TimeoutError
2024-08-09 13:23:29.006     raise exceptions.TimeoutError() from exc
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 502, in wait_for
2024-08-09 13:23:29.006            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:29.006     r, w = await asyncio.wait_for(
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 121, in connect
2024-08-09 13:23:29.006     await self._transport.connect(
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1330, in _select_next_server
2024-08-09 13:23:29.006 Traceback (most recent call last):
2024-08-09 13:23:29.006 
2024-08-09 13:23:29.006 The above exception was the direct cause of the following exception:
2024-08-09 13:23:29.006 
2024-08-09 13:23:29.006 asyncio.exceptions.CancelledError
2024-08-09 13:23:29.006            ^^^^^^^^^
2024-08-09 13:23:29.006     return await fut
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 638, in sock_connect
2024-08-09 13:23:29.006     await self.sock_connect(sock, address)
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 974, in _connect_sock
2024-08-09 13:23:29.006            ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:29.006     sock = await self._connect_sock(
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1070, in create_connection
2024-08-09 13:23:29.006                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:29.006     transport, _ = await loop.create_connection(
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
2024-08-09 13:23:29.006            ^^^^^^^^^^^^
2024-08-09 13:23:29.006     return fut.result()
2024-08-09 13:23:29.006   File "/usr/local/lib/python3.11/asyncio/tasks.py", line 500, in wait_for
2024-08-09 13:23:29.006 Traceback (most recent call last):
2024-08-09 13:23:29.006 nats: encountered error
2024-08-09 13:23:27.004 TimeoutError: SSL shutdown timed out
2024-08-09 13:23:27.004     await self._protocol._get_close_waiter(self)
2024-08-09 13:23:27.004   File "/usr/local/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
2024-08-09 13:23:27.004            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:23:27.004     return await self._io_writer.wait_closed()
2024-08-09 13:23:27.004   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 180, in wait_closed
2024-08-09 13:23:27.004     await self._transport.wait_closed()
2024-08-09 13:23:27.004   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1419, in _attempt_reconnect
2024-08-09 13:23:27.004 Traceback (most recent call last):
2024-08-09 13:23:27.004 nats: encountered error
2024-08-09 13:21:36.802 ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2706)
2024-08-09 13:21:36.802            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:21:36.802     return self._sslobj.shutdown()
2024-08-09 13:21:36.802   File "/usr/local/lib/python3.11/ssl.py", line 983, in unwrap
2024-08-09 13:21:36.802     self._sslobj.unwrap()
2024-08-09 13:21:36.802   File "/usr/local/lib/python3.11/asyncio/sslproto.py", line 648, in _do_shutdown
2024-08-09 13:21:36.802     await self._protocol._get_close_waiter(self)
2024-08-09 13:21:36.802   File "/usr/local/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
2024-08-09 13:21:36.802            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:21:36.802     return await self._io_writer.wait_closed()
2024-08-09 13:21:36.802   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 180, in wait_closed
2024-08-09 13:21:36.802     await self._transport.wait_closed()
2024-08-09 13:21:36.802   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1419, in _attempt_reconnect
2024-08-09 13:21:36.802 Traceback (most recent call last):
2024-08-09 13:21:36.802 nats: encountered error
2024-08-09 13:20:25.633 ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2706)
2024-08-09 13:20:25.633            ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:20:25.633     return self._sslobj.shutdown()
2024-08-09 13:20:25.633   File "/usr/local/lib/python3.11/ssl.py", line 983, in unwrap
2024-08-09 13:20:25.633     self._sslobj.unwrap()
2024-08-09 13:20:25.633   File "/usr/local/lib/python3.11/asyncio/sslproto.py", line 648, in _do_shutdown
2024-08-09 13:20:25.633     await self._protocol._get_close_waiter(self)
2024-08-09 13:20:25.633   File "/usr/local/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
2024-08-09 13:20:25.633            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-09 13:20:25.633     return await self._io_writer.wait_closed()
2024-08-09 13:20:25.633   File "/usr/local/lib/python3.11/site-packages/nats/aio/transport.py", line 180, in wait_closed
2024-08-09 13:20:25.633     await self._transport.wait_closed()
2024-08-09 13:20:25.633   File "/usr/local/lib/python3.11/site-packages/nats/aio/client.py", line 1419, in _attempt_reconnect
2024-08-09 13:20:25.633 Traceback (most recent call last):
2024-08-09 13:20:25.633 nats: encountered error

Expected behavior

No errors or crashes

Server and client version

client: ^= 2.7.2 (2.7.2 or higher)
server: == 2.6.1

Host environment

There are a consumer on the partner side I am connected to. After a while the connection is broken and it is not possible to reconnect. But if I restart the container with the application, it became all good for some time before the connection is broken again

Steps to reproduce

Here is how the connection is configured and used. The connection goes over TLS

    @staticmethod
    def _is_secured() -> bool:
        if environment.NATS_CONNECTION_STRING.startswith('tls://'):
            return True
        return False

    @staticmethod
    def _generate_ssl_context():
        ssl_ctx = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH)
        ssl_ctx.load_cert_chain(
            certfile=environment.NATS_TLS_CERTIFICATE_PATH,
            keyfile=environment.NATS_TLS_KEYFILE_PATH
        )
        return ssl_ctx

    async def connect(self) -> None:
        try:
            self.logger.info("Trying to connect to the NATS...")
            if self._secured:
                await self._nc.connect(
                    servers=environment.NATS_CONNECTION_STRING,
                    tls=self._generate_ssl_context(),
                    allow_reconnect=True,
                    max_reconnect_attempts=60,
                    ping_interval=20
                )
            else:
                await self._nc.connect(servers=settings.NATS_CONNECTION_STRING)
            self.logger.info("Connecting to the NATS succeed")

        except Error as e:
            self.logger.error(f"An error occurred while connecting to the NATS: {e}")

What could be the cause of this problem?

Thank you

theobouwman commented 1 month ago

I am facing exactly the same issue. After some time connection lost, and does not reconnect. Fix is to restart instance.

https://github.com/airtai/faststream/issues/1581#issuecomment-2408021302

Have you found a solution for it?