Closed gabordozsa closed 1 month ago
I have run system_tests_tcp_conns_terminate.py more times and I managed to hit the write() exception again in the tearDown() when the wait() is called for the SSL enabled TCPEchoServers. So this PR helps to avoid the ResourceWarning messages but the write() exception still happens sporadically.
2024-07-26 22:14:57.731920 TerminateTcpConnectionsTest ECHO_SERVER_address_default_ssl: Connection to 127.0.0.1:43546 lost, exception=None
2024-07-26 22:14:57.732032 TerminateTcpConnectionsTest ECHO_SERVER_address_default_ssl: Connection to 127.0.0.1:43568 lost, exception=None
2024-07-26 22:14:57.732097 TerminateTcpConnectionsTest ECHO_SERVER_address_terminate Server shutdown completed
2024-07-26 22:14:57.732143 TerminateTcpConnectionsTest ECHO_SERVER_address_default_ssl Server is shutting down
2024-07-26 22:14:57.732535 TerminateTcpConnectionsTest ECHO_SERVER_address_default_ssl Server shutdown completed
2024-07-26 22:14:57.732596 TerminateTcpConnectionsTest ECHO_SERVER_address_termnate_ssl Server is shutting down
2024-07-26 22:14:57.732928 TerminateTcpConnectionsTest ECHO_SERVER_address_termnate_ssl Server shutdown completed
----------------------------------------------------------------------
Ran 4 tests in 33.230s
OK
ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0xffffa76704f0>
transport: <_SelectorSocketTransport closing fd=49>
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
self._transport.write(chunk)
File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
self._fatal_error(exc, 'Fatal write error on socket transport')
File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
self._force_close(exc)
File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
self._loop.call_soon(self._call_connection_lost, exc)
File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
self._check_closed()
File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
TL;DR - we should land this fix since there's really no better alternative.
See https://github.com/python/cpython/issues/113538
The problem is that there doesn't appear to be a way to cleanly shutdown the asyncio.Server. And by "cleanly" I specifically mean closing currently open client sockets and reclaiming resources. See the above cpython issue, I'm pretty sure that's the root of that problem.
The other thing to consider is that maybe the way we terminate the server is wrong:
What we're trying to implement is a way to stop the Server from another thread via calling the wait() method. I'm not sure the _cancel_server() code is the right way of doing this. I seem to recall hacking at this for awhile since I could not find a clear explaination of how this should be done on the Python docs. If there's anyone out there that actually understands the nuances of running an asyncio.Server asynchronously from the test I'm open to suggestions!
@gabordozsa Do you have commit rights to the skupper-router github repo ? If not, I will ask @ted-ross or Andy Smith to give you commit rights.
@gabordozsa Do you have commit rights to the skupper-router github repo ? If not, I will ask @ted-ross or Andy Smith to give you commit rights.
@ganeshmurthy No, I do not have write access yet.
@gabordozsa Do you have commit rights to the skupper-router github repo ? If not, I will ask @ted-ross or Andy Smith to give you commit rights.
@ganeshmurthy No, I do not have write access yet.
I have emailed @ted-ross and Andy Smith to give you commit rights for skupper-router github repo. I will keep you posted.
@gabordozsa I will commit this PR to main.
This is to avoid potential ReseourceWarnings and write() exceptions during tearDown() of SSL tests. I hit this with system_tests_tcp_conns_terminate.py but I think it can happen in any other test which uses the echo client with SSL. Example warning :