Closed belm0 closed 5 years ago
From the trace, it's clearly related to a newly incoming connection. There is probably server code using _write_pending
but not handling ConnectionClosed. @mehaase would you take a look?
This is a trio-websocket reliability issue for my use case, happening enough to be a problem.
(Note that async stack traces can mislead by incorrectly attributing code location to the last line of an async block. E.g. the exception in _handle_connection
could well be anywhere in the nursery scope, such as the nursery.start_soon(connection._reader_task)
call, yet Python attributes it to the last line, 1116, await connection.aclose()
.)
I'll look at it this week.
I think I've managed to reproduce this. Here is an implementation of a faulty server:
from random import random
import trio
from trio_websocket import serve_websocket, ConnectionClosed
async def main():
''' Main entry point. '''
print('Starting websocket server…')
await serve_websocket(handler, 'localhost', 8000, ssl_context=None)
async def handler(request):
''' Reverse incoming websocket messages and send them back. '''
try:
ws = await request.accept()
await ws.send_message('foo')
await trio.sleep(random()/1000)
await trio.aclose_forcefully(ws._stream)
except ConnectionClosed:
pass
if __name__ == '__main__':
trio.run(main)
The client is this:
import trio
from trio_websocket import connect_websocket, ConnectionClosed
async def main():
async with trio.open_nursery() as nursery:
for n in range(1000):
print('connection {}'.format(n))
try:
connection = await connect_websocket(nursery, 'localhost', 8000,
'/', use_ssl=False)
await connection.get_message()
await connection.aclose()
except ConnectionClosed:
print('connection closed')
if __name__ == '__main__':
trio.run(main)
The client connects to the server repeatedly. Within a small number of iterations (often less than 10), the server crashes with the following exception.
$ python bad-server.py
Starting websocket server…
Traceback (most recent call last):
File "bad-server.py", line 24, in <module>
trio.run(main)
File "/home/mhaase/code/trio-websocket/venv/lib/python3.6/site-packages/trio/_core/_run.py", line 1337, in run
raise runner.main_task_outcome.error
File "bad-server.py", line 10, in main
await serve_websocket(handler, 'localhost', 8000, ssl_context=None)
File "/home/mhaase/code/trio-websocket/trio_websocket/_impl.py", line 327, in serve_websocket
await server.run(task_status=task_status)
File "/home/mhaase/code/trio-websocket/trio_websocket/_impl.py", line 1091, in run
await trio.sleep_forever()
File "/home/mhaase/code/trio-websocket/venv/lib/python3.6/site-packages/trio/_core/_run.py", line 397, in __aexit__
raise combined_error_from_nursery
File "/home/mhaase/code/trio-websocket/venv/lib/python3.6/site-packages/trio/_highlevel_serve_listeners.py", line 129, in serve_listeners
task_status.started(listeners)
File "/home/mhaase/code/trio-websocket/venv/lib/python3.6/site-packages/trio/_core/_run.py", line 397, in __aexit__
raise combined_error_from_nursery
File "/home/mhaase/code/trio-websocket/venv/lib/python3.6/site-packages/trio/_highlevel_serve_listeners.py", line 27, in _run_handler
await handler(stream)
File "/home/mhaase/code/trio-websocket/trio_websocket/_impl.py", line 1119, in _handle_connection
await connection.aclose()
File "/home/mhaase/code/trio-websocket/venv/lib/python3.6/site-packages/trio/_core/_run.py", line 397, in __aexit__
raise combined_error_from_nursery
File "/home/mhaase/code/trio-websocket/trio_websocket/_impl.py", line 924, in _reader_task
await handler(event)
File "/home/mhaase/code/trio-websocket/trio_websocket/_impl.py", line 811, in _handle_connection_closed_event
await self._write_pending()
File "/home/mhaase/code/trio-websocket/trio_websocket/_impl.py", line 962, in _write_pending
raise ConnectionClosed(self._close_reason) from None
trio_websocket._impl.ConnectionClosed: <CloseReason code=1006 name=ABNORMAL_CLOSURE reason=None>
The root cause is that the reader task is getting a connection closed event, and it wants to call _write_pending()
, but before it can do that the stream is already broken. The reader task does not catch ConnectionClosed
, which causes the exception to propagate out and cancel the entire nursery.
Keeping mind that I want to overhaul the close logic in #90 at a later date, I'll try to keep the fix for this bug as small as possible. The simplest fix is that the reader task should catch ConnectionClosed
when it calls _write_pending()
directly or when it calls it indirectly through a handler function. While chasing this down, I also noticed that WebSocketConnection.aclose()
can also leak a ConnectionClosed
exception, so I'll fix that, too.
With these changes, the scripts above run without error. I'll work on a test case tomorrow. The bug is triggered by a race (it only occurs if the server has a random delay added), so building a test case should be a matter of carefully sequencing events.
Thank you! That does seem to match, and my app has a non-trivial amount of stuff on the trio scheduler which could cause the delay.
Hit a case where serve_websocket is leaking a ConnectionClosed exception. My handler doesn't appear to be involved in this trace. What am I missing?