tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0
21.74k stars 5.5k forks source link

on_close called before open on short-lived WebSocket connections #2958

Open Apakottur opened 3 years ago

Apakottur commented 3 years ago

TL;DR It's possible under the current implementation for on_close to be called before open for short-lived WebSocket connections. This is due to the on_close callback being loaded before the connection is fully established.


I am working on a project where we have a Python tornado-based WebSocket server that communicates with JavaScript WebSocket clients.

As part of our server class, we have overriden the functions open and on_close of websocket.WebSocketHandler. In open we initialize some client-specific data and in on_close we clean it up.

Recently we have been dealing with a bug that was caused by on_close being called before open. This seems to happen if the client connection is unstable, and several attempts are made by the client before a WebSocket connection is established.

When this happens, our code raises an error, as it is not able to cleanup a non initialized client.

We believe that the problem lies with how tornado accepts a new connection. It seems that the on_close callback is being loaded before the connection is fully established (and open is called), which means that there's a race condition in the open procedure for short lived connections.

We were able to get the error every time by adding time.sleep(1) in self.accept_connection:

async def accept_connection(self, handler: WebSocketHandler) -> None:
    try:
        self._handle_websocket_headers(handler)
    except ValueError:
        handler.set_status(400)
        log_msg = "Missing/Invalid WebSocket headers"
        handler.finish(log_msg)
        gen_log.debug(log_msg)
        return

    try:
        time.sleep(1)
        await self._accept_connection(handler)
    except asyncio.CancelledError:
        self._abort()
        return
    except ValueError:
        gen_log.debug("Malformed WebSocket request received", exc_info=True)
        self._abort()
        return

Has anyone else had this problem? Perhaps we are wrong to assume that on_close will never be called before open?

ploxiln commented 3 years ago

I've seen this in the wild too ... I think the open() and on_close() can be triggered, roughly, in the same ioloop iteration, and the order they run during the next ioloop iteration is semi-random. (I've worked around it with an additional attribute self.is_closed, initialized in prepare() to None. If open() sees it as True, abort, otherwise set False. If on_close() sees it as None abort, else set True.)

bdarnell commented 3 years ago

This looks like an oversight - on_close should only be called if open has been called (on_connection_closed may be called whether the websocket handshake has completed or not, but on_close should be matched with open). It looks like there's simply no check for this case; there's no need to have non-deterministic execution order (open is actually called synchronously from _accept_connection)

ploxiln commented 3 years ago

I made an attempt to fix this in #2966