Open Apakottur opened 3 years ago
I've seen this in the wild too ... I think the open()
and on_close()
can be triggered, roughly, in the same ioloop iteration, and the order they run during the next ioloop iteration is semi-random. (I've worked around it with an additional attribute self.is_closed
, initialized in prepare()
to None
. If open()
sees it as True
, abort, otherwise set False
. If on_close()
sees it as None
abort, else set True
.)
This looks like an oversight - on_close
should only be called if open
has been called (on_connection_closed
may be called whether the websocket handshake has completed or not, but on_close
should be matched with open
). It looks like there's simply no check for this case; there's no need to have non-deterministic execution order (open
is actually called synchronously from _accept_connection
)
I made an attempt to fix this in #2966
TL;DR It's possible under the current implementation for
on_close
to be called beforeopen
for short-lived WebSocket connections. This is due to theon_close
callback being loaded before the connection is fully established.I am working on a project where we have a Python tornado-based WebSocket server that communicates with JavaScript WebSocket clients.
As part of our server class, we have overriden the functions
open
andon_close
ofwebsocket.WebSocketHandler
. Inopen
we initialize some client-specific data and inon_close
we clean it up.Recently we have been dealing with a bug that was caused by
on_close
being called beforeopen
. This seems to happen if the client connection is unstable, and several attempts are made by the client before a WebSocket connection is established.When this happens, our code raises an error, as it is not able to cleanup a non initialized client.
We believe that the problem lies with how tornado accepts a new connection. It seems that the
on_close
callback is being loaded before the connection is fully established (andopen
is called), which means that there's a race condition in the open procedure for short lived connections.We were able to get the error every time by adding
time.sleep(1)
inself.accept_connection
:Has anyone else had this problem? Perhaps we are wrong to assume that
on_close
will never be called beforeopen
?