Closed esevan closed 5 years ago
This may be related to #39
Sorry, I have no idea how to write an automated test for this.
I am attaching a manual test result instead.
Test Env
Cluster: kubernetes
Gateway: EG on Kubernetes
KG_REQUEST_TIMEOUT = KG_CONNECT_TIMEOUT = 10
Test Scenario
    from time import sleep

    a = 0
    for i in range(100):
        a = i
        print(a)
        sleep(1)
Test Result
The session recovers in about 1 minute, and the kernel session resumes writing to the output area after recovery (see the screenshot: the output jumps from 34 to 95).
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:
@esevan @kevin-bates I am sorry I could not get back to this sooner. We have been debugging various points in nb2kg's code as well as on EG's end. Although we have not been able to figure out why the connection is lost after some time, we did notice a few things:
This check still returns a false signal, so a write is eventually triggered, but because the stream is no longer available the write does not go through.
This happens on the first shell message you try to execute after the connection has been lost.
But since the exception is handled gracefully, nothing happens and the client remains unaware of the loss.
@esevan I will try the changes you have made, but I am not sure that the on_close() event here: https://github.com/jupyter/nb2kg/blob/ddf6b7c3d119445f2bb4a03b8d8ea5a26a876bdc/nb2kg/handlers.py#L249 will be called at all.
Also, as far as I can understand your changes, even if the on_close() event is generated, it still won't reconnect the sockets; it only logs a message asking for a reconnect, which might not be user-friendly.
Rather than doing this, I have another suggestion:
What if, while creating the connection between NB2KG and JEG, we also start a ping mechanism in the IOLoop that pings, say, every 1 minute? In a scenario where we somehow lose the connection, the first ping will do nothing visible, but it will make the websocket realise that the stream is closed.
The second ping will raise an exception, which will propagate back here:
If you re-raise this exception instead of handling it, it triggers a websocket close event back to JupyterLab.
If we look at JupyterLab's websocket close handler here:
Rather than immediately closing the websocket, it makes reconnect attempts with increasing timeouts, which will restore the connection as soon as the network state allows it.
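To make the suggestion above concrete, here is a minimal sketch of a keepalive loop. It is not nb2kg code: it uses plain asyncio instead of Tornado's IOLoop, and FakeStream, keepalive, and on_lost are hypothetical names chosen for illustration. It also simplifies the two-ping behaviour described above into a single failing ping. The point it shows is only that the failure is reported rather than swallowed.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for the nb2kg-to-gateway websocket."""

    def __init__(self):
        self.closed = False

    async def ping(self):
        # A real websocket would send a ping frame; here a closed stream
        # simply raises, mimicking a write on a dead connection.
        if self.closed:
            raise ConnectionError("stream is closed")


async def keepalive(stream, interval, on_lost):
    """Ping every `interval` seconds; report the first failure and stop."""
    while True:
        await asyncio.sleep(interval)
        try:
            await stream.ping()
        except ConnectionError as exc:
            # Report the loss instead of swallowing it, so the
            # browser-facing side can close and let the client reconnect.
            on_lost(exc)
            return


async def demo():
    stream = FakeStream()
    lost = []
    task = asyncio.ensure_future(keepalive(stream, 0.01, lost.append))
    await asyncio.sleep(0.03)   # a few pings succeed
    stream.closed = True        # simulate the gateway connection dropping
    await task                  # the next ping fails and the loop ends
    return lost
```

Calling `asyncio.run(demo())` returns a list holding the single error that the keepalive loop reported. In a real handler, on_lost would presumably close the browser-facing websocket so that the client's own reconnect logic takes over.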
@kevin-bates Thanks for your review, kevin!
@IMAM9AIS The websocket close event is also observed through read_message, which returns None if the connection between nb2kg and EG is closed. See https://www.tornadoweb.org/en/stable/websocket.html#client-side-support for more detail:
The connection supports two styles of operation. In the coroutine style, the application typically calls read_message in a loop:
    conn = yield websocket_connect(url)
    while True:
        msg = yield conn.read_message()
        if msg is None: break
        # Do something with msg
In both styles, a message of None indicates that the connection has been closed.
One more thing, for what it's worth: on_close is a server-side websocket callback, that is, the callback for the browser-to-nb2kg websocket connection.
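As an illustration of how a None from read_message can drive a reconnect, here is a small sketch. It is an assumption-laden stand-in rather than the actual change in this pull request: FakeConnection, relay, and max_reconnects are hypothetical names, and plain asyncio replaces Tornado so the example runs on its own.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a Tornado client websocket connection."""

    def __init__(self, messages):
        self._messages = list(messages)

    async def read_message(self):
        # Like Tornado's read_message, return None once the connection closes.
        return self._messages.pop(0) if self._messages else None


async def relay(connect, handle, max_reconnects=3):
    """Read until None, then reconnect, at most `max_reconnects` times."""
    reconnects = 0
    conn = await connect()
    while True:
        msg = await conn.read_message()
        if msg is None:
            if reconnects >= max_reconnects:
                break  # give up once the reconnect budget is spent
            reconnects += 1
            conn = await connect()
            continue
        handle(msg)
    return reconnects


async def demo():
    # Each scripted connection delivers some messages and then closes.
    scripts = [["a", "b"], ["c"], []]
    received = []

    async def connect():
        return FakeConnection(scripts.pop(0) if scripts else [])

    reconnects = await relay(connect, received.append, max_reconnects=2)
    return received, reconnects
```

Running `asyncio.run(demo())` delivers the messages from the first two scripted connections in order and reports two reconnects. The cap on reconnects is a design choice of this sketch, added so that the loop cannot spin forever against a gateway that stays down.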
@esevan Thanks a lot for clearing this up. I will try out the changes and let you know.
When nb2kg lost its connection to KG, it did not connect to KG again, even though the websocket connection from the client was still alive.
This change re-establishes the connection to KG to prevent the above anomaly.
Signed-off-by: Eunsoo Park <esevan.park@gmail.com>