The gateway uses the jupyter notebook package to set up all the handlers and for ZMQ communication with kernels. I'm sure effort has been put into making the notebook server responsive for a single user over the years, but I doubt it's been the top priority of the project.
If you're looking to improve things further, before starting from scratch, perhaps you could look at finding easy performance wins in the https://github.com/jupyter/notebook codebase?
The websocket connection is to the kernel gateway. The kernel gateway then uses several ZMQ channels to talk to the kernel. Messages received from the kernel are relayed back through the websocket connection.
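For reference, the client side of that path can be exercised with something like the following (a minimal sketch, not the gateway's own code; it assumes a kernel gateway on localhost:8888 with auth disabled and the `requests` and `websocket-client` packages installed):

```python
import json, uuid, requests, websocket  # websocket-client

GATEWAY = "http://localhost:8888"

# Start a kernel over the REST API, then attach to its multiplexed channels websocket.
kernel = requests.post(f"{GATEWAY}/api/kernels", json={"name": "python3"}).json()
ws = websocket.create_connection(
    f"ws://localhost:8888/api/kernels/{kernel['id']}/channels")

# Send an execute_request on the shell channel.
msg_id = uuid.uuid4().hex
ws.send(json.dumps({
    "header": {"msg_id": msg_id, "msg_type": "execute_request",
               "username": "bench", "session": uuid.uuid4().hex, "version": "5.2"},
    "parent_header": {}, "metadata": {},
    "content": {"code": "x = 10+25**2", "silent": False},
    "channel": "shell",
}))

# Read relayed messages until the matching execute_reply comes back.
while True:
    reply = json.loads(ws.recv())
    if (reply.get("channel") == "shell"
            and reply["parent_header"].get("msg_id") == msg_id):
        print(reply["content"]["status"])
        break
```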
I recommend spending some time measuring exactly where those 8 ms accumulate. On the way to the kernel, on the way back from the kernel, or inside the kernel? Is too much processing being done, or are messages not picked up promptly because the receiving thread is busy with something else? Maybe that leads to some ideas for tuning or code optimization.
If you don't measure first, you can't be sure that the time isn't lost inside the kernel you're interfacing with. Then again, interfacing directly might be a good way to measure the delay added by the kernel gateway.
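One cheap way to split the round trip, assuming the client and the kernel share a clock (e.g. containers on the same host), is to compare local send/receive times against the `date` stamp the kernel writes into its reply header. A sketch, taking the raw JSON reply from something like the snippet above:

```python
from datetime import datetime, timezone

def split_round_trip(t_send, t_recv, reply):
    """Split a measured round trip at the kernel's own reply timestamp.

    t_send / t_recv: datetime.now(timezone.utc) captured just before sending
    the execute_request and just after receiving the matching execute_reply.
    reply: the raw JSON reply dict; its header "date" is an ISO 8601 string.
    Only meaningful when the two clocks agree (same host / same container).
    """
    kernel_sent = datetime.fromisoformat(
        reply["header"]["date"].replace("Z", "+00:00"))
    if kernel_sent.tzinfo is None:
        kernel_sent = kernel_sent.replace(tzinfo=timezone.utc)
    to_kernel_and_exec = (kernel_sent - t_send).total_seconds() * 1e3  # ms
    back_to_client = (t_recv - kernel_sent).total_seconds() * 1e3      # ms
    return to_kernel_and_exec, back_to_client
```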
Hi! So the code I'm hitting the kernel gateway with is fully async Tornado dispatch in both directions, so it's probably some of the faster web-server code you'll see short of Node or a compiled language.
I can't really get into many details about what it's supposed to do, other than that it interfaces with Jupyter kernels quickly (all code at my workplace is considered proprietary until we go through a review process for release).
I'm reasonably familiar with the Jupyter messaging protocol since I've essentially implemented a client for it, but multiplexed over Websockets.
As for the test code I'm using to see this latency, it's literally 'x = 10+25**2' being fed into a Python kernel, and I measured its average native runtime as ~0.001 ms or less (within the margin of error of the system timer over 1000 runs). So the 8 ms figure is almost entirely overhead, some of which is understandably unavoidable: roughly 0.2 ms imposed by the websocket layer, ~0.3 ms added by my endpoint code, and less than 1 ms of ping between Docker containers.
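(For reference, that baseline is nothing more exotic than an in-process timing loop along these lines, with no kernel or transport involved; illustrative, not my actual benchmark code:)

```python
import timeit

runs = 1000
per_run_s = timeit.timeit("x = 10+25**2", number=runs) / runs
print(f"{per_run_s * 1e3:.6f} ms per run")  # typically well under 0.001 ms
```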
I probably will just create a client to the ZMQ sockets directly, roughly along the lines of the sketch below, but as a piece of guidance: if the kernel gateway is going to be useful for non-notebook frontends of non-trivial scale, it'll need to be faster (once again, sorry for being vague, but good ol' corporate policy).
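The direct route would look something like this, going through jupyter_client and skipping the REST/websocket layer entirely (a sketch only, not my actual code):

```python
from jupyter_client.manager import start_new_kernel

# Launch a kernel and get a blocking client wired straight to its ZMQ sockets.
km, kc = start_new_kernel(kernel_name="python3")
try:
    msg_id = kc.execute("x = 10+25**2")
    reply = kc.get_shell_msg(timeout=5)  # execute_reply off the shell channel
    assert reply["parent_header"]["msg_id"] == msg_id
    print(reply["content"]["status"])
finally:
    kc.stop_channels()
    km.shutdown_kernel()
```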
Also, I think I may have found your issue. The Jupyter core codebase doesn't appear to use any async functionality on the ZMQ side; if that's the case, it would easily explain the sluggishness I'm seeing.
Edit: I found jupyter_client's ioloop client version. It would be better if it used zmq's asyncio support, but I'll see what happens if I interface with that.
Thanks for digging into this!
Update 2: I looked harder, and it appears you are pulling in the correct (current best) ioloop manager, but it still uses some (relatively) stale code that isn't the fastest that async ZMQ, or Python itself, can do. I would recommend writing another kernel manager class that depends on Python 3.6's async functionality and propagating that through your class hierarchy; a rough sketch of the direction I mean is below. It would be kinda un-kosher for me to do that right now, but I'll see if I can work that out at some future date :/ . I should probably drop that suggestion on the jupyter_client or notebook repo, since it's not really your problem if it's upstream.
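Roughly the shape I have in mind, using zmq.asyncio sockets plus jupyter_client's Session for signing and parsing (just a sketch under those assumptions; the connection-file path is hypothetical and error handling is omitted):

```python
import asyncio
import json

import zmq
import zmq.asyncio
from jupyter_client.session import Session

async def execute(connection_file, code):
    """Send one execute_request straight to a kernel's shell socket and await the reply."""
    with open(connection_file) as f:
        info = json.load(f)

    ctx = zmq.asyncio.Context.instance()
    shell = ctx.socket(zmq.DEALER)
    shell.connect(f"{info['transport']}://{info['ip']}:{info['shell_port']}")

    # Session handles HMAC signing and (de)serialization of Jupyter messages.
    session = Session(key=info["key"].encode())
    msg = session.msg("execute_request", content={"code": code, "silent": False})
    await shell.send_multipart(session.serialize(msg))

    frames = await shell.recv_multipart()  # natively awaitable with zmq.asyncio
    _idents, msg_list = session.feed_identities(frames)
    reply = session.deserialize(msg_list)
    shell.close()
    return reply["content"]["status"]

# e.g.: asyncio.get_event_loop().run_until_complete(
#           execute("kernel-1234.json", "x = 10+25**2"))
```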
Yes, reporting that suggestion upstream would be kind of you.
I'll let you guys know if I really find a smoking gun. I'd say I've whiffed smoke here and there, but I haven't pinned down which exact operation is taking so long, so there's still some work to be done on that front.
So I did some timing and some more careful poking around. It would seem that kernel_gateway is not the worst offender here. Even with a new async client, I was only able to get an IPython kernel's latency down to around 5-6 ms end-to-end. Of that, the vast majority appears to be on the IPython side rather than in jupyter_client (though about 1-1.5 ms can reasonably be attributed to jupyter_client). That makes kernel_gateway's trade-off of a couple of milliseconds for a REST API and websockets look pretty reasonable.
Thanks for the help!
PS a newer async client would still be useful, but that will probably happen upstream, so I'll make a new issue when one becomes available.
Thanks for looking into this and reporting your findings.
Hi! I'm working on an app that requires decent throughput to the gateway in a single-node Docker environment. Unfortunately, even with ridiculously simple code snippets, the latency is somewhere around 8 milliseconds (by contrast, websockets themselves are capable of ~0.2 ms latency). Is there anything in particular I can do to optimize this, other than programming my own interface directly to the kernels? Thanks, Joseph Murphy