svenkreiss / databench

Data analysis tool.
http://databench.trivial.io
MIT License

python kernel parallel requests #3

Open hussainsultan opened 9 years ago

hussainsultan commented 9 years ago

Thanks for a great library. It makes going from analysis to visualization much easier. When using the Python language kernel (with a subprocess), it waits to finish one client's request before it starts the next client's request. It seems the message is received while the process just waits. I wanted to get your thoughts on how you would handle multiple requests at the same time with different language kernels. Creating a new subprocess for each browser connection, with a new ZeroMQ connection? The subprocess in this case could also be an IPython kernel on a remote machine.
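A minimal sketch of the per-connection idea, assuming a plain Python worker: every browser connection gets its own subprocess, so a slow action for one client cannot block another. The class and method names are illustrative (not the databench API), and stdin/stdout pipes stand in here for the ZeroMQ sockets databench_py actually uses.

```python
# Hypothetical sketch: one worker subprocess per browser connection.
# Pipes stand in for ZeroMQ; names are illustrative, not databench's API.
import subprocess
import sys

WORKER_SRC = r'''
import sys
for line in sys.stdin:
    action = line.strip()
    if action == "shutdown":
        break
    # a real kernel would run the analysis action here
    print("done:" + action, flush=True)
'''

class ConnectionKernel:
    """Spawn a fresh worker process when a client connects."""
    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, '-c', WORKER_SRC],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def request(self, action):
        self.proc.stdin.write(action + '\n')
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        self.proc.stdin.write('shutdown\n')
        self.proc.stdin.flush()
        self.proc.wait()

# two clients get two independent kernels; neither queues behind the other
a, b = ConnectionKernel(), ConnectionKernel()
print(a.request('histogram'))  # done:histogram
print(b.request('scatter'))    # done:scatter
a.close(); b.close()
```

As discussed below, the trade-off is that analysis state is per-process, so nothing is shared between connections.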

svenkreiss commented 9 years ago

Good question. In the databench_py folder, there is a singlethread submodule, with the intention that a multithread submodule could be created next to it. That hasn't been implemented yet, and I haven't thought in detail about how to do it exactly. I guess it could be similar to the singlethread module with a modified run_action() in the Meta class. I am open to suggestions. What's not clear to me is how to handle the emit() function and what the expected behavior for state in the Analysis is (state probably wouldn't be shared). That's why I've put it off so far.
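One possible shape for that multithread variant, sketched with stand-in names (this Meta, run_action and emit are illustrative, not databench's actual classes): each action runs in its own thread, and emit() pushes onto a thread-safe queue that the frontend bridge drains.

```python
# Hypothetical multithread run_action() sketch; Meta/emit here are
# stand-ins for illustration, not the real databench_py API.
import threading
import queue

class Meta:
    def __init__(self, analysis):
        self.analysis = analysis
        self.outbox = queue.Queue()  # messages destined for the frontend

    def emit(self, signal, message):
        # queue.Queue is thread-safe, so any action thread may emit
        self.outbox.put((signal, message))

    def run_action(self, action, payload):
        """Dispatch an action in a background thread and return at once,
        so a slow action does not block the event loop."""
        handler = getattr(self.analysis, 'on_' + action)
        t = threading.Thread(target=handler, args=(self, payload))
        t.start()
        return t

class ExampleAnalysis:
    def on_run(self, meta, payload):
        meta.emit('status', 'started ' + payload)
        # ... long computation would go here ...
        meta.emit('status', 'finished ' + payload)

meta = Meta(ExampleAnalysis())
meta.run_action('run', 'fit').join()
print(meta.outbox.get())  # ('status', 'started fit')
```

The open question from the comment above still applies: whether Analysis state should be shared between threads, and if so, how it is locked.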

Your suggestion about dynamically instantiating language kernels for every new connection is also interesting. The state of an analysis wouldn't be shared across connections, and there are definitely use cases where that is unwanted, so it would have to be opt-in. Something to think about.

hussainsultan commented 9 years ago

For a dynamic MUX-type architecture, where a new language kernel is instantiated and hooks directly into the server over ZMQ via a websocket bridge, I think IPython may be very interesting to look at.

I played around with IPython and was able to interact with it in an App/Analyses context. Here is a simple application created by an IPython author. Some thoughts:

All in all, this calls for a significant architecture change and I am not sure if it fits into the vision of what you are trying to accomplish. One benefit of using IPython is that they are moving to a polyglot architecture with many different language kernels already implemented, e.g. Julia, R, etc. Let me know your thoughts. I will add a simple working demo using IPython soon.

svenkreiss commented 9 years ago

Thanks for your thoughts on this. IPython used to be a monolithic package. With the split in Project Jupyter, it could be nice to evolve databench and make use of the components of Jupyter. Some thought is definitely required here, but I could see it evolve towards a module that sits parallel to the Jupyter Notebook; i.e. databench becoming another front-end to the Jupyter tools.

One core feature of databench has been the architecture of having the backend send messages that are not in direct response to a request. That allows a stream of messages to be emitted by a single process as it progresses through a calculation. I don't know whether/how that is supported with IPython/Jupyter.
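The "stream of messages as the calculation progresses" pattern can be sketched minimally as follows; emit here is just a plain callback standing in for databench's emit(), and the message names are made up for illustration. The point is that the progress messages are pushed by the backend unprompted, not returned as the reply to a request.

```python
# Minimal sketch of unsolicited progress messages during one calculation.
# `emit` is a stand-in callback, not the real databench emit().
def long_calculation(emit):
    total = 0
    for i in range(1, 4):
        total += i
        # pushed mid-calculation, not in response to any request
        emit('progress', {'step': i, 'partial': total})
    emit('result', {'total': total})

messages = []
long_calculation(lambda signal, data: messages.append((signal, data)))
# messages now holds three progress updates followed by the final result
print(messages[-1])  # ('result', {'total': 6})
```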

Moving to Tornado has been in the back of my head for a while (especially for Python3). I don't have much experience with it though.

And yes, the changes would be significant, but I am happy to discuss long term goals.

hussainsultan commented 9 years ago

Thanks. You could still have the kernel emit events/messages to the front-end. Some of the current IPython machinery could be monkeypatched to do this. It already does something similar with kernel heartbeat information, using a separate channel. Perhaps we could latch onto that.
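The "separate channel" idea is analogous to how IPython's messaging carries side-effect messages (stream output, status) on a socket distinct from the request/reply channel. A toy sketch of such a broadcast channel, with illustrative names and a plain subscriber list standing in for a ZMQ PUB/SUB socket pair:

```python
# Toy broadcast channel for unsolicited kernel messages; the names are
# illustrative and the subscriber list stands in for ZMQ PUB/SUB.
class SideChannel:
    """Fan unsolicited kernel messages out to all connected frontends."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, msg_type, content):
        for callback in self.subscribers:
            callback({'msg_type': msg_type, 'content': content})

channel = SideChannel()
received = []
channel.subscribe(received.append)
channel.publish('status', {'execution_state': 'busy'})
channel.publish('stream', {'text': 'step 1 done'})
print(len(received))  # 2
```

Because this channel is independent of request handling, emits can interleave with (or outlive) any particular request, which matches the streaming behavior described earlier in the thread.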

One of the reasons I think this is cool is that you don't always want to send out a notebook for a business review; an app like the one databench provides could be based directly on a notebook, minus the code and details. This could also be deployed as a meaningful way to re-run analyses in a standard way.