ricklamers / gridstudio

Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.
GNU Affero General Public License v3.0
8.88k stars 1.5k forks source link

Securing the Python execution by using Jupyter Notebook kernels in the background? #68

Open 1kastner opened 5 years ago

1kastner commented 5 years ago

First of all thanks a lot for the great contribution! I loved reading the hackernoon article and there are really great examples!

I thought about whether maybe executing the Python code in a Kernel as they are used for Jupyter Notebooks or a JupyterHub could be an alternative to your current approach. One could rely on some existing network protocols with a bit of security. And I hope error messages would show up, one issue I had before when I tried to import matplotlib in the Code widget (instead it was completely silent). Did you consider this and could it be interesting? In that case I could give it a try and create a pull request for it. Or did you have some other design principles in mind which would be harmed?

ricklamers commented 5 years ago

@1kastner I will have a look at the Kernel interface as used by the Jupyter project. It might make sense for Grid Studio and I think it's a great suggestion.

It will probably have a relatively big impact the architecture of the project. If you're interested you could look into it yourself and come up with a proposal on how to use it.

The idea currently is to capture any exceptions in the Python runtime that is attached to Grid Studio and also print those errors and show them in the Python I/O panel in the browser.

If you run some invalid Python code like abc = print( you should already be seeing errors in Grid Studio instead of silent failure.

Could you share the exact scenario/code that caused a silent error? Would be interesting to see if that could be avoided in the current set-up at least for the time being.

1kastner commented 5 years ago

I will have a look and return to you later. One great source I found of somebody who has already dug a bit into the specs is this project: https://github.com/yuvipanda/hubtraf/ - this could be some help for the initial implementation.

1kastner commented 5 years ago

My first attempt now is to re-write https://github.com/ricklamers/gridstudio/blob/master/grid-app/python/init.py in the following way:

1) I will first separate the "sheet" library into another file and add unit tests for that

2) I will spawn a Jupyter Notebook in the same docker container

--> Later this could be replaced by a JupyterHub in a separate docker container. For a single-instance usage that'd be an overkill. For a multi-user setting a JupyterHub could be really great because of a well-designed user separation and kernel separation!

3) Make the Jupyter Notebook load the sheet library. Then replace the exec invocations with Websocket communication to the Jupyter Notebook.

Steps 1 and 2 would be implemented at their place. Once all steps work as expected, one could think of moving up the implementation of step 3 one level to https://github.com/ricklamers/gridstudio/blob/master/grid-app/python.go . I prefer to move that part to another refactoring iteration.

1kastner commented 5 years ago

First steps I have done at https://github.com/1kastner/gridstudio/tree/extract-sheet-library/grid-app/python - still some way to go, especially regarding testing!

1kastner commented 5 years ago

What first I thought to be a silent fail was when I executed a simple example taken from https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/simple_plot.html

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

plt.show()

Because of the refactoring I started, now I know that I need to invoke the show() method that you have already prepared in the init file. What made me wonder was that the plot dissappeared into nowhere but I guess it is difficult to replace the matplotlib show function with the one you have already prepared, especially when people re-import libraries which have been prepared in the init file. So in fact it was a mistake of usage, not a silent fail. This code worked perfectly:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

show()

So that'd be fine for the meantime. Anyhow, if we push forward the approach to use Jupyter Notebooks, then in future the matplotlib plot function should work as expected again, since there is more than one backend that already supports web frontends, see https://matplotlib.org/3.1.1/tutorials/introductory/usage.html or the inline backend.

ricklamers commented 4 years ago

@1kastner just wanted to let you know I really appreciate the effort but unfortunately didn't have and won't have the time to look at this as I'm working on a different project that's taking all my time for the forseeable future.

1kastner commented 4 years ago

That's fair enough!