vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

A Jupyter kernel for SoS #136

Closed BoPeng closed 8 years ago

BoPeng commented 8 years ago

Encouraged by the success of the sos_magic ipython extension, I think we can go further and implement a Jupyter kernel for SoS. This will make writing SoS scripts much easier because the notebook can be run in place and saved in .sos format.

The Jupyter webpage has all the needed information and plenty of examples.

BoPeng commented 8 years ago

Try this

  1. python setup.py install to install sos as well as jupyter kernel
  2. ipython notebook select new sos (or jupyter qtconsole --kernel sos)

Type

a=1
b='${a+100}'
b
sosdict
%sosdict 
run:
    echo '${b}'

magic

%%sos workflow -v3 -p

etc should also be usable. %%sospaste does not exist, but should not be needed in notebook mode.

Finally, download the file as .sos.

It should work, but standard output from run is not returned. Also expect bugs because not all errors are handled.

gaow commented 8 years ago

I cannot find "new sos" on the jupyter page though. I selected new python3 and load_ext sos_magic, but none of it from the "Interactive SoS" tutorial worked on the jupyter notebook (they do work on ipython). There must be some settings I messed up. I purged earlier today my ipython and reinstalled it along with jupyter ... will have to see what's going on. But yes I'm starting to use ipython for debugging which is pretty handy indeed.

Can we make the IPython / Jupyter related features available only via a switch in setup.py, e.g., python setup.py install --notebook_support? On the cluster the new SoS version failed right away. I imagine people may need the interactive feature only on their desktops for debugging and tuning, but on clusters they'll not want to install IPython / Jupyter. Shall we have python setup.py install provide only the basic SoS features by default, and make these visualization and interactive features a separate switch?

BoPeng commented 8 years ago

I have fixed the issue with ipython installation. Basically the sos kernel will be installed only if ipython is available.
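
A minimal sketch of that kind of check, for illustration only. The helper names `module_available` and `should_install_kernel` are hypothetical, and detecting IPython by looking up the module is an assumption; the actual setup.py may do this differently.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def should_install_kernel(required=("IPython",)):
    """Hypothetical helper: install the sos kernel only if every module is present."""
    return all(module_available(name) for name in required)


if __name__ == "__main__":
    if should_install_kernel():
        print("IPython found: the sos kernel would be installed")
    else:
        print("IPython not found: skipping the sos kernel")
```

With a check like this, a cluster without IPython still gets the basic SoS installation and simply skips the kernel step.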

ipython and sos kernel are two different, though related things.

  1. ipython magic is an ipython extension. The underlying system is python, and we are running everything under a sos magic within sos. As you can see from the example, this system allows you to use pure python, shell commands, and sos altogether, which can be very handy.
  2. sos kernel is another beast. It is a pure SoS environment so everything is handled by the kernel. We do have magic (around 7pm today) but that is processed by the SoS kernel, not the ipython kernel. There are two ways to use this mode.

    2.1 ipython console --kernel sos and ipython qtconsole --kernel sos. This is very similar to ipython with sos magic, but it does NOT have pure python, and does NOT have shell magic (ls, dir etc). I tried to make it as similar to ipython magic as possible, so basically it is an ipython magic mode without the sos magic word, and without other magics.

    2.2 ipython notebook is the real game changer here and you must have underestimated its potential. Because we are in full control of the kernel, we can display whatever we want right after the execution of each cell. For example, we can

    • display input, output files of the step, this is trivial
    • display all the report right away
    • examine the output file list and visualize whatever visualizable. For example, if a user outputs a jpg file in output, we can display it immediately. If a user outputs a csv file, we can display the first few rows of it as a table... In the end, users get the results right in the notebook.
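
The last point could be sketched roughly as below. This is only an illustration: `preview_output` is a hypothetical helper, not part of SoS, and the set of extensions and the (kind, payload) return value are assumptions.

```python
import csv
import os


def preview_output(path, max_rows=5):
    """Return a (kind, payload) pair describing how a notebook could show a file.

    Hypothetical helper: images are reported by path so a front end can
    render them, and csv files are reduced to their first few rows.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext in (".jpg", ".jpeg", ".png"):
        return ("image", path)
    if ext == ".csv":
        rows = []
        with open(path, newline="") as handle:
            for i, row in enumerate(csv.reader(handle)):
                if i >= max_rows:
                    break
                rows.append(row)
        return ("table", rows)
    # Anything else is not previewed in this sketch.
    return ("none", None)
```

The kernel would call something like this on each file in the step's output list and send the result to the notebook for display.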

gaow commented 8 years ago

Thanks for the clarification. So I see one is an extension, like a lisp extension to Emacs that interacts with the main program to provide something handy. I take it that SoS kernel and Jupyter kernel for SoS mean the same thing here, and by deriving from ipykernel you are basically building an interactive interpreter for SoS as a language, as ipython is to Python! Looks like I should stick to the Jupyter kernel rather than the sos_magic for ipython, though. I guess sos_magic was a successful experiment that led to the SoS kernel implementation.

This is wonderful! I hope I can get Jupyter interpreting SoS. Somehow it does not work on my office computer for now. Will try again tomorrow.

BoPeng commented 8 years ago

The latest version should work. If you run python setup.py install and the last line reads IPython kernel named "sos" is installed., the kernel should be installed. Use

$ jupyter kernelspec list
Available kernels:
  python3    /Users/bpeng1/bin/anaconda/lib/python3.5/site-packages/ipykernel/resources
  sos        /usr/local/share/jupyter/kernels/sos

to verify. Then if you start

jupyter notebook (or ipython notebook)

you should see a new button to the right and be able to select SoS. Then you can type

a=1
a

Right now syntax highlighting does not work and I am not sure why. Of course, customizing the output would be a long process.
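
If you want to script the verification step, here is a rough sketch that parses the text printed by jupyter kernelspec list. `parse_kernelspec_list` is a hypothetical helper, and it assumes the two-column layout shown above.

```python
def parse_kernelspec_list(text):
    """Map kernel names to directories from `jupyter kernelspec list` output."""
    kernels = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and the "Available kernels:" header line.
        if len(parts) != 2 or line.rstrip().endswith(":"):
            continue
        name, path = parts
        kernels[name] = path.strip()
    return kernels


sample = """Available kernels:
  python3    /Users/bpeng1/bin/anaconda/lib/python3.5/site-packages/ipykernel/resources
  sos        /usr/local/share/jupyter/kernels/sos
"""
kernels = parse_kernelspec_list(sample)
print("sos kernel installed:", "sos" in kernels)
```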

gaow commented 8 years ago

I see ... I used your tips to verify and confirmed the failure. I then raised the error message from setup.py:

error: [Errno 13] Permission denied: '/usr/local/share/jupyter'

which is weird because my jupyter is in my conda. I'll google and see if there is a way to fix this because I don't want to mess with system Python.

BoPeng commented 8 years ago

I suppose you can simply put your conda path before /usr/local/bin.


gaow commented 8 years ago

That's the setup now:

/opt/miniconda3/bin:/home/gaow/bin:/opt/mosek/7/tools/platform/linux64x86/bin:/home/gaow/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

I'm googling. ...

BoPeng commented 8 years ago

No idea how to write a CodeMirror mode for SoS, or how to install it with jupyter...

Here is the ticket: https://github.com/jupyter/help/issues/35#issuecomment-217293572
Here is some sort of example for SAS: https://github.com/sassoftware/sas_kernel/tree/master/sas_kernel/nbextensions

The Jupyter guy was really helpful so I now have a more or less working lexer. A complete lexer is much harder to get and I would wait till we have some javascript expert before we tackle it.

BoPeng commented 8 years ago

Documented at https://github.com/BoPeng/SOS/wiki/Interactive-SoS , we should perhaps separate it into two documents later.

gaow commented 8 years ago

I was just reading this blog post:

http://www.r-bloggers.com/why-i-dont-like-jupyter-fka-ipython-notebook/

Many of his points were also the reasons I did not use Jupyter for real work. I think several of them can be addressed by repeatedly exporting to sos format and reloading it. This procedure keeps the notebook neat.

So far I do have one comment though: do you think it is a good idea to re-number the cell block IDs during export? In exploratory analysis the block IDs can become discontinuous due to changes to the code. In the converted SoS I think it looks better if we sort the cells and then re-assign the IDs from 1 to N.

BoPeng commented 8 years ago

I agree with all the points of the post and this is why I have been switching between several methods. The notebook format is good at saving and previewing results, not at creating a coherent flow of analysis. Until we can create our own RStudio, this is the best platform I can find to achieve these... I suppose we can advertise the notebook feature as the notebook stage of SoS, where we run the analysis interactively, preview and save results, before we convert the notebook to real workflows.

BoPeng commented 8 years ago

To answer your question, the index is not that useful for interactive use, so the current implementation tries to keep as much information as possible so that the exported .sos file can be converted back to .ipynb. Also, the cell index is saved as special comments so users can always use [index] in the cells. When it is time to convert the notebook to a real workflow, users would have to add [] anyway.

gaow commented 8 years ago

I agree. It was just a bit bothersome when I tried to export to sos and push to a git repo for version control purposes. The change log is always longer because of this nuisance comment numbering.

gaow commented 8 years ago

Well I still think we should reorder and rename those code block names. Here is a minimal example of why this is necessary. This is a script I got from sos convert on a sos notebook I have:

#!/usr/bin/env sos-runner
#fileformat=SOSNB1.0

#cell code 1
%use python

#cell code 2
a = 1

#cell code 3
b = 1

#cell code 6
c = a + d

#cell code 5
d = a + b

#cell code

Notice that cell code 6 precedes cell code 5 because I went back to edit cell 4, which uses a variable from cell 5, and it worked; only the cell was renamed to 6. This is one of the bad things about Jupyter that the blog post discussed.

Now, if I use sos convert test.sos --notebook test.ipynb I'll get a sos notebook and if I click rerun it will fail because it will simply ignore the ordering of the cell blocks.

So the point here is that Jupyter does not even care about cell numbering when it reruns. That perhaps suggests we should not care either, and we should thus re-order and re-number in the exported SoS script so that it converts to a usable SoS notebook.
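
To make the suggestion concrete, here is a rough sketch of reordering and renumbering on the exported text. It is not the SoS implementation: `parse_cells` and `reorder_and_renumber` are hypothetical names, it assumes every cell starts with a `#cell <type> [count]` comment as in the example above, and it simply appends cells without a count at the end, which is a simplification.

```python
import re

# Matches "#cell code 6" as well as "#cell code" (a cell never executed).
CELL_RE = re.compile(r"^#cell (\w+)(?: (\d+))?\s*$")


def parse_cells(text):
    """Split an exported script into (header_lines, cells).

    Each cell is a dict with keys 'type', 'count' (int or None) and 'body'.
    """
    header, cells, current = [], [], None
    for line in text.splitlines():
        match = CELL_RE.match(line)
        if match:
            count = match.group(2)
            current = {"type": match.group(1),
                       "count": int(count) if count else None,
                       "body": []}
            cells.append(current)
        elif current is None:
            header.append(line)
        else:
            current["body"].append(line)
    return header, cells


def reorder_and_renumber(text):
    """Sort cells by execution count, then renumber them 1..N."""
    header, cells = parse_cells(text)
    executed = sorted((c for c in cells if c["count"] is not None),
                      key=lambda c: c["count"])
    pending = [c for c in cells if c["count"] is None]
    out = list(header)
    for index, cell in enumerate(executed, start=1):
        out.append("#cell {} {}".format(cell["type"], index))
        out.extend(cell["body"])
    for cell in pending:
        out.append("#cell {}".format(cell["type"]))
        out.extend(cell["body"])
    return "\n".join(out) + "\n"
```

Applied to the script above, the `d = a + b` cell is written before the `c = a + d` cell and the two become cell code 4 and cell code 5, so the converted notebook runs top to bottom.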

BoPeng commented 8 years ago

Are you suggesting that we reorder the cells according to execution number, namely saving your code as

#cell code 5
d = a + b

#cell code 6
c = a + d

Do you also want to reset the indexes to

#cell code 2
d = a + b

#cell code 3
c = a + d

if there are no 2 and 3 because of rerun?

This might not be what users want though because I might have a sequence of steps in the order of workflow but execute some of the steps back and forth.

Anyway, there is also a possibility to do a more thorough conversion, that removes % magic, adds [] section head, etc, so we have a chance to allow options from command line sos convert --sos.

gaow commented 8 years ago

Yes I think ordering is at least necessary; otherwise the notebook will not be usable when converted back. Resetting index to continuous numbers will not do any harm but will make the output neater, so we may as well do that.

> I might have a sequence of steps in the order of workflow but execute some of the steps back and forth.

Right ... I can imagine, for example, that one wants to make a plot and later wants to adjust the coloring etc. By doing that he'll reset the cell number in the python notebook, and if we reset it for him in the exported SoS, then the script will still work except the plot will not show in the original position.

There does not seem to be a happy solution unless we manage it in some complicated way. I guess we should just ask users to behave. SoS makes Jupyter a lot better, to the extent that I'm now willing to use it over emacs + ess (I do not use RStudio). But this is only because the notebook allows a mixture of languages (I'd still prefer ESS over an R notebook). Hopefully Jupyter will keep improving. I have yet to explore and exploit more % and ${} features :)

BoPeng commented 8 years ago

Suggest some syntax then, perhaps

--reorder             # if set, following execution order
--reset-index         # if set, reset to 1, 2, 3, 4, 5 following cell or execution order
--no-index            # no index at all, cells will be merged (import will be separated by header)
--add-header          # add header if there is no header
--remove-magic        # remove magic, cell can be unusable because of pure R code etc.

There can also be

sos convert notebook.ipynb --notebook newnotebook.ipynb [options]

which is a shortcut to convert generated .sos back to .ipynb.
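
For discussion, the options could be wired up with argparse roughly as follows. This is only a sketch of the proposal, not existing sos convert behavior; the positional `source` argument and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Sketch of the proposed conversion options (hypothetical, for discussion)."""
    parser = argparse.ArgumentParser(prog="sos convert")
    parser.add_argument("source", help="input .ipynb or .sos file")
    parser.add_argument("--notebook", metavar="FILE",
                        help="write the result as a notebook")
    parser.add_argument("--reorder", action="store_true",
                        help="follow execution order")
    parser.add_argument("--reset-index", action="store_true",
                        help="reset indexes to 1, 2, 3, ...")
    parser.add_argument("--no-index", action="store_true",
                        help="write no index at all")
    parser.add_argument("--add-header", action="store_true",
                        help="add a header if there is none")
    parser.add_argument("--remove-magic", action="store_true",
                        help="remove % magic lines")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```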

gaow commented 8 years ago

Good idea! We'll then have more control over the outcome. I think it is very helpful to properly convert between sos and ipynb. That eliminates a lot of the shortcomings pointed out in that post.

BoPeng commented 8 years ago

It is easy to implement. I will submit a scratch patch and leave the rest to you. I have a deadline to beat.


gaow commented 8 years ago

Great! I would be happy to take over if you could submit a draft patch. I'd like to use these features so I'll do it later this evening after you submit it. Good luck with your deadline! I too have some but I'm stuck ... hopefully these exploratory analyses with sos notebook will shed some light on it :)

BoPeng commented 8 years ago

Just submitted. The patch should be easy to follow, though it may contain bugs/typos. You of course do not have to implement all the options, just the ones that you need now. The SoS_Exporter is super easy to work with. I suppose you will need to read all the cells into an ordereddict or list or something, then apply the actions before you save them all together.
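
Something along these lines, as a rough sketch only. It does not use the SoS_Exporter API; cells are assumed to be plain lists of lines, and `remove_magic`, `add_header` and `apply_actions` are hypothetical names.

```python
def remove_magic(cell):
    """Drop lines that start with a % magic such as `%use python`."""
    return [line for line in cell if not line.lstrip().startswith("%")]


def add_header(cell, index):
    """Prepend a `[index]` header unless the cell already starts with one."""
    first = next((line for line in cell if line.strip()), "")
    if first.startswith("[") and first.rstrip().endswith("]"):
        return cell
    return ["[{}]".format(index)] + cell


def apply_actions(cells, strip_magic=False, headers=False):
    """Read all cells into a list, apply the selected actions, return the result."""
    result = []
    for index, cell in enumerate(cells, start=1):
        if strip_magic:
            cell = remove_magic(cell)
        if headers:
            cell = add_header(cell, index)
        result.append(cell)
    return result
```

The caller would then join the returned cells and write them out in one go.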