jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Conversion back from executable script to ipynb notebook #452

Open ghost opened 8 years ago

ghost commented 8 years ago

Is there a way to convert an executable (Python) script to a notebook? Assume a suitable format for embedding cell metadata in executable scripts, perhaps something like a conf-file format:

# [code]
...
...
# [markdown]
...
...

Or something less intrusive, using markers like consecutive comment lines, consecutive blank lines, etc. I think this feature would help increase the adoption of Jupyter notebooks.
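A minimal sketch of a reader for the marker format proposed above. It assumes markdown lines are commented out with # so the file still runs as a plain script; the "# [code]" / "# [markdown]" markers are only this proposal, not an existing standard, and parse_marked_script is a hypothetical name. It returns (cell_type, source) pairs that a notebook writer could consume.

```python
import re

# Hypothetical markers from the proposal; markdown lines are assumed to be
# commented out ("# text") so the file still runs as a normal Python script.
MARKER = re.compile(r"^#\s*\[(code|markdown)\]\s*$")


def parse_marked_script(text):
    """Return (cell_type, source) pairs from a '# [code]' / '# [markdown]' script."""
    sections = [("code", [])]  # lines before the first marker form a code cell
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            sections.append((match.group(1), []))
        else:
            sections[-1][1].append(line)

    cells = []
    for kind, lines in sections:
        if kind == "markdown":
            # Drop the leading "#" and one optional space from each line.
            lines = [re.sub(r"^# ?", "", line) for line in lines]
        source = "\n".join(lines).strip("\n")
        if source.strip():  # skip empty sections
            cells.append((kind, source))
    return cells
```

Keeping markdown commented out is what lets the same file stay executable, which was the point of the request.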

takluyver commented 8 years ago

We don't have a tool to do that, but it shouldn't be too hard to write a script to do it if you want to. You can use the nbformat library to construct a notebook.
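As a sketch of that suggestion: a notebook file is just JSON, so a stdlib-only version can build a minimal nbformat v4 structure directly. new_notebook below is a hypothetical helper, not nbformat's API; it uses nbformat_minor 4 so that cells do not need the id field introduced in 4.5.

```python
import json


def new_notebook(cells):
    """Build a minimal nbformat v4 notebook dict from (cell_type, source) pairs.

    Stdlib-only stand-in for the nbformat.v4 helpers; nbformat_minor 4 is used
    so cells do not need an "id" field.
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution count and an outputs list.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


nb = new_notebook([("markdown", "# Title"), ("code", "print('hello')")])
text = json.dumps(nb, indent=1)  # write this string to e.g. out.ipynb
```

With the nbformat package installed, the library route would be nbformat.v4.new_notebook(cells=[...]) with nbformat.v4.new_markdown_cell(...) and nbformat.v4.new_code_cell(...), then nbformat.write(nb, "out.ipynb").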

clouds56 commented 8 years ago

But it's hard to convert the Python script file generated by nbconvert back to a notebook.

For example, consider the following code, which is valid output from nbconvert:

# 1. this is a markdown comment

# In[1]:

some_python_code()

# 2. this is a comment in code cell

# In[2]:

you_could_not_tell_the_difference(markdown_comment, comment_in_code)

Could we have some special format for markdown cells? Something like:

# Markdown:
# this is a markdown cell

# this is comment in code
mpacer commented 7 years ago

@clouds56 Including additional text would break expectations about what the exporter is doing. However, we could programmatically use triple-quoted string literals, which are less commonly used (except for docstrings), to indicate Markdown.

It'd be easy enough to detect docstrings (as they must follow function definitions), and we could expect people to hold to the convention that any in-line string literals that are not assigned, and are therefore acting as comments (which is technically frowned upon anyway), should be indented to align with the indentation level of the lines immediately preceding them.
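One way to try the string-literal idea, sketched under simplifying assumptions: only module-level, triple-quoted string statements starting in column zero become Markdown cells, so docstrings (which live inside def/class bodies) stay in code. The function name is hypothetical, it needs Python 3.8+ for end_lineno, and it does not attempt the indentation-alignment convention for nested literals.

```python
import ast
import inspect


def split_on_string_literals(source):
    """Return (cell_type, source) pairs; top-level triple-quoted strings become markdown."""
    lines = source.splitlines()
    cells = []

    def add_code(start, end):
        # Lines start..end-1 (0-based) form one code cell, unless blank.
        chunk = "\n".join(lines[start:end]).strip("\n")
        if chunk.strip():
            cells.append(("code", chunk))

    start = 0
    for node in ast.parse(source).body:
        is_text = (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
            and lines[node.lineno - 1].startswith(('"""', "'''"))
        )
        if is_text:
            add_code(start, node.lineno - 1)
            # cleandoc drops blank first/last lines and common indentation.
            cells.append(("markdown", inspect.cleandoc(node.value.value)))
            start = node.end_lineno
    add_code(start, len(lines))
    return cells
```

Because only top-level statements are inspected, a function's docstring never becomes a Markdown cell; the trade-off is that unassigned literals inside blocks are also left as code.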

This wouldn't be perfect, but I think it would hit the 80/20 solution. Does that sound like a good approach @takluyver @Carreau @minrk?

mpacer commented 7 years ago

Note also that if #507 is merged as-is (to solve #503), this approach would currently lose that information.

What we could do instead is add a line comment immediately after the magic command rewriting that would allow us to recover the original command.

mpacer commented 7 years ago

OK, now #507 is more compatible with reconstructing the notebook. The back-converter will just have to know the convention for matplotlib magic commands (in scripts, get_ipython.magic(matplotlib …)): if the command is immediately followed by a comment that begins with # nbconvert removed: …, it needs to put the content that follows back into the original command.
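A small sketch of the reverse step for that convention. It assumes, hypothetically, that the # nbconvert removed: comment carries the original notebook line verbatim after the prefix; the exact text #507 emits may differ, and restore_removed_magics is an illustrative name.

```python
# Hypothetical marker format: assumes the comment holds the original line verbatim.
REMOVED_PREFIX = "# nbconvert removed: "


def restore_removed_magics(lines):
    """Replace each rewritten line + '# nbconvert removed: <cmd>' pair with <cmd>."""
    restored = []
    i = 0
    while i < len(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        stripped = following.lstrip()
        if stripped.startswith(REMOVED_PREFIX):
            # Keep the comment's indentation and restore the original command.
            indent = following[: len(following) - len(stripped)]
            restored.append(indent + stripped[len(REMOVED_PREFIX):])
            i += 2  # skip both the rewritten line and the marker comment
        else:
            restored.append(lines[i])
            i += 1
    return restored
```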

A useful note for implementation: currently, markdown blocks can be told apart because every line inside a block is prefixed by #, while consecutive blocks are separated by a new line with no #.
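That note suggests a heuristic for splitting nbconvert --to script output back into cells, sketched here under assumptions: code cells start at # In[...]: headers, markdown blocks are runs of #-prefixed lines separated by truly empty lines, and trailing comment-only blocks after a code cell are taken to be markdown. As clouds56 points out, the last rule cannot tell a trailing code comment from a markdown cell; in that example the second comment is classified as markdown. The function names are hypothetical.

```python
import re

# nbconvert's script exporter writes "# In[1]:" (or "# In[ ]:") above each code cell.
HEADER = re.compile(r"^# In\[[ \d]*\]:[ \t]*$", re.MULTILINE)
# One or more empty lines; captured so separators inside code are preserved.
BLANK_RUN = re.compile(r"(\n(?:[ \t]*\n)+)")


def _is_comment_block(block):
    return all(line.startswith("#") for line in block.splitlines())


def _uncomment(block):
    # Undo the "# " prefix on each markdown line (blank ones may be "# " or "#").
    return "\n".join(
        line[2:] if line.startswith("# ") else line[1:] for line in block.splitlines()
    )


def split_nbconvert_script(text):
    """Heuristically recover (cell_type, source) pairs from nbconvert script output."""
    cells = []
    for i, segment in enumerate(HEADER.split(text)):
        parts = BLANK_RUN.split(segment.strip("\n"))  # [block, sep, block, ...]
        if i == 0:
            # Before the first header: markdown blocks plus the "#!" header block.
            for block in parts[0::2]:
                if not block or block.startswith("#!"):
                    continue
                if _is_comment_block(block):
                    cells.append(("markdown", _uncomment(block)))
                else:
                    cells.append(("code", block))
            continue
        # Trailing comment-only blocks are assumed to be markdown cells that sat
        # between this code cell and the next "# In[...]:" header.
        trailing = []
        while len(parts) > 1 and _is_comment_block(parts[-1]):
            trailing.insert(0, _uncomment(parts.pop()))
            parts.pop()  # the blank-line separator in front of it
        code = "".join(parts)
        if code.strip():
            cells.append(("code", code))
        cells.extend(("markdown", md) for md in trailing)
    return cells
```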

I don't know the full variety of cases where nbconvert might want to deliberately change the content of code to be more amenable to processing as a script, but there's probably a more general form for this kind of convention that could work well.

@fperez @carreau Which other IPython magics would be inappropriate to export to a script "as is"? Are there any others, or is this unique to the matplotlib magic, where keeping the inline and notebook backends will always cause problems in a script (since they're designed to be used in a GUI)? My instinct is that a magic with no functionality in a script (i.e., one that does nothing there, either helpfully or harmfully) can be left unchanged so that back-conversion remains possible.

Relatedly: Is there anything that is the reverse of the IPythonInputSplitter that will take a pythonic version of an ipython command and recreate it as a magic?
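A rough stand-in for the common cases, written as a hypothetical helper rather than an existing IPython API. It uses the ast module to recognise single-line get_ipython().run_line_magic(name, args), the older get_ipython().magic(...), and get_ipython().system(...) calls whose arguments are string literals, and rewrites them as %magic or !command. Cell magics (run_cell_magic) and non-literal arguments are left untouched.

```python
import ast


def line_to_magic(line):
    """Rewrite a get_ipython() call on one line back into %magic / !shell syntax.

    Hypothetical helper: returns the line unchanged if it is not a recognised call.
    """
    body = line.lstrip()
    indent = line[: len(line) - len(body)]
    try:
        tree = ast.parse(body)
    except SyntaxError:
        return line
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Expr):
        return line
    call = tree.body[0].value
    # Expect the shape get_ipython().<method>(<literal args>).
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Attribute)
        and isinstance(call.func.value, ast.Call)
        and isinstance(call.func.value.func, ast.Name)
        and call.func.value.func.id == "get_ipython"
    ):
        return line
    try:
        args = [ast.literal_eval(arg) for arg in call.args]
    except ValueError:  # non-literal argument, e.g. a variable
        return line
    method = call.func.attr
    if method == "run_line_magic" and len(args) == 2:
        return indent + f"%{args[0]} {args[1]}".rstrip()
    if method == "magic" and len(args) == 1:  # older IPython spelling
        return indent + f"%{args[0]}"
    if method == "system" and len(args) == 1:
        return indent + f"!{args[0]}"
    return line
```

Parsing with ast instead of a regex avoids being fooled by quoting inside the arguments.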

westurner commented 7 years ago

I just found py2nb:

https://github.com/sklam/py2nb "Python script to Jupyter notebook converter"

Uses python tokenize (builtin tokenizer library) for tokenization. String literals with triple quote at column zero are converted into a comment token with special <markdowncell> and <codecell> to feed into the python importer in IPython version 3. The processed tokens are untokenized using the tokenize module so that untouched line looks exactly the same as the input.

mwouts commented 6 years ago

Hi everyone, I have recently developed jupytext, a collection of text-to-Jupyter-notebook converters, with a plugin that lets you edit and run Python files as notebooks in Jupyter.

My approach was to use almost no explicit cell markers, just enough of them to preserve the notebook structure on round-trip conversion. This makes it possible to generate very natural Python scripts and, conversely, to open any script as a notebook.

Would you like to give it a try and provide feedback? Your feedback on the format is very welcome.

remykarem commented 5 years ago

Hi guys, I recently published a Python package on PyPI called p2j that creates a Jupyter notebook .ipynb from a Python source code .py. On the command line, run

pip install p2j

and then run

p2j mycode.py

and it will generate a mycode.ipynb. Example of the Jupyter notebook generated:

[image: img_8334, example notebook generated by p2j]

Submit a PR or give me feedback!

PyPI: https://pypi.org/project/p2j/ GitHub: https://github.com/raibosome/python2jupyter

humbleself commented 5 years ago

Yes, I tried this, but it gave me this error:

  File "", line 1
    p2j myprojectfinalfinalpr.py
                            ^
SyntaxError: invalid syntax

remykarem commented 5 years ago

@humbleself It seems that you are running it as a script or in a Python interpreter. You should run it from the command line.

shivam13juna commented 5 years ago

Brilliant job pal, worked beautifully. Thanks a lot!

humbleself commented 5 years ago

Thank you, I will try that soon

t-makaro commented 5 years ago

We could implement this feature in nbconvert with a notion of "Importers". An exporter converts a notebook_node into another format. We could use an "Importer" to take another file format and convert it to a notebook_node. The importer could be called by an exporter when the exporter calls Exporter().from_filename() to get a notebook_node.
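A minimal sketch of what such an importer hook could look like, assuming a registry keyed by file suffix. Everything here (IMPORTERS, register_importer, notebook_from_filename) is hypothetical; nbconvert does not expose this API, and the .py importer is a placeholder that puts the whole script into one code cell.

```python
import json
from pathlib import Path

# Hypothetical registry: suffix -> function(text) returning a notebook dict.
IMPORTERS = {}


def register_importer(suffix):
    def decorator(func):
        IMPORTERS[suffix] = func
        return func
    return decorator


@register_importer(".ipynb")
def import_ipynb(text):
    return json.loads(text)


@register_importer(".py")
def import_py(text):
    # Placeholder: one code cell holding the whole script (nbformat v4.4 shape).
    cell = {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": text}
    return {"cells": [cell], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def notebook_from_filename(path):
    """What an exporter's from_filename() could call to obtain a notebook."""
    path = Path(path)
    try:
        importer = IMPORTERS[path.suffix]
    except KeyError:
        raise ValueError(f"no importer registered for {path.suffix!r}") from None
    return importer(path.read_text(encoding="utf-8"))
```

The design keeps exporters unchanged: they would only need to ask the registry for a notebook instead of assuming the input is already .ipynb.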

mgeier commented 5 years ago

We could use an "Importer" to take another file format and convert it to a notebook_node.

I've also noticed the lack of an "importer" extension mechanism.

I've worked around this by creating a "mixin" class and applying it to all the existing exporters: https://jupyter-format.readthedocs.io/en/latest/api.html#Exporters-for-nbconvert

This is of course ugly and not really scalable, but it somewhat seems to work. An official extension mechanism would make this much nicer.

scottcode commented 4 years ago

A friend of mine (@kopptr) created a tool for conversion to and from script form: https://github.com/kopptr/notebook-tools

In script form, it uses special comment markers (#>) to demarcate cell boundaries, and triple-quoted strings to denote markdown.

westurner commented 4 years ago

How do these formats compare to the light, nomarker, percent, hydrogen, and sphinx-gallery formats that have already been added to jupytext?

https://jupytext.readthedocs.io/en/latest/formats.html#notebooks-as-scripts

Jupytext supports importing from other notebook formats to .ipynb from the command line and also when they're already paired.

https://jupytext.readthedocs.io/en/latest/paired-notebooks.html :

Jupytext can write a given notebook to multiple files. In addition to the original notebook file, Jupytext can save the input cells to a text file — either a script or a Markdown document. Put the text file under version control for a clear commit history. Or refactor the paired script, and reimport the updated input cells by simply refreshing the notebook in Jupyter.

https://jupytext.readthedocs.io/en/latest/using-cli.html :

jupytext --to py:percent notebook.ipynb         # convert notebook.ipynb to a .py file in the double percent format
jupytext --to py:percent --opt comment_magics=false notebook.ipynb   # same as above + do not comment magic commands
jupytext --to markdown notebook.ipynb           # convert notebook.ipynb to a .md file
jupytext --output script.py notebook.ipynb      # convert notebook.ipynb to a script.py file

jupytext --to notebook notebook.py              # convert notebook.py to an .ipynb file with no outputs
jupytext --update --to notebook notebook.py     # update the input cells in the .ipynb file and preserve outputs and metadata

jupytext --to md --test notebook.ipynb          # Test round trip conversion

jupytext --to md --output - notebook.ipynb      # display the markdown version on screen
jupytext --from ipynb --to py:percent 

Does the importer functionality in jupytext (by @mwouts) solve for this issue?

westurner commented 4 years ago

Do all "notebooks as scripts" formats lossily discard [binary] cell outputs?

"Jupyter and GitHub - alternative file format" https://discourse.jupyter.org/t/jupyter-and-github-alternative-file-format/4972/99

"Proposed-JEP: Investigate alternate, optional file formats" https://discourse.jupyter.org/t/proposed-jep-investigate-alternate-optional-file-formats/5073

mwouts commented 4 years ago

Thank you @westurner for the citation.

Jupytext may be able to parse scripts generated with jupyter nbconvert --to script, but it will certainly work better with the formats that it explicitly supports, for which round trips are well tested.

People reading this thread may like the percent format. That format is more explicit (cells are marked with # %%), and it is compatible with many IDEs.
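To make the percent format concrete, here is a minimal writer that renders (cell_type, source) pairs as # %% cells, with markdown cells marked # %% [markdown] and their lines commented out. It is a sketch, not Jupytext's implementation: real Jupytext output can also carry a metadata header and cell options such as tags. to_percent_script is a hypothetical name.

```python
def to_percent_script(cells):
    """Render (cell_type, source) pairs as a '# %%' percent-format script."""
    chunks = []
    for cell_type, source in cells:
        if cell_type == "markdown":
            # Comment out every markdown line; blank lines become a bare "#".
            body = "\n".join(("# " + line).rstrip() for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + body)
        else:
            chunks.append("# %%\n" + source)
    return "\n\n".join(chunks) + "\n"


script = to_percent_script([("markdown", "# Title\n\nIntro."), ("code", "x = 1\nprint(x)")])
```

Because markdown stays commented out, the result is still a runnable Python file, and the explicit markers avoid the code-comment versus markdown ambiguity discussed earlier in the thread.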

Jupytext is available as a CLI, but maybe it is even more convenient to use it directly within Jupyter. With the Jupytext plugin for Jupyter, you will be able to

  1. open scripts or Markdown files as notebooks, run them as notebooks, etc. (outputs are lost when you reload)
  2. turn some of your existing notebooks (or scripts) into paired notebooks (an editable text version + a classical .ipynb file).

Do all "notebooks as scripts" formats lossily discard [binary] cell outputs?

At the moment, yes. At least for the Jupytext formats (unless, obviously, when they are paired to an .ipynb file). The only alternative format that I am aware of, which preserves outputs, is Pandoc's Markdown representation of Jupyter Notebooks.

scottcode commented 4 years ago

@westurner @mwouts Thanks for sharing more details about jupytext. Looks very handy. I like the idea of the percent format, too, for its compatibility with Spyder. I also like how many (or all?) of the formats allow for cell metadata/tags, which enables other tools to leverage it (e.g. in nbstripout cell metadata for keeping output).