mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License

Hide inputs, include outputs in Markdown format #220

Open mwouts opened 5 years ago

mwouts commented 5 years ago

The Markdown format for notebooks is a great fit for READMEs on GitHub. For this usage, I would like to be able to include (selected) outputs in the Markdown file. And from time to time I would like to hide a few input cells.

Possible implementation...

choldgraf commented 5 years ago

Just a note that I bet some folks in

https://discourse.jupyter.org/t/generating-reports-for-jupyter-notebooks/279/15

would find this interesting and useful. I was also thinking about improving the story for "one notebook file -> one md file" creation. It seems like one tricky thing about turning certain code blocks / outputs / etc ON or OFF is that this makes two-way sync more difficult.

Do you think there'd be value in building in some functions in jupytext that break two-way synchronization? Otherwise as you mention, you could try keeping track of all the "hidden" pieces by using comments.

The other question is how things like interactive outputs would work in markdown. I guess those would be embedded as HTML snippets?

mwouts commented 4 years ago

Hi, I will soon start thinking of how to best implement this. Currently some ideas I have are listed below, please feel free to discuss those!

Global behavior

Saving outputs

Hiding code

I'd like to offer an option to hide one or all code cells. The code cell would obviously remain in the .md document, but in a commented-out part of the document. We should use standard cell metadata to trigger this, e.g. the tag hide_input as in Jupyter Book.
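A minimal sketch of the hiding idea, assuming the hidden cell is wrapped in an HTML comment. The function name `render_code_cell` and the `<!-- hide_input` marker are hypothetical, not an existing Jupytext convention.

```python
def render_code_cell(source, language="python", tags=()):
    """Render a code cell as Markdown, commented out when tagged hide_input."""
    fence = "`" * 3
    block = fence + language + "\n" + source + "\n" + fence
    if "hide_input" in tags:
        # Assumption: the cell source does not itself contain "-->",
        # which would close the HTML comment too early.
        return "<!-- hide_input\n" + block + "\n-->"
    return block


print(render_code_cell("df.head()", tags=["hide_input"]))
```

The cell text stays in the file, so a round trip back to the notebook remains possible, while Markdown renderers skip the commented part.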

mattharrison commented 4 years ago

Would love the ability to show output in markdown export (and run doctest on it). I'm authoring a book and this would be wonderful! :)

mwouts commented 4 years ago

Hello everyone! I have done some research on this subject and looked into how it would be possible to include (selected or all) outputs in the Markdown file.

I'd follow the default nbconvert's display_data_priority (probably ['html', 'application/pdf', 'svg', 'latex', 'png', 'jpg', 'jpeg', 'text']), and include just one representation of the output, and e.g. drop the text output on plots which looks like <Figure size 432x288 with 1 Axes>, unless the output is required for a visual identity of the notebook (e.g. the JSON data attached to plotly graphs may have to be preserved).
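The selection rule above can be sketched as follows. The mapping from the short names in the priority list to MIME types is an assumption, and `pick_representation` is a hypothetical helper, not nbconvert's API. The PNG payload is a placeholder string.

```python
# Assumed MIME types for the short names quoted above.
PRIORITY = [
    "text/html",
    "application/pdf",
    "image/svg+xml",
    "text/latex",
    "image/png",
    "image/jpeg",
    "text/plain",
]


def pick_representation(data, priority=PRIORITY):
    """Return (mime_type, content) for the first priority entry found in data."""
    for mime_type in priority:
        if mime_type in data:
            return mime_type, data[mime_type]
    return None


# A plot-like output: the image wins and the "<Figure ...>" text is dropped.
figure = {
    "image/png": "placeholder-base64-data",
    "text/plain": ["<Figure size 432x288 with 1 Axes>"],
}
print(pick_representation(figure)[0])
```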

For each output type I plan to offer both inline and include. Include would be the default for images (as this works well in VSCode, PyCharm and GitHub), and inline would be the default for text. For HTML I would prefer to go for include by default, but that won't work on GitHub, so please advise! For Javascript and JSON files I'll choose include by default, as neither include nor inline will work on GitHub (I think, but still have to check, that both forms work in VSCode when you turn security settings off).

Now let me describe what I have in mind for each output type

Text

Outputs like

{
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }

could be represented as

```output_execute_result
2
```

similarly to jupyter nbconvert which sets class="output_text output_subarea output_execute_result" on these outputs.

{
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "using print\n",
      "using sys.stdout.write"
     ]
    }

could be represented as

```output_stdout
using print
using sys.stdout.write
```

unless we want to preserve the 'name' and 'output_type' more explicitly. Note that jupyter nbconvert sets class="output_subarea output_stream output_stdout output_text" on these outputs.

Similarly, for outputs on stderr we could use pseudo code blocks starting with ```output_stderr. Again, nbconvert sets class="output_subarea output_stream output_stderr output_text".
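A rough sketch of the mapping described for text outputs, assuming only the `stream` and `execute_result` output types need handling. `output_to_fence` is a hypothetical name; the info strings are the ones proposed above.

```python
def output_to_fence(output):
    """Render a text-only notebook output as a fenced pseudo code block."""
    fence = "`" * 3
    if output["output_type"] == "stream":
        info = "output_" + output["name"]  # output_stdout or output_stderr
        text = "".join(output["text"])
    elif output["output_type"] == "execute_result":
        info = "output_execute_result"
        text = "".join(output["data"]["text/plain"])
    else:
        raise ValueError("unsupported output type: " + output["output_type"])
    return fence + info + "\n" + text.rstrip("\n") + "\n" + fence


stream = {
    "name": "stdout",
    "output_type": "stream",
    "text": ["using print\n", "using sys.stdout.write"],
}
print(output_to_fence(stream))
```

The `execution_count` and `metadata` fields are dropped here, which is the information loss mentioned above.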

I also plan to allow including the output in a .html file, that would be useful especially for long outputs like logs.

Images

By default, images would use the standard Markdown inclusion like

![](notebook_name_outputs/unnamed_code_cell_5_1.png)

Jupytext would know that this is an image output just by matching the image name: if it is in the output folder (by default {notebook_name}_outputs) and if the image name matches {code_cell_name}_{output_count}.{mime_ext}.
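The name-matching rule could look like the sketch below. The regular expression and the helper `parse_output_path` are illustrative assumptions; only the `{notebook_name}_outputs` folder and the `{code_cell_name}_{output_count}.{mime_ext}` pattern come from the proposal.

```python
import re
from pathlib import PurePosixPath

# {code_cell_name}_{output_count}.{mime_ext}
OUTPUT_FILE = re.compile(r"^(?P<cell>.+)_(?P<count>\d+)\.(?P<ext>\w+)$")


def parse_output_path(path, notebook_name):
    """Return (cell_name, output_count, ext) if path looks like a notebook output."""
    p = PurePosixPath(path)
    if p.parent.name != notebook_name + "_outputs":
        return None  # not in the output folder: an ordinary Markdown image
    match = OUTPUT_FILE.match(p.name)
    if match is None:
        return None
    return match["cell"], int(match["count"]), match["ext"]


print(parse_output_path("notebook_name_outputs/unnamed_code_cell_5_1.png", "notebook_name"))
# → ('unnamed_code_cell_5', 1, 'png')
```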

Inline images exist in Markdown, and work in VSCode, but are not displayed by GitHub.

![](...)

HTML

By default we could include the HTML inline, like here:

|   | 0 |
|---|---|
| 0 | 4 |

Maybe I should filter out the style section which does not display nicely on GitHub.

I will also offer the option to include the HTML file using <object>:

<object type="text/html" data="notebook_name_outputs/unnamed_code_cell_3_1.html"></object>

or <iframe>:

<iframe src="notebook_name_outputs/unnamed_code_cell_3_1.html" seamless frameborder="0"></iframe>

These kind of work in VSCode (they add an unaesthetic extra box), but not on GitHub.

Javascript

By default scripts would be included using <script>:

<script src="notebook_outputs/unnamed_code_cell_12_4.js"></script>

This may work in VSCode, but not on GitHub.

Metadata

I've not discussed metadata here, but we'll need to find a way to store the metadata outputs. Probably using JSON in HTML comments.
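One possible shape for this, as a sketch only: the `output_metadata` marker and both function names are hypothetical. Since `--` is not allowed inside an HTML comment, the sketch escapes it inside JSON strings.

```python
import json
import re

COMMENT = re.compile(r"^<!-- output_metadata (?P<json>.*) -->$")


def metadata_to_comment(metadata):
    """Serialize output metadata as a one-line HTML comment."""
    payload = json.dumps(metadata, sort_keys=True)
    # "--" can only occur inside JSON strings; rewrite the second hyphen
    # as a \u002d escape so the comment stays valid.
    payload = payload.replace("--", "-\\u002d")
    return "<!-- output_metadata " + payload + " -->"


def comment_to_metadata(line):
    """Parse the comment back into a dict, or return None if it does not match."""
    match = COMMENT.match(line.strip())
    return json.loads(match["json"]) if match else None


comment = metadata_to_comment({"scrolled": True})
print(comment)
print(comment_to_metadata(comment))
```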

Summary

choldgraf commented 4 years ago

Is there a way that, instead of creating our own folders etc, we could piggy-back on the ipynb format for including data/images/etc in the text-file representation?

E.g. if we could just treat the ipynb file as a dictionary of unique cell IDs -> outputs. Then we could reference the cell ID in the notebook and its output from within the text file? I'm not sure how that'd work with something like vscode/atom/etc, but if it were possible I think it'd be a nice way to add outputs to the text files without changing the files too much

mwouts commented 4 years ago

Sure, the .ipynb format is a great container. And we already do something like what I think you suggest (if I understood correctly): when the user pairs a notebook to a .md file, we use the .ipynb file for storing the outputs (and filtered metadata), and the .md file for storing inputs. Is that what you were thinking of?

This is available in Jupyter (pair a notebook to a text format, close it and reopen the text document), and also on the command line (jupytext --update). The limitation is that the outputs are associated with the input text itself, so if the inputs are severely modified (beyond code reformatting with e.g. black), the corresponding outputs are removed.
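The limitation can be illustrated with a deliberately simplified sketch that matches cells on their exact source text. This is not Jupytext's actual implementation (which, as noted, tolerates reformatting); `restore_outputs` and the flat cell dictionaries are assumptions made for the example.

```python
def restore_outputs(text_sources, ipynb_cells):
    """Reattach saved outputs to code cells whose source text is unchanged."""
    # Index the saved outputs by the exact input text of their cell.
    saved = {}
    for cell in ipynb_cells:
        saved.setdefault(cell["source"], []).append(cell["outputs"])

    restored = []
    for source in text_sources:
        candidates = saved.get(source)
        # A modified cell finds no match and therefore loses its outputs.
        outputs = candidates.pop(0) if candidates else []
        restored.append({"source": source, "outputs": outputs})
    return restored


ipynb = [
    {"source": "1 + 1", "outputs": ["2"]},
    {"source": "x = 3\nx", "outputs": ["3"]},
]
# The second cell was edited in the .md file, so its output is dropped.
print(restore_outputs(["1 + 1", "x = 4\nx"], ipynb))
```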

mwouts commented 4 years ago

Hi there! It has been some time since we last discussed this... and I am still interested!

The problem that I have with this issue is that I would like to include HTML outputs as external files for the sake of version control. Say e.g., if my notebook sample_notebook.md outputs a dataframe in a cell named main_result, the .md file should contain not the actual table, but instead include the file sample_notebook_outputs/main_result_1.html where the table would be stored.

Including external HTML in Markdown files (with either iframe or object) seems to work in some editors like VS Code. However I am looking for a solution that also works on GitHub.

Does anyone know if that is possible? I am aware of https://stackoverflow.com/questions/14951321/how-to-display-html-content-in-github-readme-md, which explains that iframe and probably also object are sanitized out (so not displayed on GitHub), and also of https://github.com/github/markup/issues/346 and https://github.com/github/markup/issues/1159, so the answer seems to be negative, but maybe I am missing something? @inc0, Do you know if there is any way to embed, in a GitHub Markdown file, say, a table from an HTML file in the same repository?

choldgraf commented 4 years ago

I don't believe that this is possible in any kind of generic way - it's definitely not part of the commonmark spec. You can do this in some flavors of markdown - I think SSG projects like Jekyll and Hugo, or MyST support it, but not in a "works in any interface" kinda way. I'd recommend trying this not with HTML, but instead with the JSON mimetypes that are generally output from running cells. Otherwise you'll have two different kinds of jupyter outputs - ones that work in ipynb, and ones that are HTML-only and meant for use with jupytext. That said, I think this could become complex quickly (my 2 cents is still that if people want outputs, the easiest path forward is to just use an ipynb file...)

mwouts commented 4 years ago

Hi @choldgraf

> I don't believe that this is possible in any kind of generic way - it's definitely not part of the commonmark spec. You can do this in some flavors of markdown - I think SSG projects like Jekyll and Hugo, or MyST support it, but not in a "works in any interface" kinda way

Oh that's right! I completely agree, if we ever implement a prototype for a notebook with outputs, we should start with one of these three frameworks. Can you remind me of the links between MyST and the other two? I mean, is Jupyter Book more Jekyll-based or more Hugo-based? What happens when the MyST inputs are compiled into a website? Will that work if I manually insert a complex output (e.g. the data of a plotly plot) into the MyST file?

That makes me think that last year I did some research about what kind of outputs I could include in Hugo, and the result was pretty nice. I could get footnotes, HTML tables, interactive tables, plotly graphs and even linked Jupyter widgets working (but not math formulae). See https://github.com/mwouts/first_steps_with_hugo for the code and https://my-first-steps-with-hugo.netlify.com/ for the corresponding website.

> I'd recommend trying this not with HTML, but instead with the JSON mimetypes that are generally output from running cells.

In the Hugo test, I extracted the JSON data from the notebook output for both the widget and the plotly graph. Both used two different Hugo shortcodes, but maybe indeed we could think of a shortcode that could display any output from a notebook (or, the equivalent for Jekyll, but maybe you have that already in Jupyter Book?).

choldgraf commented 4 years ago

> Can you remind me of the links between MyST and the other two?

There aren't any links between them, but they do some similar things. MyST is a different kind of tool than Hugo/Jekyll in that it parses markdown into a "document representation" as opposed to just converting to HTML. For example, it can use Sphinx to handle cross-references, citations, etc. That's what Jupyter Book does (but it doesn't use Hugo/Jekyll for any of this)

psychemedia commented 3 years ago

I was just looking at some intermediate md generated using bookdown and knitr to execute code in an Rmd file, en route to an output HTML file using settings:

output:
  html_document:
    keep_md: true
    self_contained: true
Rmd, like many of the Jupytext supported formats, does not capture code cell outputs in the Rmd representation.

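However, in the intermediate md, I noticed a couple of things:

- code chunks were identified as might be expected (`# code`);
- raw outputs (e.g. non-kable styled dataframes) were presented in backticks;
- kable style cell outputs were rendered as HTML tables;
- as.htmlwidget() outputs were rendered as `<script ...>` etc.

The ipynb.pub service that @yuvipanda is currently working on allows simple sharing of ipynb files and documents that can be mapped to ipynb using Jupytext.

The rendering of a file I uploaded there suggests that Jupytext doesn't currently recognise the {=html} component (you can download the original file from the ipynb.pub More options menu).

So I note:

- the intermediate md output from running knitr on Rmd can generate markdown that Jupytext can work with, in large part, to create ipynb representations, albeit with cell outputs rendered as md/HTML cells rather than outputs (unless you invoke various heuristics, such as that an image or a table or unqualified code block immediately following a language qualified cell is a likely cell output);
- the =html block could presumably be reliably identified as html in a cell output.

And I wonder:

- what flavour of md does bookdown/knitr generate?
- if Jupytext can support that flavour of markdown, then a route exists to generating ipynb documents potentially with some output cell values defined, from applying bookdown/knitr to an Rmd file, capturing the intermediate md output produced as part of the corresponding HTML publishing workflow, and then converting the md to ipynb via Jupytext?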

A more direct route would be to extend knitr to allow a 'knit to ipynb' format, but I'm not convinced the RStudio/knitr folk want to tie into the (competing?) Jupyter ecosystem just at the moment, even though they are upping their support for Python with each release of RStudio.

westurner commented 3 years ago

FWIW, ipynb to markdown with output (and some newline stripping) is also super useful for mailing lists and forums.

Is some way to (maybe by default?) exclude base64-encoded content (e.g. `_repr_png_` etc) from the output all that's needed? Data URIs would work for those who want that output in the pretty plaintext output.
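A sketch of such a filter, assuming the base64-encoded representations are the PNG, JPEG, GIF and PDF entries of an output's MIME bundle. `drop_base64_representations` is a hypothetical helper, not part of Jupytext or nbconvert.

```python
# MIME types assumed to be stored base64-encoded in notebook outputs.
BASE64_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "application/pdf"}


def drop_base64_representations(output):
    """Return a copy of a notebook output without base64-encoded data."""
    data = output.get("data")
    if data is None:
        return dict(output)  # e.g. stream outputs carry "text", not "data"
    kept = {mime: content for mime, content in data.items()
            if mime not in BASE64_MIME_TYPES}
    return {**output, "data": kept}


plot = {
    "output_type": "display_data",
    "data": {"image/png": "placeholder-base64-data", "text/plain": ["<Figure>"]},
    "metadata": {},
}
print(drop_base64_representations(plot)["data"])
```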

Are there escape vulnerabilities in such formats, wherein what should be just data is inopportunely executed as code?

(IMO, Jupytext should/could be a core Jupyter dependency and/or pluginified and partially merged into core (with support for optionally including output in at least (MyST,) markdown))

On Tue, Apr 13, 2021, 11:48 Tony Hirst @.***> wrote:

> I was just looking at some intermediate md generated using bookdown and knitr to execute code in an Rmd file, en route to an output HTML file using settings:
>
> output: html_document: default keep_md: true self_contained: true
>
> Rmd, like many of the Jupytext supported formats, does not capture code cell outputs in the Rmd representation.
>
> However, in the intermediate md, I noticed a couple of things:
>
> - code chunks were identified as might be expected (`# code`);
> - raw outputs (eg non-kable styled dataframes) were presented in backticks;
> - kable style cell outputs were rendered as html tables;
> - as.htmlwidget() outputs were rendered as `<script ...>` etc.
>
> The ipynb.pub service that @yuvipanda https://github.com/yuvipanda is currently working on allows simple sharing of ipynb files and documents that can be mapped to ipynb using Jupytext.
>
> The rendering of a file I uploaded there https://ipynb.pub/view/59a8129eb1f88d5f86cb538df7a81fb8896556e6969597b02b5182a2fe8d2702#displayOptions= suggests that Jupytext doesn't currently recognise the {=html} component (you can download the original file from the ipynb.pub More options menu).
>
> So I note:
>
> - the intermediate md output from running knitr on Rmd can generate markdown that Jupytext can work with, in large part, to create ipynb representations, albeit with cell outputs rendered as md/HTML cells rather than outputs (unless you invoke various heuristics, such as that an image or a table or unqualified code block immediately following a language qualified cell is a likely cell output);
> - the =html block could presumably be reliably identified as html in a cell output
>
> And I wonder:
>
> - what flavour of md does bookdown/knitr generate?
> - if Jupytext can support that flavour of markdown, then a route exists to generating ipynb documents potentially with some output cell values defined, from applying bookdown/knitr to an Rmd file, capturing the intermediate md output produced as part of the corresponding HTML publishing workflow, and then converting the md to ipynb via Jupytext?
>
> A more direct route would be to extend knitr to allow a 'knit to ipynb' format, but I'm not convinced the RStudio/knitr folk want to tie into the (competing?) Jupyter ecosystem just at the moment, even though they are upping their support for Python with each release of RStudio.


yuvipanda commented 3 years ago

@psychemedia I was talking to @ttimbers, and she pointed out that even in https://ipynb.pub/view/59a8129eb1f88d5f86cb538df7a81fb8896556e6969597b02b5182a2fe8d2702#displayOptions=, there's only HTML tables under (=html), not plots. So it doesn't actually have all the outputs.

psychemedia commented 3 years ago

Yes and no; all the tables that are prefixed by ## are outputs. The document contains content that is actually:

But the extent to which you can tell whether content is "markdown" or "code cell output" is moot.

The {=html} content you can identify as output. You could write a parser to treat ## prefixed content following a code cell as output, but that would be a heuristic and may generate false positives if for some reason your markdown does include ## prefixed items. I haven't checked to see if there are tells in images/charts or kable generated styled HTML output tables (there may be a switch that allows you to add class metadata to a table?) that would allow you to identify those items as generated output content.
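The heuristic, and its false-positive risk, can be sketched in a few lines. `looks_like_knitr_output` is a hypothetical helper; it assumes knitr's default `##` prefix on printed output.

```python
def looks_like_knitr_output(block_lines):
    """Heuristic: every non-blank line carries knitr's "##" output prefix."""
    content = [line for line in block_lines if line.strip()]
    return bool(content) and all(line.startswith("##") for line in content)


print(looks_like_knitr_output(["## [1] 2"]))    # printed R value: True
print(looks_like_knitr_output(["x <- 1 + 1"]))  # ordinary code: False
print(looks_like_knitr_output(["## Results"]))  # a Markdown heading: also True
```

The last call shows why this can only be a heuristic: a genuine level-two Markdown heading is indistinguishable from prefixed output unless the position relative to a code cell is also taken into account.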

rodrigogiraoserrao commented 2 years ago

There doesn't seem to have been any follow up on this, so, to be clear: currently, is there any way for the versions synced with a notebook .ipynb to also include the outputs? I don't even care too much about fancy output formats, only about the regular outputs.

For example, in a Python notebook of mine, I have the following cell:

{
 "cell_type": "code",
 "execution_count": 3,
 "id": "882a1343",
 "metadata": {},
 "outputs": [
  {
   "data": {
    "text/plain": [
     "1030301"
    ]
   },
   "execution_count": 3,
   "metadata": {},
   "output_type": "execute_result"
  }
 ],
 "source": [
  "pow(50 - (-50) + 1, 3)"
 ]
}

and the synced markdown file gets this:

```python
pow(50 - (-50) + 1, 3)
```

Is there a way for it to also include the result, that in this case was `1030301`?

jgunstone commented 2 years ago

hi - @RodrigoGiraoSerrao

my understanding is that this isn't currently possible within jupytext (though it is a feature that i would love to see).

you can achieve what you are asking for with nbconvert:

jupyter nbconvert --to markdown mynotebook.ipynb

but you need to manually convert to markdown / keep up-to-date with your notebook / script

mwouts commented 2 years ago

Hi there, I have opened another issue for text notebooks with outputs at #951, that might be easier to implement than this one.

My plan is to add the support for outputs in the "percent" format for text notebooks (e.g. scripts rather than markdown, seems easier because we've more freedom on how outputs can be coded).

I have coded a proof of concept at https://github.com/mwouts/nbpercent/, that seems feasible - the next step will be to find one or more sponsors for the project! (and if you want to get updates on this, please subscribe to #951)

westurner commented 2 years ago

Future concerns in re: "polyglot" notebook cell language metadata:

"Allowing multiple languages in one notebook" #2815 https://github.com/jupyterlab/jupyterlab/issues/2815#issuecomment-907655683 :


"How to implement LSP for a multi-language kernel (SoS)?" jupyter-lsp/jupyterlab-lsp#282

Is this fair?: Any sufficient solution for polyglot kernels must install with just pip install jupyterlab.

What are the existing polyglot kernel approaches?

Do any require Apache Arrow as a kernel dependency for inter-language intercell data exchange?

mfhepp commented 1 year ago

Cross-referencing https://github.com/gpoore/codebraid/issues/32#issuecomment-1257273636, as a pretty straightforward approach would be to add Codebraid classes to code blocks and then run Codebraid over the Markdown generated by Jupytext - this should do the trick in many use-cases.

sergei-mironov commented 1 year ago

Hi all, I am developing a tool named LitREPL which (a) provides a Vim plugin for the interactive evaluation of code cells in Markdown documents in the style of Jupyter but not strictly depending on it, and (b) provides a Linux command-line utility for the same task, which could be used by other editors.

Thus, I am very interested in adding support for Jupyter result sections to the Markdown documents. So far LitREPL does support the following simple format

``` python
print("Hello World!")
```

``` result
Hello World!
```

I would be very glad if Jupytext were able to produce Markdown documents formatted in this way (up to the tag names, which are of course open to discussion). I would also be glad to see empty result sections at least!

fperez commented 1 year ago

FYI - During last week's Jupyter Notebook format workshop, there was a lot of discussion along these lines. I sadly couldn't participate, but there's a set of notes here that might be of interest to folks on this issue.

Edit - that hackmd doc should become a proper JEP shortly, and there's additional discussion over at MyST.

sergei-mironov commented 1 year ago

Hihi. Is there any news on this?

westurner commented 1 year ago

There's a PR:

````
```{jupyter.code-cell}

```{code-cell}

# Maybe also, like the percent format:

```%% attr=value
````

There's already somewhat wide IDE support for the `percent` `%%` format:

- https://github.com/iodide-project/iodide/issues/2942