mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License

Jupytext in Starboard Notebook #668

Closed gzuidhof closed 3 years ago

gzuidhof commented 3 years ago

Hi Jupytext developers,

I'm building Starboard and its notebook runtime Starboard Notebook. The main difference versus Jupyter is that Starboard runs entirely in the browser: there is no backend server. Python support is possible through Pyodide, see the Python example notebook.

Its current format is based on Project Iodide's percent format, which is very similar to Jupytext's percent format (here's a short discussion on compatibility (@westurner for visibility)).

The format currently looks like this:

    %% py runOnLoad
    import matplotlib.pyplot as plt
    plt.plot([i**3 for i in range(20)])
    plt.show()

    %% py
    import pandas as pd
    df = pd.DataFrame([1,3,5,7])
    df

    %% js
    console.log("hi");

    %% html runOnLoad collapsed
    <p>Hello!</p>

I'm looking to transition to a more Jupytext/percent-like representation for (near) compatibility. This makes even more sense now that Project Iodide is no longer actively developed or maintained.

I think being able to open most ipynb/jupytext files entirely in the browser will be very powerful. Of course the mapping will never be entirely 1-1, so I hope we can discuss what makes sense and you can answer some questions. Here are a few initial questions I hope you can answer:

  1. The cell delimiters are currently not "commented" at all. What comment delimiter makes sense? #? //? Starboard notebooks are truly multilingual: you can mix Javascript and Python and have interop between the two so there is no one "kernel" or "script language". I was personally thinking of just picking #. How would you deal with a corner case like this?

    # %% [javascript]
    const myString = `
    # %% Hello!
    `;

  2. What is the default cell type? Is it inferred from the file extension? If I were to convert from jupytext script to my format should I put this default type in the metadata, or simply make every cell have an explicit type?

  3. Cells have optional titles, could you maybe give an example of when that is used? Can I safely ignore it?

  4. A cell's properties (or not sure what you call them, metadata?) are in key="value" format, how would more complicated values be formatted? Are they always to be interpreted as strings, or does it follow YAML rules? Currently I have only binary flags, how would they be represented?

  5. The metadata in Jupytext is currently a block of YAML at the top, with everything under a jupyter.jupytext key. Should I keep the same metadata under the same key for compatibility, or do you foresee Jupytext looking for a starboard.jupytext key?

  6. A bit unrelated and more food for thought: Have you considered allowing the cell type and metadata to be split over multiple lines, something like this? I was considering supporting it before as the line can become quite long and no longer play nice with version control.

    # %% [javascript]
    # %% myValue="Hello!"
    # %% someOtherValue="bla"
    console.log("Hello!");

    Or even full YAML support?

    
    # %%* [javascript]
    my_metadata:
    v: [1,2,3]
    # *%%
    console.log("Hello!");


---

Of course I'm also happy to hear any other tips and thoughts!

My current parser is only [40 lines](https://github.com/gzuidhof/starboard-notebook/blob/master/src/content/parsing.ts#L42-L94) (but gets the job done), I'm not sure how much it will have to grow now.
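For readers who want a feel for how small such a parser can be, here is a minimal sketch in Python of splitting the current (uncommented) format into cells. It is not the linked TypeScript parser; the names `Cell` and `parse_cells` are made up for this illustration, and it assumes every token after the cell type is a binary flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    cell_type: str
    flags: List[str] = field(default_factory=list)
    lines: List[str] = field(default_factory=list)

    @property
    def source(self) -> str:
        # Drop the blank lines that only separate one cell from the next.
        return "\n".join(self.lines).strip("\n")


def parse_cells(text: str) -> List[Cell]:
    """Split notebook text into cells on lines that start with '%% '."""
    cells: List[Cell] = []
    for line in text.splitlines():
        tokens = line[3:].split() if line.startswith("%% ") else []
        if tokens:
            # Header line: first token is the cell type, the rest are flags.
            cells.append(Cell(tokens[0], tokens[1:]))
        elif cells:
            cells[-1].lines.append(line)
        # Text before the first delimiter is ignored in this sketch.
    return cells


example = """%% py runOnLoad
import pandas as pd

%% js
console.log("hi");
"""
cells = parse_cells(example)
```

Like any purely line-based splitter, this sketch would also start a new cell at a delimiter-looking line inside a string literal, which is exactly the corner case raised in question 1.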

Thank you in advance :)

westurner commented 3 years ago

I haven't had a chance to respond to these questions yet.

First, there are many parsers and renderers of Jupyter notebooks. Here are a few in JS:

https://github.com/jsvine/notebookjs

https://github.com/jsvine/nbpreview

https://gitlab.com/gitlab-org/notebooklab/

https://github.com/nteract/notebook-preview

https://github.com/nteract/nteract/tree/main/packages/notebook-app-component

https://github.com/microsoft/vscode-python/blob/master/src/client/datascience/jupyter/jupyterNotebook.ts

https://github.com/eclipse-theia/theia

https://github.com/cdr/code-server

These have already solved displaying the notebooks with source code syntax highlighting and MathTeX support.

gzuidhof commented 3 years ago

Thank you for the references!

Definitely helpful, especially to see how they handled LaTeX support using KaTeX instead of MathJax, which I have now added to Starboard. I can probably take one of their parsers as a starting point for supporting .ipynb imports :)

mwouts commented 3 years ago

Hi @gzuidhof, thank you for reaching out! And thanks for asking about what could be a standard text representation for notebooks... In fact the "Jupytext/percent" format is just an attempt to define a common denominator for the formats explored earlier in Spyder / Hydrogen / VSCode / PyCharm etc.

> I think being able to open most ipynb/jupytext files entirely in the browser will be very powerful. Of course the mapping will never be entirely 1-1, so I hope we can discuss what makes sense and you can answer some questions. Here are a few initial questions I hope you can answer:

Well, I think the hard part in the mapping will be the Jupyter-specific parts: what do you want to do with the metadata? With the magic commands? You won't be using a Jupyter kernel to execute the code, right? Other than that I don't expect any difficulty in the file format itself.

> The cell delimiters are currently not "commented" at all. What comment delimiter makes sense? #? //? Starboard notebooks are truly multilingual: you can mix Javascript and Python and have interop between the two so there is no one "kernel" or "script language". I was personally thinking of just picking #

I would use the comment char that corresponds to the file extension. That is, if you save the file with a .py extension, use #. And if you save it with a .js extension, use //. Also I'd comment out all the cells whose language does not match the file extension.
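As a rough illustration of that rule, the sketch below picks the comment prefix from the file extension and comments out the body of any cell written in another language. The helper `render_cell` and both lookup tables are hypothetical, and the exact header Jupytext itself emits for such cells may differ from the `language="..."` form assumed here.

```python
import os

# Illustrative tables only; both are assumptions of this sketch.
COMMENT_PREFIX = {".py": "#", ".js": "//"}
MAIN_LANGUAGE = {".py": "python", ".js": "javascript"}


def render_cell(path: str, language: str, source: str) -> str:
    """Render one code cell as percent-style text for the file at `path`."""
    ext = os.path.splitext(path)[1]
    prefix = COMMENT_PREFIX[ext]
    if language == MAIN_LANGUAGE[ext]:
        # Same language as the file: the body stays as plain code.
        return f"{prefix} %%\n{source}"
    # Foreign language: comment out every line so the file stays valid.
    commented = "\n".join(
        f"{prefix} {line}" if line else prefix for line in source.splitlines()
    )
    return f'{prefix} %% language="{language}"\n{commented}'
```

For example, `render_cell("nb.py", "javascript", 'console.log("hi");')` gives a `# %% language="javascript"` marker followed by `# console.log("hi");`, so the .py file remains importable as Python.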

Now you seem to prefer the percent format, but for a true multilanguage notebook, I'd rather use the .md extension. Have you considered that?

> What is the default cell type? Is it inferred from the file extension? If I were to convert from jupytext script to my format should I put this default type in the metadata, or simply make every cell have an explicit type?

Do you mean the default cell language? Yes, that one is inferred from the file extension, i.e. a JavaScript cell in a mostly Python notebook would be represented with a language=javascript metadata entry.

> Cells have optional titles, could you maybe give an example of when that is used? Can I safely ignore it?

Yes, you can ignore that. AFAIK it's only used in Spyder.

> A cell's properties (or not sure what you call them, metadata?) are in key="value" format, how would more complicated values be formatted? Are they always to be interpreted as strings, or does it follow YAML rules? Currently I have only binary flags, how would they be represented?

value is the JSON encoding of the string. I think the best way to understand how the format works is to test with your own Jupyter notebook: call nb=jupytext.read("notebook.ipynb") and then py=jupytext.writes(nb, fmt="py:percent") and look at the py variable.
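Taking that answer at face value (each value is JSON-encoded), a header line could be decoded roughly as follows. This is a stdlib-only sketch, not Jupytext's own parser: `parse_header` is a hypothetical name, the default cell type "code" is an assumption, and cell titles are not handled.

```python
import json
import re
from typing import Any, Dict, Tuple

_HEADER = re.compile(r"^# %%\s*(?:\[(?P<type>\w+)\])?\s*(?P<rest>.*)$")
_KEY = re.compile(r"\s*(?P<key>[A-Za-z_][\w.-]*)=")


def parse_header(line: str) -> Tuple[str, Dict[str, Any]]:
    """Return (cell_type, metadata) for a line like '# %% [markdown] k="v"'."""
    match = _HEADER.match(line)
    if match is None:
        raise ValueError(f"not a percent cell marker: {line!r}")
    cell_type = match.group("type") or "code"  # assumed default
    rest = match.group("rest")
    decoder = json.JSONDecoder()
    metadata: Dict[str, Any] = {}
    pos = 0
    while pos < len(rest):
        key_match = _KEY.match(rest, pos)
        if key_match is None:
            break  # Anything that is not key=<json> is ignored here.
        # raw_decode reads one JSON value and reports where it ended.
        value, pos = decoder.raw_decode(rest, key_match.end())
        metadata[key_match.group("key")] = value
    return cell_type, metadata
```

With this reading, a binary flag would be written as something like `collapsed=true` and a list as `tags=["a", "b"]`; comparing against the output of `jupytext.writes` on a real notebook, as suggested above, is the way to confirm the actual conventions.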

> The metadata in Jupytext is currently a block of YAML at the top, with everything under a jupyter.jupytext key, should I keep the same metadata under the same key for compatibility, or do you foresee Jupytext looking for a starboard.jupytext key?

Well, that jupytext metadata is optional, and mostly contains file version information. I think you can ignore that until you see a real need for it.

> A bit unrelated and more food for thought: Have you considered allowing the cell type and metadata to be split over multiple lines, something like this? I was considering supporting it before as the line can become quite long and no longer play nice with version control.

Yes, we have, but we have not taken action yet, except for the myst:md format. Would you like to give that format a try? Also you will see that it has the jupytext metadata at the root of the YAML header.

gzuidhof commented 3 years ago

Thank you so much for the detailed response!

> Well, I think the hard part in the mapping will be the Jupyter-specific parts: what do you want to do with the metadata? With the magic commands? You won't be using a Jupyter kernel to execute the code, right? Other than that I don't expect any difficulty in the file format itself.

That will indeed be tricky. The plan is for Starboard not to have any magic commands: they're just JavaScript. When converting from a Jupyter notebook to a Starboard notebook, magics would, depending on the magic, simply be ignored. Some are probably simple to "translate" (e.g. %%latex to make it a latex cell).

> Now you seem to prefer the percent format, but for a true multilanguage notebook, I'd rather use the .md extension. Have you considered that?

I am looking to settle on a format that changes the content of the cells as little as possible. You can mix some code examples in markdown (with ```) and actual cells, to distinguish them I'd have to add metadata to them (or have special delimiters for markdown cells) - I'd like to keep all metadata and cell information on the cell borders to avoid this.

A markdown format is great for displaying in GitHub (and similar tools), but the point of Starboard is that it can be fully functional in any modern browser, so this compatibility is not as important. An "export as markdown" button makes more sense than making the main text representation markdown compatible.

Those are the reasons why I am leaning towards the percent formats. Thank you for all the other answers too, really helpful!


I'm currently brewing this as the format (extension .nb, mimetype application/vnd.starboard.nb):

    ---
    global_metadata_at_the_top: "just like in any Jupytext format"
    foo: "bar"
    ---
    # %% [python]
    print("A simple python cell")
    # %% [markdown]
    **Markdown** is also wrapped in a cell

    # %%--- [javascript]
    # yaml_cell_header: "across multiple lines, delineated by special %%--- and ---%%"
    # which: "should be compatible with Spyder, VS Code, etc. Although not understood of course"
    # indented:
    #   looks_like: "this"
    # ---%%
    console.log("Hello!");
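To make the structure of this draft concrete, here is a hedged Python sketch that splits such a file into its front matter and cells. It is not Starboard's parser: `split_draft_notebook` is a hypothetical name, the sketch assumes every marker carries a `[type]`, and it keeps the YAML header lines as raw text, since actually parsing them would need a YAML library.

```python
import re
from typing import Dict, List

_MARKER = re.compile(r"^# %%(?P<multi>---)? \[(?P<type>\w+)\]")


def split_draft_notebook(text: str) -> Dict[str, list]:
    """Split the draft format into raw front matter lines and cells."""
    lines = text.splitlines()
    front_matter: List[str] = []
    start = 0
    if lines and lines[0] == "---":
        end = lines.index("---", 1)  # raises ValueError if never closed
        front_matter = lines[1:end]
        start = end + 1

    cells: List[dict] = []
    in_header = False
    for line in lines[start:]:
        if in_header:
            if line == "# ---%%":
                in_header = False
            else:
                # Strip the leading "# " so the remainder is plain YAML text.
                cells[-1]["header"].append(line[2:])
            continue
        marker = _MARKER.match(line)
        if marker:
            cells.append({"type": marker.group("type"), "header": [], "source": []})
            in_header = marker.group("multi") is not None
        elif cells:
            cells[-1]["source"].append(line)
    return {"front_matter": front_matter, "cells": cells}


draft = """---
foo: "bar"
---
# %% [python]
print("A simple python cell")
# %%--- [javascript]
# indented:
#   looks_like: "this"
# ---%%
console.log("Hello!");
"""
result = split_draft_notebook(draft)
```

Because all cell information sits on the cell borders, the cell bodies pass through untouched, which matches the stated goal of changing cell content as little as possible.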

In a later version I will also support the single-line form below (it makes the parser more complicated, as the key-value pairs have to be split correctly):

    # %% [javascript] key="value"

gzuidhof commented 3 years ago

Thank you again for the pointers, I wrote up a short description of the format I settled on here

mwouts commented 3 years ago

Great, thanks for sharing!