mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License

addition of cell id introduced with nbformat>=4.5 to text format #1263

Open itcarroll opened 3 months ago

itcarroll commented 3 months ago

Discussion on #735, after it was closed, pointed to the need for an issue on inclusion of the cell id in jupytext text formats. I can't find one, and I think it's an enhancement worth considering.

Currently, a cell id is preserved in paired notebooks, but there are cases where the paired notebook is not present. Primary among these is when only the text formats are held in a git repository. In this case, collaborators who generate notebooks locally from the text format end up with a different id for every cell. Is there support for directly incorporating the cell id in the "light" format?

The obvious proposal would be to require a start-of-cell delimiter for every cell and include the id in it. The id can be told apart from metadata because 1) it comes first and 2) it is not a key=value pair (the "=" character is not permitted in a cell id).

The examples would become:

# +b457cb9f-93c0_456a-a652-3f597535aa2d
# This is a multiline
# Markdown cell

# +a99ac56a-3859_4a15-9023-bab26654380f
# Another Markdown cell

# +4e9a328c-7d49_4e7e-9af4-a9f86ccddd14
# This is a code cell
class A():
    def one():
        return 1

    def two():
        return 2
# +3435c495-ba0c_4ca6-8a65-7b3658b66733
# A single code cell made of two paragraphs
a = 1

def f(x):
    return x+a
# +a8345b4b-8282_47fe-96c4-1d2c02bc92ca key="value"
# A code cell with metadata

# +a8345b4b-8282_47fe-96c4-1d2c02bc92ca [markdown] key="value"
# A Markdown cell with metadata
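
To make the proposed delimiter concrete, here is a minimal sketch of how such a line could be parsed. It is not Jupytext's implementation: the function name `parse_delimiter`, the regular expressions, and the 64-character id limit (taken from the nbformat schema) are assumptions made for illustration.

```python
import re
import shlex

# Assumption: ids use letters, digits, "-" and "_", with at most 64 characters.
CELL_ID = r"[a-zA-Z0-9_-]{1,64}"

# "# +" immediately followed by the id, an optional [markdown]/[raw] marker,
# then optional key=value metadata.
DELIMITER = re.compile(
    rf"^# \+(?P<id>{CELL_ID})(?:\s+\[(?P<type>markdown|raw)\])?(?:\s+(?P<meta>.*))?$"
)


def parse_delimiter(line):
    """Return (cell_id, explicit_type, metadata), or None if the line is not a delimiter.

    explicit_type is None when no [markdown]/[raw] marker is present; the cell
    type then has to be inferred from the cell body, as in the examples above.
    """
    match = DELIMITER.match(line.rstrip())
    if match is None:
        return None
    try:
        tokens = shlex.split(match.group("meta") or "")
    except ValueError:  # e.g. an unbalanced quote in the metadata
        return None
    metadata = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            return None  # everything after the id must be a key=value pair
        metadata[key] = value
    return match.group("id"), match.group("type"), metadata
```

Because "=" cannot appear in an id, a line such as `# +key=value` is rejected rather than read as a cell whose id is `key`.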

mwouts commented 3 months ago

Hi @itcarroll , thanks for opening this discussion.

Sure, we could do something in that direction. Actually, some of the formats have support for a cell title that dates back to the spyder format. It might make sense to map that to the cell id.

Right now I think the Pandoc markdown format might have support for cell ids, if you want to give it a try, but I understand that you might be more interested in a Python format.

I will have some time to give this a try in two weeks' time or later.

mwouts commented 2 months ago

Hi @itcarroll , I have a first draft of this functionality in the attached PR (which contains instructions on how to install the development version).

Would you like to give it a try and let me know what you think?

The new option is not active by default. If you want to use it, you will have to create a jupytext.toml file with the following content:

cell_id_to_title = true

You can rename the cell ids as you wish; however, the new name must match this regular expression: ^[a-zA-Z0-9-_]+$, otherwise Jupyter won't open the notebook. For convenience, Jupytext will replace spaces with underscores when converting titles to ids (but it won't convert them back).
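
As an illustration of the rule above, here is a small sketch of the title-to-id conversion. The helper name `title_to_cell_id` is hypothetical and this is not the code from the PR; the 64-character limit is an additional constraint from the nbformat schema rather than something stated in this thread.

```python
import re

# Same character set as ^[a-zA-Z0-9-_]+$, applied with fullmatch so that a
# trailing newline is rejected as well.
CELL_ID_CHARS = re.compile(r"[a-zA-Z0-9_-]+")
MAX_CELL_ID_LENGTH = 64  # assumption: limit taken from the nbformat schema


def title_to_cell_id(title):
    """Turn a cell title into a cell id, replacing spaces with underscores."""
    candidate = title.replace(" ", "_")
    if len(candidate) > MAX_CELL_ID_LENGTH or not CELL_ID_CHARS.fullmatch(candidate):
        raise ValueError(f"{title!r} cannot be used as a cell id")
    return candidate
```

The conversion is one-way: an id such as `load_data` cannot be mapped back to a title reliably, because the original may have contained either a space or an underscore.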

Let me know what you think!

itcarroll commented 2 months ago

Thanks for the work on this! I'll definitely give it a try and report back, but it will be sometime next week.

mwouts commented 2 months ago

Perfect! No rush, and thanks for suggesting that in the first place - I'm curious to see if/how we can turn this into something usable!

itcarroll commented 2 months ago

This is looking very usable already, although I am encountering an error that only shows up when I've created an .ipynb file from a .py file with jupytext --sync *.py. Trying to open the resulting .ipynb file in JupyterLab gives an "Unhandled error".

[W 2024-09-01 10:01:10.423 ServerApp] Notebook test.ipynb is not trusted
[W 2024-09-01 10:01:10.424 ServerApp] test.ipynb (last modified 2024-09-01 14:01:03.779814+00:00) is more recent than test.py (last modified 2024-09-01 14:00:27.713327+00:00)
[I 2024-09-01 10:01:10.424 ServerApp] Reading SOURCE from test.py
[E 2024-09-01 10:01:10.428 ServerApp] Uncaught exception GET /api/contents/test.ipynb?type=notebook&content=1&hash=1&1725199270416 (::1)
    HTTPServerRequest(protocol='http', host='localhost:8889', method='GET', uri='/api/contents/test.ipynb?type=notebook&content=1&hash=1&1725199270416', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupyter_server/services/contents/handlers.py", line 155, in get
        self.contents_manager.get(
    TypeError: build_jupytext_contents_manager_class.<locals>.JupytextContentsManager.get() got an unexpected keyword argument 'require_hash'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
                 ^^^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
               ^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupyter_server/services/contents/handlers.py", line 168, in get
        self.contents_manager.get(
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/contentsmanager.py", line 338, in get
        content = read_pair(
                  ^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/pairs.py", line 127, in read_pair
        in_text = jupytext.writes(notebook, inputs.fmt)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/jupytext.py", line 503, in writes
        return writer.writes(notebook, metadata)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/jupytext.py", line 291, in writes
        if self.config.cell_id_to_title and hasattr(cell, "id"):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'cell_id_to_title'
[W 2024-09-01 10:01:10.431 ServerApp] wrote error: 'Unhandled error'
    Traceback (most recent call last):
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupyter_server/services/contents/handlers.py", line 155, in get
        self.contents_manager.get(
    TypeError: build_jupytext_contents_manager_class.<locals>.JupytextContentsManager.get() got an unexpected keyword argument 'require_hash'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
                 ^^^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupyter_server/auth/decorator.py", line 73, in inner
        return await out
               ^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupyter_server/services/contents/handlers.py", line 168, in get
        self.contents_manager.get(
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/contentsmanager.py", line 338, in get
        content = read_pair(
                  ^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/pairs.py", line 127, in read_pair
        in_text = jupytext.writes(notebook, inputs.fmt)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/jupytext.py", line 503, in writes
        return writer.writes(notebook, metadata)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/icarroll/tmp/jupyext-pr/venv/lib/python3.11/site-packages/jupytext/jupytext.py", line 291, in writes
        if self.config.cell_id_to_title and hasattr(cell, "id"):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'cell_id_to_title'
[E 2024-09-01 10:01:10.453 ServerApp] {
      "Host": "localhost:8889",
      "Accept": "*/*",
      "Referer": "http://localhost:8889/lab/tree/test.ipynb",
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
    }
[E 2024-09-01 10:01:10.453 ServerApp] 500 GET /api/contents/test.ipynb?type=notebook&content=1&hash=1&1725199270416 (179348aa56624fa2bfe5663b9cfa432b@::1) 16.07ms referer=http://localhost:8889/lab/tree/test.ipynb
[I 2024-09-01 10:01:38.360 ServerApp] Shutting down on /api/shutdown request.
[I 2024-09-01 10:01:38.361 ServerApp] Shutting down 5 extensions

My test.py file is:

# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.16.5-dev
#   kernelspec:
#     display_name: Python 3 (ipykernel)
#     language: python
#     name: python3
# ---

# + 4a523f6f-1e2a-41c1-8e56-90c9f559dd1a [markdown]
# Nothing.

My pyproject.toml file is:

[tool.jupytext]
cell_id_to_title = true
formats = "ipynb,py"
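
The final AttributeError in the log shows that `self.config` was `None` when `writes` evaluated `self.config.cell_id_to_title`; the traceback reaches that point through `read_pair`, which calls `jupytext.writes(notebook, inputs.fmt)`. One possible guard, sketched here with a hypothetical helper and stand-in objects rather than Jupytext's actual classes, is to treat a missing configuration as "option off":

```python
from types import SimpleNamespace


def cell_id_to_title_enabled(config):
    """Return True only if a config object exists and has the option switched on."""
    # getattr with a default covers both config=None and a config object
    # that lacks the attribute.
    return bool(getattr(config, "cell_id_to_title", False))


# Stand-ins for a loaded configuration and for the missing one seen in the log.
loaded = SimpleNamespace(cell_id_to_title=True)
missing = None

for config in (loaded, missing):
    print(cell_id_to_title_enabled(config))
```

Whether the proper fix is a guard like this or passing the configuration through to `writes` is a design decision for the PR; the sketch only shows that the check itself need not raise.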