jupyterlab / jupyterlab

JupyterLab computational environment.
https://jupyterlab.readthedocs.io/
Other
14.17k stars 3.39k forks source link

Support notebook format 4.5, adding cell IDs #9729

Closed jasongrout closed 3 years ago

jasongrout commented 3 years ago

We currently support notebook format 4.4. Notebook format 4.5 adds a new id field to cells. This issue is about supporting this newest version of the notebook format.

Related issues:

jasongrout commented 3 years ago

Tagging as 4.0, though it would be great if we released this in the 3.x releases.

jasongrout commented 3 years ago

From https://github.com/jupyterlab/jupyterlab/issues/9064#issuecomment-763416155:

Doing some experiments, it's actually quite easy to accidentally get an invalid notebook 4.5 file. For example, open a notebook 4.5 file in jlab 3 (using nbformat 5.1), run some cells, then saving, gives us a file that claims to be notebook format 4.5, but does not have cell ids, so really is invalid. Perhaps for now, until jlab handles notebook 4.5 format, jlab should make sure that files it saves are format 4.4 or earlier, rather than just keeping whatever format version it has already in the file. (This was mentioned in https://github.com/jupyter/enhancement-proposals/issues/61#issuecomment-672392703)

There already is some code to handle format differences in https://github.com/jupyterlab/jupyterlab/blob/master/packages/notebook/src/model.ts#L242-L243 - perhaps that should handle the case where we have a 4.5 format file.

MSeal commented 3 years ago

@jasongrout When you get to planning for 4.0 could you post updates here on timeline for support for this ticket?

jasongrout commented 3 years ago

If I end up doing this, I'll definitely post here.

jayqi commented 3 years ago

Is it possible for this fix to be higher priority? Per the problem described in #9645, this is very annoying and generates an immense amount of diff noise for every cell of every notebook any time a notebook has to be committed to VCS.

Screen Shot 2021-03-19 at 4 36 35 PM

This was fixed for Jupyter Notebook back in early January. https://github.com/jupyter/notebook/pull/5928

Frankly I consider this a bug rather than an "enhancement" as this is highly undesirable behavior.

jasongrout commented 3 years ago

The best (most helpful) way to make something a higher priority in an open-source project like this is to work on a PR (or sponsor someone to work on a PR).

willingc commented 3 years ago

Frankly I consider this a bug rather than an "enhancement" as this is highly undesirable behavior.

It's unclear to me @jayqi if you are objecting to JupyterLab's current status or to the cell id JEP that was opened in August 2020 and approved in September 2020.

We've been using nbformat version 4.5 without issue. The benefits of cell id have been demonstrated in the JEP.

Following @jasongrout's comment makes sense for open source projects.

jeffyjefflabs commented 3 years ago

I think the objection is that merely saving in Lab causes all the cell ids to change. In the example below (from the binder link in this repo's README), all I did between the two grep commands is press Ctrl-S in the tab where my notebook is open. I suppose a careful reading of the JEP doesn't expressly forbid this, but it does seem to contradict the spirit of e.g. question 6 ("Would cell ID be changed if the cell content changes, or just created one time when the cell is created?" -- "It stays the same once created.").

jovyan@jupyter-jupyterlab-2djupyterlab-2ddemo-2d5evmnslq:~/demo$ jupyter --version
jupyter core     : 4.7.1
jupyter-notebook : 6.1.6
qtconsole        : not installed
ipython          : 7.20.0
ipykernel        : 5.1.4
jupyter client   : 6.1.11
jupyter lab      : 3.0.7
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.1.2
traitlets        : 5.0.5
jovyan@jupyter-jupyterlab-2djupyterlab-2ddemo-2d5evmnslq:~/demo$ grep id Untitled.ipynb 
   "id": "incredible-elevation",
   "id": "rolled-corporation",
   "id": "sharing-buffalo",
jovyan@jupyter-jupyterlab-2djupyterlab-2ddemo-2d5evmnslq:~/demo$ grep id Untitled.ipynb 
   "id": "urban-plenty",
   "id": "configured-messenger",
   "id": "elect-february",
jayqi commented 3 years ago

Yes, @jeffyjefflabs is correctly capturing my intent. I have no problem with the JEP or the merits of having a cell ID—the issue is that the current behavior of Jupyter Lab interacting with the cell ID causes it to change on every save, which causes a lot of diff noise, and as pointed out, does not seem consistent with the general understanding of Question 6 in the JEP.

Frankly I consider this a bug rather than an "enhancement" as this is highly undesirable behavior.

In this quote, I was referring to the fact that this issue is tagged on GitHub with the label type:enhancement that is attached to the 4.0 major release milestone. It's not just that Juptyer Lab doesn't support this new feature of nbformat / the new notebook format, it's that it's is causing an unintended problem for users. In that sense, calling it an enhancement feels like a misclassification that is not recognizing that it's a problem that (feels to me like it) should have higher urgency than a "new feature".

I understand that projects in the Jupyter ecosystem are free and open-source, and everyone should be the change they want to see, so I am willing to take a look at contributing a fix when I have time if someone hasn't already addressed it by then. However, I think there is still some room for improvement here on how this issue is classified and viewed by the maintainers that is in the maintainer-level project management purview.

jasongrout commented 3 years ago

Yes, that is pretty painful :(. For now, downgrading nbformat should be able to work around the problem until JLab gets support for maintaining cell ids. It's slated for 4.0 at the latest, but I'm hoping it can go in 3.1 if it doesn't break APIs.

jasongrout commented 3 years ago

I'm happy to help someone get started on this issue if someone wants to start looking into it.

teucer commented 3 years ago

@jasongrout could you please describe high-level what would need to be done?

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/cell-id-changes-when-notebook-rerun-only-in-jupyterlab/8489/3

matthewfeickert commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

Hi :wave: That was me, and I can say that there isn't necessarily more relevant info there, beyond a Gist and Binder that reproduces the cell IDs changing with each run of the notebook in JupyterLab for

$ pip list | grep "jupyter\|nbformat"
jupyter             1.0.0
jupyter-client      6.1.12
jupyter-console     6.4.0
jupyter-core        4.7.1
jupyter-packaging   0.7.12
jupyter-server      1.5.0
jupyterlab          3.0.12
jupyterlab-pygments 0.1.2
jupyterlab-server   2.3.0
jupyterlab-widgets  1.0.0
nbformat            5.1.2

So you can probably ignore it.

For now, downgrading nbformat should be able to work around the problem until JLab gets support for maintaining cell ids. It's slated for 4.0 at the latest, but I'm hoping it can go in 3.1 if it doesn't break APIs.

That's awesome news. :+1: Thanks! I'll try to find the version of nbformat where this first shows up.

matthewfeickert commented 3 years ago

For now, downgrading nbformat should be able to work around the problem until JLab gets support for maintaining cell ids.

Ah, about as simple as it gets. nbformat v5.0.8, the last (non-yanked) release before v5.1.2, will work as it just removes support for cell IDs here. So

python -m pip install --upgrade "nbformat<5.1.2"

should be sufficient for the time being to avoid the versioning problems :+1:

jasongrout commented 3 years ago

@jasongrout could you please describe high-level what would need to be done?

I haven't dived too much into what would be needed to change in jlab, but here's where I would start:

  1. Update our code understanding notebook structure at https://github.com/jupyterlab/jupyterlab/blob/master/packages/nbformat/src/index.ts
  2. Update https://github.com/jupyterlab/jupyterlab/blob/master/packages/notebook/src/model.ts to have the toJson method make sure it is saving the correct notebook format and metadata, and perhaps some changes where cells are created to generate ids.

I would also try to understand where cell ids are getting dropped by jlab and make sure they were carried through in the model. My guess is that they are getting dropped in one of the above two places.

jayqi commented 3 years ago

My guess is that the PR that fixed the same issue in Jupyter Notebook may be relevant as an example: https://github.com/jupyter/notebook/pull/5928

jasongrout commented 3 years ago

That PR is just a few lines. It would be awesome if this PR was that easy too!