Create cell id based on its content (source)?

lexnederbragt commented 3 years ago

I work a lot with a markup language called DocOnce and its command line conversion tool (https://github.com/doconce/doconce). One of the possible output formats for doconce is Jupyter notebooks.

With the recently added cell ids, each time doconce generates a new version of a notebook from a doconce source file, all cell ids change. This is OK in itself; however, it causes a lot of 'noise' when the notebook is under version control and you look for differences between versions.

Would creating the cell id based on the cell's content be an option for doconce? In practice, this would mean generating a hash from the text in the (JSON) source field rather than a random hash. When a new notebook is generated, cells whose doconce source did not change would get the same ID again. Cells that changed would get a new ID, which is fine when comparing (diffing) notebooks under version control.
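The idea can be sketched in a few lines of Python. This is only an illustration, not doconce's or nbformat's code: the function name `content_cell_id`, the choice of SHA-256, and the 8-character length are all assumptions made for the example.

```python
import hashlib


def content_cell_id(source, length=8):
    """Derive a deterministic cell ID from a cell's source (hypothetical helper).

    Notebook JSON may store the source as one string or as a list of lines,
    so a list is joined before hashing.
    """
    if isinstance(source, list):
        source = "".join(source)
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    # Truncate the hex digest; it only contains the characters 0-9 and a-f.
    return digest[:length]


# Unchanged source gives the same ID on every run; edited source gives a new one.
same_a = content_cell_id("print('hello')")
same_b = content_cell_id("print('hello')")
changed = content_cell_id("print('hello, world')")
```

With this approach, regenerating a notebook from unchanged source reproduces the same IDs, so a diff shows only the cells whose content actually changed.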

My question is not whether it is technically possible on the doconce side, but whether it could lead to downstream problems...

westurner commented 3 years ago

https://github.com/jupyter/nbformat/issues/209
- https://github.com/jupyterlab/jupyterlab/issues/9645#issuecomment-813705163
- https://github.com/jupyterlab/jupyterlab/pull/10018
  - https://github.com/jupyter/nbformat/pull/217
  - https://github.com/jupyter/nbformat/issues/218
    - https://github.com/jupyter/nbformat/blame/master/nbformat/corpus/words.py
    - ```python
      import uuid

      def generate_corpus_id():
          return uuid.uuid4().hex[:8]
      ```
  - AFAIU, there is no further schema restriction on the cell.id field? i.e. nothing will at runtime restrict the value assigned to or located in the cell IDs in an nbformat .ipynb json document?

vidartf commented 2 years ago

If two cells have identical content, your proposal would lead them to have identical IDs, which would not be allowed (each cell's ID needs to be unique within the document).

lexnederbragt commented 2 years ago

This is correct. My current implementation solves that by adding a running number to cells with identical IDs. See https://github.com/doconce/doconce/pull/223/files#diff-7f024362fe22e3d1f64babebb05a2819ef408b8bdaddf4a0f6527ca492b5856cR753
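A minimal sketch of that kind of de-duplication, for readers who do not want to follow the link. It is not the code from the linked pull request: the function name `unique_content_ids`, the `-N` suffix format, and the SHA-256 hash are assumptions chosen for illustration.

```python
import hashlib


def unique_content_ids(sources, length=8):
    """Return one content-based ID per cell, unique within the notebook.

    Hypothetical helper: the first cell with a given content keeps the bare
    hash; later cells with the same content get a running-number suffix.
    """
    seen = {}  # base hash -> number of cells already assigned that hash
    ids = []
    for source in sources:
        base = hashlib.sha256(source.encode("utf-8")).hexdigest()[:length]
        count = seen.get(base, 0)
        seen[base] = count + 1
        # The base is pure hex, so "base-N" can never equal another bare base.
        ids.append(base if count == 0 else f"{base}-{count}")
    return ids


ids = unique_content_ids(["x = 1", "x = 1", "y = 2", "x = 1"])
```

One consequence of a running number is that inserting a new duplicate cell earlier in the notebook shifts the suffixes of the later duplicates, so those cells would still show up as changed in a diff.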

jupyter / nbformat

Create cell id based on its content (source)? #231