jupyter / nbformat

Reference implementation of the Jupyter Notebook format
http://nbformat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Create cell id based on its content (source)? #231

Closed lexnederbragt closed 2 years ago

lexnederbragt commented 3 years ago

I work a lot with a markup language called DocOnce and its command line conversion tool (https://github.com/doconce/doconce). One of the possible output formats for doconce is Jupyter notebooks.

With the recently added cell ids, all cell ids change each time doconce generates a new version of a notebook from a doconce source file. This is OK in itself; however, it causes a lot of 'noise' when the notebook is under version control and you are looking for differences between versions.

Would creating the cell id based on the cell's content be an option for doconce? In practice, this would mean generating a hash from the text in the (JSON) source field rather than a random hash. When generating a new notebook, cells whose doconce source did not change would get the same id again. Cells that changed would get a new id, which is fine when comparing (diffing) notebooks under version control.
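
To make the idea concrete, here is a minimal sketch of what such a content-based id could look like. The helper name `content_cell_id`, the choice of SHA-256, and the 8-character truncation are all assumptions for illustration, not what doconce or nbformat actually do. The only constraint taken from the notebook format is that an id must be 1 to 64 characters from `[a-zA-Z0-9-_]`, which a hex digest satisfies.

```python
import hashlib


def content_cell_id(source, length=8):
    """Hypothetical helper: derive a cell id from the cell's source text."""
    # The JSON `source` field may be a single string or a list of lines.
    text = "".join(source) if isinstance(source, list) else source
    # Hash the text; identical text always yields the identical digest.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Truncate to keep ids short (shorter ids make collisions more likely).
    return digest[:length]
```

With this, regenerating a notebook leaves the id of an unchanged cell untouched, while any edit to the cell's text produces a different id.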

My question is not whether it is technically possible on the doconce side, but whether it could lead to downstream problems...

westurner commented 3 years ago

vidartf commented 2 years ago

If two cells have identical content, your proposal would lead them to have identical IDs, which would not be allowed (each cell's ID needs to be unique within the document).

lexnederbragt commented 2 years ago

This is correct. My current implementation solves that by adding a running number to cells with identical IDs. See https://github.com/doconce/doconce/pull/223/files#diff-7f024362fe22e3d1f64babebb05a2819ef408b8bdaddf4a0f6527ca492b5856cR753
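
For readers who do not want to dig through the diff, the running-number idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked pull request: the function name `assign_cell_ids`, the SHA-256 hash, the 8-character truncation, and the `-2`, `-3` suffix format are all hypothetical.

```python
import hashlib
from collections import Counter


def assign_cell_ids(sources, length=8):
    """Hypothetical sketch: content-hash ids, made unique by a running number."""
    seen = Counter()  # how many times each base hash has occurred so far
    ids = []
    for text in sources:
        base = hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]
        seen[base] += 1
        # The first occurrence keeps the bare hash; repeats get "-2", "-3", ...
        # A hex base never contains "-", so a suffixed id cannot clash with a base.
        ids.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return ids
```

One consequence of this scheme: the suffix depends on the order of the duplicates, so inserting another identical cell earlier in the notebook shifts the ids of the later ones.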