Closed lexnederbragt closed 2 years ago
def generate_corpus_id():
return uuid.uuid4().hex[:8]
If two cells have identical content, you proposal would lead them to have identical IDs, which would not be allowed (each cell's ID need to be unique within the document).
This is correct. My current implementation solved that by adding a running number cells with identical IDs. See https://github.com/doconce/doconce/pull/223/files#diff-7f024362fe22e3d1f64babebb05a2819ef408b8bdaddf4a0f6527ca492b5856cR753
I work a lot with a markup language called DocOnce and its command line conversion tool (https://github.com/doconce/doconce). One of the possible output formats for doconce is Jupyter notebooks.
With the recently added cell ids, each time doconce generates a new version of a notebook from a doconce source file, all cell ids change. This is OK in itself, however, it causes a lot of 'noise' when having the notebook under version control and looking for differences between versions.
Would creating the cell id based on the cell's content be an option for doconce? In practice it would be generating a hash from the text in the (json)
source
field, rather than a random hash. When generating a new notebook, cells that do not change because the doconce source for it did not change would again get the same id. Cells that changed would get a new ID, which is fine when comparing (diffing) notebooks under version control.My question is not whether it is technically possible on the doconce side, but whether it could lead to downstream problems...