jupyter / nbformat

Reference implementation of the Jupyter Notebook format
http://nbformat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
269 stars 152 forks source link

Ambiguous warning about missing cell IDs #359

Open krassowski opened 1 year ago

krassowski commented 1 year ago

282 refactored normalisation of missing cell IDs, removing lines:

notebook_supports_cell_ids = ref is None and version >= 4 and version_minor >= 5
if notebook_supports_cell_ids and repair_duplicate_cell_ids:
    # Auto-generate cell ids for cells that are missing them.
    for cell in nbdict["cells"]:
        if "id" not in cell:
            # Generate cell ids if any are missing
            cell["id"] = generate_corpus_id()

and replacing them with:

if (version, version_minor) >= (4, 5):
    # if we support cell ids ensure default ids are provided
    for cell in nbdict["cells"]:
        if "id" not in cell:
            warnings.warn(
                "Code cell is missing an id field, this will become"
                " a hard error in future nbformat versions. You may want"
                " to use `normalize()` on your notebooks before validations"
                " (available since nbformat 5.1.4). Previous versions of nbformat"
                " are fixing this issue transparently, and will stop doing so"
                " in the future.",
                MissingIDFieldWarning,
                stacklevel=3,
            )
            # Generate cell ids if any are missing
            if repair_duplicate_cell_ids:
                cell["id"] = generate_corpus_id()
                changes += 1

However,

Carreau commented 1 year ago

It's complicated. it should fix most of the issue, the problem being that the way it was written the use of 'if isvalid(notebook)' was returning true and mutating notebook to most of the time be valid.

It should be considered an error to pass an invalid notebook, but currently it is not because other we would break the world.

note that a notebook is not trusted after reload BECAUSE nbformat fixes it, the reason being the signature is compute before any call to normalise/isvalid and therefore the mutated notebook that is saved has a different signature and is thus untrusted.

krassowski commented 1 year ago

note that a notebook is not trusted after reload BECAUSE nbformat fixes it, the reason being the signature is compute before any call

Yes that is plausible. However, it could also be that there is another bug in Notebook v7 beta and that is the cause of signature mismatch.

Sorry because I still don't follow: repair_duplicate_cell_ids is set to True if it is not provided because it defaults to _deprecated:

https://github.com/jupyter/nbformat/blob/4692b54b283fd5e0c068116f1c02fe5ba8b9b034/nbformat/validator.py#L401-L408

https://github.com/jupyter/nbformat/blob/4692b54b283fd5e0c068116f1c02fe5ba8b9b034/nbformat/validator.py#L456-L457

but, when it is used in isvalid(), repair_duplicate_cell_ids is force-set to False hence my line of reasoning:

https://github.com/jupyter/nbformat/blob/4692b54b283fd5e0c068116f1c02fe5ba8b9b034/nbformat/validator.py#L107-L121

Am I missing something here?

krassowski commented 1 year ago

Also, the method which is used by jupyter-server to save notebooks also includes repair_duplicate_cell_ids=False:

https://github.com/jupyter/nbformat/blob/4692b54b283fd5e0c068116f1c02fe5ba8b9b034/nbformat/current.py#L182-L206

And isvalid() call was removed from here in https://github.com/jupyter/nbformat/pull/282

Carreau commented 1 year ago

Sure it is possible that there is another bug, and I think that having hard errors on nbformat would make it easier to detect.

In general it would be better to have a nbformat API that is difficult to miss-use.

I'll try to find some time to dive into the code and find the reasons. I don't quite remember all the API.

Is there already a way to make the nbformat API strict ? Or add warning filter during the notebook v7 beta that turn warnings into errors ?

krassowski commented 1 year ago

and I think that having hard errors on nbformat would make it easier to detect. In general it would be better to have a nbformat API that is difficult to miss-use.

:+1:

Is there already a way to make the nbformat API strict ? Or add warning filter during the notebook v7 beta that turn warnings into errors ?

jupyter-server has a pre_save_hook which I was successfully using to debug trust issues. For the next beta of JupyterLab and Notebook one could add a pre-save hook that would validate the notebook JSON and error out if it was faulty.

What do you think?

Carreau commented 1 year ago

I think a pre-save hook is fine. usually I think the stricter we are in beta by default the better it is as you can always relax for RC/releases.

jj-github-jj commented 1 year ago

The warning "Code cell is missing an id field..." should actually say " Cell is missing an id field.." since it not just 'code' cells but any type of cell. I have changed the offending cell's cell type to 'raw' and it still generates the warning,

I edited validator.py to show the cell index and cell type in the warning message to help track offending cell

if (version, version_minor) >= (4, 5):
    # if we support cell ids ensure default ids are provided
    for i,cell in enumerate(nbdict["cells"]):
        if "id" not in cell:
            warnings.warn(
                f"Cell index: {i} cell_type {cell.cell_type} {cell.source[0:50]} \n" ".Cell is missing an id field, this will become"
                " a hard error in future nbformat versions. You may want"
                " to use `normalize()` on your notebooks before validations"
                " (available since nbformat 5.1.4). Previous versions of nbformat"
                " are fixing this issue transparently, and will stop doing so"
                f" in the future. ",
                MissingIDFieldWarning,
                stacklevel=3,
            )
Grufoony commented 1 year ago

I miss the point in inserting this warning into the normalize() function. I'm using normalize() to get rid of this warning but it's actually not possible due to this behavior.

Carreau commented 1 year ago

normalize is a workaround – it was added to have a path forward – in general you should pass a valid notebook into those function and normalize should be no-op.

Carreau commented 1 year ago

And it is in the doc of normalize:

def normalize(...):
    """
...

    You should in general not rely on this function and make sure the notebooks
    that reach nbformat are already in a normal form. If not you likely have a bug,
    and may have security issues.
    """

Normalize will try to fix your notebook, but still warn you if the notebook are invalid.