Missing execution_count causes confusing error messages

trevorcampbell commented 2 years ago

Hi Jupyter team,

See my original post below; I have since found the source of the problem. There are actually a few possible causes:

If a code cell does not have the "execution_count" property
If a markdown cell has the "execution_count" property
If a markdown cell has the "outputs" property
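These three cases can be checked with the standard library before Jupyter ever sees the file. The following is a minimal sketch, assuming an nbformat v4 notebook. The helper name find_cell_problems is hypothetical, and it checks only the three cases above, not the full schema. Its messages follow the "cell X has/does not have property Y" shape.

```python
import json
import sys


def find_cell_problems(nb):
    """Return one human-readable message per offending cell (nbformat v4).

    Hypothetical helper: checks only the three cases listed in this issue,
    not the full notebook schema.
    """
    problems = []
    for i, cell in enumerate(nb.get("cells", [])):
        kind = cell.get("cell_type")
        if kind == "code" and "execution_count" not in cell:
            problems.append(f"cell {i} (code) does not have property 'execution_count'")
        elif kind == "markdown":
            for key in ("execution_count", "outputs"):
                if key in cell:
                    problems.append(f"cell {i} (markdown) has property '{key}'")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_cell_problems.py error.ipynb
    with open(sys.argv[1], encoding="utf-8") as f:
        # json.load only checks JSON syntax, so it parses schema-invalid notebooks too
        notebook = json.load(f)
    for message in find_cell_problems(notebook):
        print(message)
```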

In each of these cases, loading the notebook in the browser or running jupyter nbconvert raises errors that are very hard for the user to understand.

Here are two notebooks that demonstrate the problem (rename them from .txt to .ipynb). The only difference between them is the missing execution_count property.

error.txt

ok.txt

I'm running nbconvert 6.5.0. You can run the nbconvert command in our docker container (see below) to reproduce the issue.

Expected behaviour: a missing or extraneous execution_count/outputs/whatever property seems easy to handle by inserting or removing it as necessary (with a null placeholder value if needed). At a minimum, the error message should say something like "cell X has/does not have property Y", rather than "Notebook JSON is invalid: data.cells[{data__cells_x}] must be valid exactly by one definition (0 matches found)", which is very unhelpful.
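The insert/remove repair suggested above can be sketched with the standard library. This is a minimal sketch under stated assumptions, not what nbformat or nbconvert actually do. It assumes an nbformat v4 notebook, fills a missing execution_count with null (and missing outputs with an empty list) on code cells, and drops execution_count/outputs from markdown cells. The helper name repair_cells is hypothetical.

```python
import json
import sys


def repair_cells(nb: dict) -> int:
    """Fix the cell problems from this issue in place; return the number of changes.

    Hypothetical helper (nbformat v4): inserts placeholders for required
    code-cell fields and removes code-only fields from markdown cells.
    """
    changes = 0
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        if kind == "code":
            # Required on code cells: insert a null / empty placeholder if absent.
            for key, placeholder in (("execution_count", None), ("outputs", [])):
                if key not in cell:
                    cell[key] = placeholder
                    changes += 1
        elif kind == "markdown":
            # Not allowed on markdown cells: remove if present.
            for key in ("execution_count", "outputs"):
                if key in cell:
                    del cell[key]
                    changes += 1
    return changes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python repair_cells.py main.ipynb  (overwrites the file in place)
    path = sys.argv[1]
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    n = repair_cells(notebook)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 and sorted keys approximate the layout nbformat writes
        json.dump(notebook, f, indent=1, sort_keys=True, ensure_ascii=False)
        f.write("\n")
    print(f"{n} change(s) written to {path}")
```

Running it a second time on the same file should report zero changes, which is a quick way to confirm the repair took.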

Original Issue Post

I'm getting an odd error when trying to clear the output of a notebook. For context: this is an assignment notebook submitted by a student in my R data science class, and students often break notebooks in various ways. Normally I can figure out what the student did and fix the raw ipynb JSON myself, but this error message is so unclear that I really do not know where to start.

Update: After some work I narrowed the problem down to one cell. The main.txt notebook below contains only that cell.

Steps to Reproduce: Luckily we do everything in our course in a Docker image, so you can reproduce this easily:

Download the attached main.txt and rename it to main.ipynb
main.txt
Navigate to the folder containing the notebook
Run docker run --rm -it -v $(pwd):/home/jovyan ubcdsci/r-dsci-100-grading:v0.32.0 /bin/bash
Inside the Docker container, run jupyter nbconvert --inplace --clear-output main.ipynb

The output then shows this error:

Notebook JSON is invalid: data.cells[{data__cells_x}] must be valid exactly by one definition (0 matches found)

Failed validating <unset> in notebook['data']['cells']:

On instance:
<unset>

which tells the user nothing useful about how to fix the problem.

By the way, I did run it through jsonlint, which confirmed that the JSON itself is valid.

Thanks!

Nbconvert version: 6.5.0

kcarnold commented 1 year ago

I encountered this problem with nbformat 5.4.0 installed. Upgrading nbformat (to 5.7.3) made the error message much more helpful.

I think the "confusing error message" part of this bug is resolved. However, the validation is quite strict (e.g., it does not allow additional properties on cells), probably stricter than most conversion tasks need.
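To see which extra keys trip the "no additional properties" rule before running nbconvert, a standard-library sketch like the following can list them per cell. The allowed-key sets are my reading of the nbformat v4.5 JSON schema (the id key was added in 4.5). Treat them as an assumption and check them against the schema shipped with your nbformat version. The names ALLOWED_CELL_KEYS and unexpected_cell_keys are hypothetical.

```python
# Top-level keys each cell type may carry, per my reading of the nbformat
# v4.5 schema (assumption: verify against the schema your nbformat ships).
ALLOWED_CELL_KEYS = {
    "code": {"id", "cell_type", "metadata", "source", "outputs", "execution_count"},
    "markdown": {"id", "cell_type", "metadata", "source", "attachments"},
    "raw": {"id", "cell_type", "metadata", "source", "attachments"},
}


def unexpected_cell_keys(nb: dict) -> dict:
    """Map cell index -> sorted list of top-level keys the schema would reject."""
    report = {}
    for i, cell in enumerate(nb.get("cells", [])):
        allowed = ALLOWED_CELL_KEYS.get(cell.get("cell_type"))
        if allowed is None:
            continue  # unknown cell_type is a different validation error; skip it here
        extra = sorted(set(cell) - allowed)
        if extra:
            report[i] = extra
    return report
```

Unlike the schema validator, this reports every offending key at once instead of stopping at the first failed match.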

m5c commented 1 year ago

Not sure whether the OP was also looking for a fix, but here's an unorthodox but efficient workaround for markdown cells that carry an outputs property: run a regex search-and-replace that removes every empty outputs property in the whole document.

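import re

# Read the original notebook file into a string
with open('MyNotebook.ipynb', 'r', encoding='utf-8') as file:
    jupyter = file.read()

# Run a regex-based search and replace that wipes all empty "outputs" properties.
# Caveat: this also strips empty outputs from code cells, which the nbformat v4
# schema requires, so re-validate the notebook afterwards.
outputs_tag_removed = re.sub(r',\n\s+"outputs": \[\]', '', jupyter)

# Overwrite the original notebook file
with open('MyNotebook.ipynb', 'w', encoding='utf-8') as file:
    file.write(outputs_tag_removed)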

I first tried to do this with sed, but sed processes input line by line and does not easily match a regex across line breaks, hence the Python script. Hope it helps. I based my solution on this corruption checker. In my case, the errors always turned out to be markdown cells with outputs.

jupyter / nbconvert

Missing execution_count causes confusing error messages #1872
