Open krassowski opened 1 year ago
It's complicated. it should fix most of the issue, the problem being that the way it was written the use of 'if isvalid(notebook)' was returning true and mutating notebook to most of the time be valid.
It should be considered an error to pass an invalid notebook, but currently it is not because other we would break the world.
note that a notebook is not trusted after reload BECAUSE nbformat fixes it, the reason being the signature is compute before any call to normalise/isvalid and therefore the mutated notebook that is saved has a different signature and is thus untrusted.
note that a notebook is not trusted after reload BECAUSE nbformat fixes it, the reason being the signature is compute before any call
Yes that is plausible. However, it could also be that there is another bug in Notebook v7 beta and that is the cause of signature mismatch.
Sorry because I still don't follow: repair_duplicate_cell_ids
is set to True
if it is not provided because it defaults to _deprecated
:
but, when it is used in isvalid()
, repair_duplicate_cell_ids
is force-set to False
hence my line of reasoning:
Am I missing something here?
Also, the method which is used by jupyter-server to save notebooks also includes repair_duplicate_cell_ids=False
:
And isvalid()
call was removed from here in https://github.com/jupyter/nbformat/pull/282
Sure it is possible that there is another bug, and I think that having hard errors on nbformat would make it easier to detect.
In general it would be better to have a nbformat API that is difficult to miss-use.
I'll try to find some time to dive into the code and find the reasons. I don't quite remember all the API.
Is there already a way to make the nbformat API strict ? Or add warning filter during the notebook v7 beta that turn warnings into errors ?
and I think that having hard errors on nbformat would make it easier to detect. In general it would be better to have a nbformat API that is difficult to miss-use.
:+1:
Is there already a way to make the nbformat API strict ? Or add warning filter during the notebook v7 beta that turn warnings into errors ?
jupyter-server has a pre_save_hook
which I was successfully using to debug trust issues. For the next beta of JupyterLab and Notebook one could add a pre-save hook that would validate the notebook JSON and error out if it was faulty.
What do you think?
I think a pre-save hook is fine. usually I think the stricter we are in beta by default the better it is as you can always relax for RC/releases.
The warning "Code cell is missing an id field..." should actually say " Cell is missing an id field.." since it not just 'code' cells but any type of cell. I have changed the offending cell's cell type to 'raw' and it still generates the warning,
I edited validator.py to show the cell index and cell type in the warning message to help track offending cell
if (version, version_minor) >= (4, 5):
# if we support cell ids ensure default ids are provided
for i,cell in enumerate(nbdict["cells"]):
if "id" not in cell:
warnings.warn(
f"Cell index: {i} cell_type {cell.cell_type} {cell.source[0:50]} \n" ".Cell is missing an id field, this will become"
" a hard error in future nbformat versions. You may want"
" to use `normalize()` on your notebooks before validations"
" (available since nbformat 5.1.4). Previous versions of nbformat"
" are fixing this issue transparently, and will stop doing so"
f" in the future. ",
MissingIDFieldWarning,
stacklevel=3,
)
I miss the point in inserting this warning into the normalize()
function.
I'm using normalize()
to get rid of this warning but it's actually not possible due to this behavior.
normalize is a workaround – it was added to have a path forward – in general you should pass a valid notebook into those function and normalize should be no-op.
And it is in the doc of normalize
:
def normalize(...):
"""
...
You should in general not rely on this function and make sure the notebooks
that reach nbformat are already in a normal form. If not you likely have a bug,
and may have security issues.
"""
Normalize will try to fix your notebook, but still warn you if the notebook are invalid.
282 refactored normalisation of missing cell IDs, removing lines:
and replacing them with:
However,
repair_duplicate_cell_ids
is set toFalse
, and in general I do not understand why it is reusingrepair_duplicate_cell_ids
- was it meant to be a new variable for missing cell IDs?Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future
- this sentence is ambiguous; does it mean that current version still fixes this issue transparently or not?