jupyter / nbformat

Reference implementation of the Jupyter Notebook format
http://nbformat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
259 stars 153 forks source link

Fix malformed schema in v4.5 #382

Open tonyfast opened 10 months ago

tonyfast commented 10 months ago

currently the /defintions/code_cell/properties/metadata/properties/jupyter definition is malformed. source_hidden and outputs_hidden need to within a properties scope.

this initial pull request show the change in v4.5 schema. i'd love some advice about how best fix this in the code base. i'm not sure what schema should be changed or the process to fix a verified schema. do we increment versions?

blink1073 commented 10 months ago

@bollwyvl?

bollwyvl commented 10 months ago

Of note: there are a number of other places where some of these keys appear outside of a properties:

All of these could likely point to a single $ref.

what schema should be changed or the process to fix a verified schema. do we increment versions?

I haven't gone over the versioning scheme (semantic or otherwise) policy here in a while.

I think as this was clearly an oversight, the change could be considered a bugfix, but any time something becomes more strict, it causes a problem.

A cursory glance around GitHub provides a countable number of hits for a non-test source_hidden being persisted as 1, plus a very few "false", and the occasional weird nested {"source_hidden": "1"}.

If we really want to accommodate those 10 examples, perhaps we need something like this, which would ship on the current 5.4:

  "source_hidden": {
    "description": "Whether the source is hidden.",
    "oneOf": [
      {"type": "boolean"},
      {"deprecated": true, "description": "(deprecated) introduced the original notebook format 4.4, this was not properly constrained. In 5.5, non-boolean values will raise a validation error."}
    ]

This should then be automatically fixed, with or without a warning, when parsed into a notebook node.

The whole feature is rather curious. I guess, for source_hidden, but as, according to the changelog on this repo for 4.4:

 Introduce `source_hidden` and `outputs_hidden` as official front-end
  metadata fields to indicate hiding source and outputs areas. **NB**:
  These fields should not be used to hide elements in exported
  formats.

nbconvert has no knowledge of source_hidden, while jupyterlab does. From this (and the use of ===), it looks like only true should be interpreted as truthy, while all other values would be normalized to "remove from the metadata".

On the main, this seems like something we could catch with a metaschema in this repo, which could be much more aggressive with "additionalProperties": false for constraining the top-level and definitions constrained stuff. Doing deeper ones gets... complicated, and only some real tests would reveal what other skeletons lie in the closet.