jupyter / enhancement-proposals

Enhancement proposals for the Jupyter Ecosystem
https://jupyter.org/enhancement-proposals
BSD 3-Clause "New" or "Revised" License

Add JEP for adding $schema to notebook format #97

Closed filipsch closed 1 year ago

filipsch commented 1 year ago

This JEP proposes adding a new top-level field, $schema, to the notebook JSON, which in turn updates the notebook JSON schema. This new field deprecates nbformat and nbformat_minor.

I skipped the step of creating a GitHub issue and deciding whether it's a JEP in this repository after discussing with @fcollonval. There was broad consensus about this change, and about the fact that it's a JEP, during the notebook format workshop held in Paris (Feb 28 - Mar 2), so I thought it okay to file a PR straight away. I will be the shepherd.

Voting from @jupyter/software-steering-council

willingc commented 1 year ago

@MSeal @rgbkrk Please review. It looks fine to me, provided there is a migration path from old to new formats so that notebooks from different nbformat versions can be executed.

jasongrout commented 1 year ago

Can we add what the proposed changes are to the schema that removes the deprecated attributes, to have an idea of what the resolution of the deprecated properties looks like?

westurner commented 1 year ago

Please consider YAML-LD (JSON-LD) in naming the attribute $schema

Given that Linked Data is ideal for science publishing and the internet, as explained and justified by https://5stardata.info/

Eventually, I (and I believe also @bollwyvl) would argue that nbformat should have a JSON-LD Context, which would make .ipynb transformable to RDF.

That's out of scope for this issue, but FWIW the YAML-LD Convenience Context does map a bunch of things that start with $ to their @ equivalents in JSON-LD, and $schema may or may not be confusing when working with nbformat as YAML-LD:

{
  "@context": {
    "$base": "@base",
    "$container": "@container",
    "$direction": "@direction",
    "$graph": "@graph",
    "$id": "@id",
    "$import": "@import",
    "$included": "@included",
    "$index": "@index",
    "$json": "@json",
    "$language": "@language",
    "$list": "@list",
    "$nest": "@nest",
    "$none": "@none",
    "$prefix": "@prefix",
    "$propagate": "@propagate",
    "$protected": "@protected",
    "$reverse": "@reverse",
    "$set": "@set",
    "$type": "@type",
    "$value": "@value",
    "$version": "@version",
    "$vocab": "@vocab"
  }
}

$schema is not on the list.

agoose77 commented 1 year ago

@westurner the $schema top-level property is already prior-art for declaring that a document conforms to a JSON Schema. If we use a different property here, we lose the ability to have a large class of validators understand our document. We've touched on RDF/JSON-LD in our weekly meetings, which you're encouraged to join!

FWIW, as I understand it, JSON-LD and JSON-Schema are orthogonal concepts. In this JEP, we're concerned about the validation side of things; down the road, the linked-document properties of LD will be useful.

tonyfast commented 1 year ago

currently, the top level notebook schema does not allow for any additionalProperties defined in the container, so we can't have any LD @context. we're hoping to introduce @context as a top level key in a future schema. there are likely a few proposals between this JEP and an @context proposal. so this is on folks' minds, but we decided to defer linked data proposals until some prior JEPs are accepted. advancing the schema will mean good things for our ability to write linked data contexts.

tonyfast commented 1 year ago

@jupyter/software-steering-council we are working on a draft to present to y'all for the JEP. yesterday we were wondering what to expect with the process. is there any way someone can outline what the process will look like so we can plan our work accordingly and set some deadlines?

rgbkrk commented 1 year ago

This is so much more sensible than the incrementing numbers and awkward compatibility between notebook formats. I'm wholly on board. Thank you all so much for pushing forward with this approach.

fcollonval commented 1 year ago

Thanks all for the great discussion.

is there any way someone can outline what the process will look like so we can plan our work accordingly and set some deadlines?

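From my reading, there are three open questions:

  • What should be the JSON Schema Draft version?
  • Should we annotate the nbformat_minor and nbformat as deprecated (if we use draft 2019 or later)?
  • Should we switch more enum with single value to const as the newer draft allows that (this will ease the understanding)?

And I'm unclear about the following comment from @agoose77: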

We will also need to introduce support for this new keyword in the existing schemas

Which new keyword are we speaking about?


To get validation from the SSC, the easiest path would be to resolve all pending questions and then ping the SSC that this is ready for approval. If some questions are left open, I would recommend summarizing them in a comment with the possible solutions, then pinging the SSC, which will have to figure out how to move forward.

agoose77 commented 1 year ago

What should be the JSON Schema Draft version?

At least 2019-09. I'd be curious to know whether there are downsides to just jumping straight to 2020-12. See https://github.com/filipsch/enhancement-proposals/pull/2 :)

Should we annotate the nbformat_minor and nbformat as deprecated

Yes, I think so.

Should we switch more enum with single value to const as the newer draft allows that (this will ease the understanding).

Yes, I think so.

We will also need to introduce support for this new keyword in the existing schemas

Which new keyword are we speaking about?

Actually, this is something I wanted to follow up with @jasongrout on. Due to the fact that we have additionalProperties: false, no document with the $schema top-level property will be considered valid for existing schemas. Right now, this doesn't cause a hard-failure with nbformat; the validator complains about a validation error, but ultimately loads notebooks with additional properties.
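The effect described above can be reproduced with a toy check. This is a minimal sketch, not nbformat's actual validator: `TOP_LEVEL_KEYS` and `unexpected_top_level_keys` are hypothetical names, the key set is assumed to mirror the four top-level properties of the v4 schema, and the `$schema` value is only an illustrative URI.

```python
# Hypothetical stand-in for a schema that sets "additionalProperties": false.
TOP_LEVEL_KEYS = {"cells", "metadata", "nbformat", "nbformat_minor"}


def unexpected_top_level_keys(notebook: dict) -> list:
    """Return the top-level keys that a strict schema would reject."""
    return sorted(set(notebook) - TOP_LEVEL_KEYS)


notebook = {
    "$schema": "https://schema.jupyter.org/notebook/v4.5/notebook.json",  # illustrative
    "cells": [],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

errors = unexpected_top_level_keys(notebook)
if errors:
    # A permissive reader reports the problem but still uses the document.
    print(f"validation warning: unexpected properties {errors}")
```

A strict consumer would stop at the warning; a permissive one, as described for nbformat, would carry on and load the notebook.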

My understanding of our deprecation process is that we will update nbformat so that it always uses $schema if it finds it. The deprecation period simply means that a v4 notebook might have $schema, or it might not. We should keep the nbformat properties in these transition notebooks so that out-of-date nbformat libraries / other validators have a chance at being able to read the notebook if they're permissive enough. i.e., if notebook consumers are not strictly rejecting the document outright due to the new $schema property, then they will have sufficient information to know that it's nbformat 4.
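One way to picture the transition period is a reader that prefers $schema and falls back to the legacy keys. The sketch below is only an illustration under that assumption: `resolve_format` is a hypothetical helper, not part of nbformat, and the URI is a placeholder.

```python
def resolve_format(notebook: dict) -> tuple:
    """Return ("$schema", uri) when the key is present, else ("nbformat", "major.minor")."""
    if "$schema" in notebook:
        return ("$schema", notebook["$schema"])
    return ("nbformat", f"{notebook['nbformat']}.{notebook['nbformat_minor']}")


# A transition-period notebook carries both the new and the legacy keys.
transition = {
    "$schema": "https://example.org/notebook.json",  # placeholder URI
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [],
}
# An older notebook (or an older reader's view of it) has only the legacy keys.
legacy = {k: v for k, v in transition.items() if k != "$schema"}

print(resolve_format(transition))  # ('$schema', 'https://example.org/notebook.json')
print(resolve_format(legacy))      # ('nbformat', '4.5')
```

Because the legacy keys stay in place, a reader that ignores $schema still has enough information to see that the document is nbformat 4.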

I was originally thinking that we would need to backport $schema to older (<v4.7) schemas, but actually I don't think that's the case.

Going forward, we will in-principle be moving away from a need for major epochs of a schema; we can version the schema by calver (like JSON Schema drafts) if we want to (and without further context, I'd prefer that). To my mind, if we need to be able to upgrade/downgrade notebooks between schema versions, we can do this on a calver-like ordering, i.e. change the API of nbformat.
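As a rough illustration of a calver-like ordering, the sketch below sorts version labels written in the same "YYYY-MM" style as the JSON Schema drafts. Applying such labels to notebook schemas is an assumption here, and `calver_key` and `needs_upgrade` are hypothetical helpers.

```python
def calver_key(version: str) -> tuple:
    """Turn a "YYYY-MM" style label into a sortable tuple of integers."""
    return tuple(int(part) for part in version.split("-"))


def needs_upgrade(current: str, target: str) -> bool:
    """True when the document's version is older than the target version."""
    return calver_key(current) < calver_key(target)


versions = ["2020-12", "2019-09"]
ordered = sorted(versions, key=calver_key)

print(ordered)                              # ['2019-09', '2020-12']
print(needs_upgrade("2019-09", "2020-12"))  # True
```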

westurner commented 1 year ago
  • Should we annotate the nbformat_minor and nbformat as deprecated (if we use draft 2019 or later)?

When [W3C SHACL] validation for Linked Data notebooks becomes the norm (because Linked Data Notebook outputs are most practically validated as Linked Data with Shapes and Constraints), the (URI-namespaced) property for the version of the SHACL validation document would need to supersede $schema again. So no: the $schema URL should not be the nbformat version number, because other [SHACL,] schema changes would not result in an implicit change to $schema.

westurner commented 1 year ago

currently, the top level notebook schema does not allow for any additionalProperties defined in the container,

Is JSONschema with additionalProperties: false fundamentally incompatible with JSON-LD?

so we can't have any LD @context. we're hoping to introduce @context as a top level key in future schema.

When the versioned URI and contents of the @context attribute change, does the nbformat major or minor version need to change?

there are likely a few proposals between this JEP and an @context proposal. so this is on folks' minds, but we decided to defer linked data proposals until some prior JEPs are accepted. advancing the schema will mean good things for our ability to write linked data contexts.

nbformat is older than jsonschema, and may outlast jsonschema draft n, so a separate version string that doesn't change between implementations would be great for backward compatibility

tonyfast commented 1 year ago

Is JSONschema with additionalProperties: false fundamentally incompatible with JSON-LD?

notebook documents, the serialized versions of someone's notebook, are fundamentally incompatible with JSON-LD. we can add @context and @graph or any other json-ld key into the metadata properties because they have permissive keys. the top level notebook document is much more strict. in fact, without this JEP, $schema is not something that can exist in a serialized notebook document because additionalProperties is false.

When the versioned URI and contents of the @context attribute change, does the nbformat major or minor version need to change?

this is a good consideration. as we deprecate nbformat and nbformat_minor, we'll have to increment them with each version until the deprecation is complete. we have ongoing discussions about how to handle mismatched $schema and nbformat keys; likely $schema takes precedence.

nbformat is older than jsonschema, and may outlast jsonschema draft n, so a separate version string that doesn't change between implementations would be great for backward compatibility

wow, you're right! nbformat does predate jsonschema; it seems v3 is the first version to rely on draft04. that was a fun dig into history. anyway, current json schema efforts seem to have a well-supported community, and they are rigorous in changing their versions. nbformat will update the json schema draft it is based on less often than we will update our own schema versions.

we've been spending a lot of time discussing backwards compatibility and how best to handle it. ongoing work...

When [W3C SHACL] validation for Linked Data notebooks becomes the norm (because Linked Data Notebook outputs are most practically validated as Linked Data with Shapes and Constraints), then the (URI-namespaced) property for the version of the SHACL validation document would need to supersede

a shacl context for notebook schema will undoubtedly show up in the future. the nbformat schemas serve as valuable interfaces defining linked data contexts. for example, nbformat could be mapped to shacl using a context like:

 {"@vocab": "https://github.com/jupyter/nbformat/blob/main/nbformat/v4/nbformat.v4.5.schema.json#", "@base": "http://www.w3.org/ns/shacl#"}

agoose77 commented 1 year ago

we have ongoing discussions about how to handle mismatched $schema and nbformat keys, likely $schema takes precedence.

@tonyfast I was thinking about this after the meeting, and it seems to me that we should literally define these as constants in the schema. My take is that if you author a notebook with $schema, you're literally asking for it to conform to that schema, and that includes nbformat minor and major being valid.
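To make the "constants in the schema" idea concrete, here is a minimal sketch. The schema fragment, the placeholder URI, the minor version 6, and the `const_violations` helper are all assumptions for illustration; the helper mimics only the JSON Schema `const` keyword and is not a full validator.

```python
SCHEMA_URI = "https://example.org/notebook/v4.6/notebook.json"  # placeholder, not a real URI

# Hypothetical fragment: each schema version pins its own identifying values.
schema_fragment = {
    "properties": {
        "$schema": {"const": SCHEMA_URI},
        "nbformat": {"const": 4},
        "nbformat_minor": {"const": 6},
    }
}


def const_violations(notebook: dict, schema: dict) -> list:
    """List the keys whose value differs from the schema's `const` for that key."""
    violations = []
    for key, rule in schema["properties"].items():
        if "const" in rule and key in notebook and notebook[key] != rule["const"]:
            violations.append(key)
    return violations


mismatched = {"$schema": SCHEMA_URI, "nbformat": 4, "nbformat_minor": 5}
print(const_violations(mismatched, schema_fragment))  # ['nbformat_minor']
```

Under this reading, a notebook whose nbformat keys disagree with its $schema is simply invalid, so no separate precedence rule is needed.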

tonyfast commented 1 year ago

moving agenda minutes over from the team compass.

March 7th, 2023

| Name | Affiliation | GitHub | Favorite Schema Key |
|---|---|---|---|
| tonyfast | | @tonyfast | properties |
| fcollonval | QuantStack | @fcollonval | |
| Angus Hollands | Princeton University | @agoose77 | :smile: |
| Rowan | Curvenote / ExecutableBooks | @rowanc1 | |
| Nick Bollweg | Georgia Tech | @bollwyvl | |

Agenda

first meeting of the notebook cells schema group outside of the nbformat workshop.

to do

$vocabulary

https://gregsdennis.github.io/Manatee.Json/usage/schema/vocabs.html

"$vocabulary": {
    "https://json-schema.org/draft/2019-WIP/vocab/core": true,
    "https://json-schema.org/draft/2019-WIP/vocab/applicator": true,
    "https://json-schema.org/draft/2019-WIP/vocab/validation": true,
    "https://json-schema.org/draft/2019-WIP/vocab/meta-data": true,
    "https://json-schema.org/draft/2019-WIP/vocab/format": true,
    "https://json-schema.org/draft/2019-WIP/vocab/content": true,
    "https://myserver.net/my-vocab": true
  },

Challenges

flowchart
    mimetypes --> IANA
    multiple_schema[multiple schema]
    validation --> validation_report[validation report]
    JEP --> end_meeting[end this meeting]

Reference

JEP Drafts

March 14th, 2023

| Name | Affiliation | GitHub |
|---|---|---|
| tonyfast | | @tonyfast |
| Steve Purves | Curvenote | @stevejpurves |
| Jason Grout | Databricks | @jasongrout |
| Angus Hollands | Princeton University | @agoose77 :smile: |
| Nick Bollweg | GTech | @bollwyvl |

Agenda

March 21

no meeting

tonyfast commented 1 year ago

here are the notes from last week. see y'all tomorrow. please add anything you might like to talk about to the agenda.

March 28th, 2023

| Name | Affiliation | GitHub |
|---|---|---|
| tonyfast | | @tonyfast |
| Nick Bollweg | GTech | @bollwyvl |
| Steve Purves | Curvenote | @stevejpurves |
| Afshin T. Darian | QuantStack | @afshin |

Agenda

tonyfast commented 1 year ago

attaching notes from last week's meeting. see folks tomorrow.

April 4th, 2023

| Name | Affiliation | GitHub |
|---|---|---|
| tonyfast | | @tonyfast |
| jeremy ravenal | naas | @jravenel |
| Angus Hollands | Princeton University | @agoose77 |
| Afshin T. Darian | QuantStack | @afshin |

Agenda

tonyfast commented 1 year ago

hey folks. i likely will miss the meeting today. hopefully someone else can drive the ship. the hackmd is all set up https://hackmd.io/@tonyfast/H1Xnx1B12

tonyfast commented 1 year ago

April 25th, 2023

| Name | Affiliation | GitHub |
|---|---|---|
| tonyfast | | @tonyfast |
| Nick Bollweg | GTech | @bollwyvl |

Agenda

tonyfast commented 1 year ago

May 2nd, 2023

| Name | Affiliation | GitHub |
|---|---|---|
| tonyfast | | @tonyfast |
| Angus Hollands | Princeton University | @agoose77 |

Agenda

is there someone with the proper rights to label these JEPs?

fcollonval commented 1 year ago

@/all (but especially @jupyter/software-steering-council) in 0871ad1 I updated the schema URI to align with JEP #108; i.e. from https://jupyter.org/schema/notebook/notebook-{nbformat}.{nbformat_minor}.schema.json to https://schema.jupyter.org/notebook/v{nbformat}.{nbformat_minor}/notebook.json

For this particular URI I did not use a subproject (as allowed by the JEP). Let me know if it needs further changes.
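For reference, the two URI patterns quoted above can be expanded with plain string formatting. The helper name `schema_uri` is hypothetical, and the 4.5 values are only an example input.

```python
OLD_TEMPLATE = "https://jupyter.org/schema/notebook/notebook-{nbformat}.{nbformat_minor}.schema.json"
NEW_TEMPLATE = "https://schema.jupyter.org/notebook/v{nbformat}.{nbformat_minor}/notebook.json"


def schema_uri(nbformat: int, nbformat_minor: int, template: str = NEW_TEMPLATE) -> str:
    """Fill a URI template with the major and minor notebook format numbers."""
    return template.format(nbformat=nbformat, nbformat_minor=nbformat_minor)


print(schema_uri(4, 5))
# https://schema.jupyter.org/notebook/v4.5/notebook.json
print(schema_uri(4, 5, OLD_TEMPLATE))
# https://jupyter.org/schema/notebook/notebook-4.5.schema.json
```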

fcollonval commented 1 year ago

The vote is now closed with the results:

In favor: 8 Against: 0 Abstention: 0 No vote: 3

--> In light of those results, this JEP is accepted.