meld JSON Schema and JSON-LD context/frame

VladimirAlexiev commented 1 year ago

As an information architect, I want a harmonized way of specifying validation (JSON Schema) and semantic binding (JSON-LD context & frames), so that I can reap both benefits for my JSON and YAML data.

These are very complementary:

JSON Schema specifies the shape of JSON data for validation
JSON-LD specifies the binding of JSON data to semantics, and how to convert RDF<->JSON
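
To make the split concrete, here is a minimal sketch of the same document seen once through a schema and once through a context. It is a hand-rolled stand-in, not a conforming JSON Schema validator or JSON-LD processor; the helper names `check_shape` and `expand_terms` and the `https://example.org/age` IRI are assumptions made up for illustration.

```python
# The same JSON document, checked for shape and then bound to semantics.
person = {"name": "Bart", "age": 10}

# Shape: a tiny subset of JSON Schema ("required" and "type" only).
schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}

# Semantics: a JSON-LD-style term-to-IRI mapping (the "age" IRI is illustrative).
context = {"name": "https://schema.org/name", "age": "https://example.org/age"}

PY_TYPES = {"string": str, "integer": int, "object": dict}


def check_shape(doc, schema):
    """Return a list of violations for the 'required' and 'type' keywords only."""
    errors = [
        f"missing required property: {key}"
        for key in schema.get("required", [])
        if key not in doc
    ]
    for key, sub in schema.get("properties", {}).items():
        if key in doc and not isinstance(doc[key], PY_TYPES[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
    return errors


def expand_terms(doc, context):
    """Replace each key by the IRI the context binds it to; drop unmapped keys."""
    return {context[key]: value for key, value in doc.items() if key in context}
```

The schema says nothing about what `name` means, and the context says nothing about whether `name` is required; each answers a question the other cannot.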

What's the relation to YAML:

JSON is trivially convertible to YAML
JSON Schema can be used to validate YAML, see https://www.npmjs.com/package/pajv, https://github.com/json-schema-everywhere/pajv, https://json-schema-everywhere.github.io/
Many people write their JSON schemas in YAML, eg OAS 3

This is a sub-UCR of #19, which itself:

considers a wider context
doesn't have a specific goal yet, i.e. is just informational
considers simple data modeling languages based on YAML, wherein JSON Schema is derived but is not the source

"JSON Schema plus JSON-LD" is an especially relevant case for our community, thus this UCR

@ioggstream "JSON-LD and JSON Schema... I travel these boundaries quite often":
Wouldn't it be nice to "construct a smooth path" so you don't need to cross any boundaries, and can think more about your data model rather than the various modeling mechanisms?

Prior art

(from https://github.com/json-ld/yaml-ld/issues/2):

1: @OR13: I often use OAS (OpenAPI Specification) / YAML with JSON-LD and JSON Schema. I like the idea of controlling both semantics and data shape at the same time, using only 1 file. OAS supports JSON Schema represented in YAML. We tweaked the JSON Schema to support JSON-LD terms ($linkedData), so now we can present RDF types and JSON Schema types in a single YAML file. This helps us keep semantics and security in sync (more discussion in https://github.com/json-ld/yaml-ld/issues/2#issuecomment-1137629452). For example:

$linkedData:
  term: AgActivity
  '@id': https://w3id.org/traceability#AgActivity
title: Agricultural Activity
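
A rough sketch of how a context could be derived from a schema annotated this way. This is not the traceability tooling itself: the function name `context_from_schema`, the `activityType` property, and its `https://example.org/vocab#` IRI are hypothetical, and the sketch assumes `$linkedData` may also appear on entries under `properties`.

```python
# The schema above as a Python dict, plus one hypothetical annotated property.
schema = {
    "$linkedData": {
        "term": "AgActivity",
        "@id": "https://w3id.org/traceability#AgActivity",
    },
    "title": "Agricultural Activity",
    "type": "object",
    "properties": {
        "activityType": {  # hypothetical property, for illustration only
            "type": "string",
            "$linkedData": {
                "term": "activityType",
                "@id": "https://example.org/vocab#activityType",
            },
        }
    },
}


def context_from_schema(schema):
    """Collect every $linkedData term/@id pair into a JSON-LD context."""
    context = {}
    stack = [schema]
    while stack:
        node = stack.pop()
        linked = node.get("$linkedData")
        if linked:
            context[linked["term"]] = linked["@id"]
        # Descend into nested property schemas, if any.
        stack.extend(node.get("properties", {}).values())
    return {"@context": context}
```

With a single annotated source, the context is generated rather than hand-maintained, so the two cannot drift apart.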

2: @ioggstream added new keywords (x-jsonld-context, x-jsonld-type) to be compatible with OAS 3.0.

Modified Swagger editor that also does semantic mapping: https://ioggstream.github.io/swagger-editor/
spec REST API Linked Data Keywords (23 June 2022), source
whitepaper Add semantic context to APIs / Schemas
presentation Self-explaining APIs: A machine-readable, semantic approach to schema design
(to be) used by the extensive Italian network of ontologies and controlled vocabularies for the Public Administrations (OntoPiA): https://github.com/italia/daf-ontologie-vocabolari-controllati
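
A simplified sketch of the idea behind the two keywords: the schema carries a context and a type, and a consumer attaches them to a plain JSON instance to obtain a JSON-LD document. The actual interpretation rules in the draft are more detailed; the function name `to_jsonld` and the choice of schema.org terms are assumptions for illustration.

```python
# A schema object carrying the two extension keywords next to ordinary
# JSON Schema keywords. The schema.org IRIs are just an example vocabulary.
schema = {
    "type": "object",
    "x-jsonld-type": "https://schema.org/Person",
    "x-jsonld-context": {"@vocab": "https://schema.org/"},
    "properties": {"givenName": {"type": "string"}},
}


def to_jsonld(instance, schema):
    """Return a copy of the instance with @context and @type from the schema."""
    doc = dict(instance)  # shallow copy; the input instance is left untouched
    if "x-jsonld-context" in schema:
        doc["@context"] = schema["x-jsonld-context"]
    if "x-jsonld-type" in schema:
        doc["@type"] = schema["x-jsonld-type"]
    return doc
```

Because the keywords use the `x-` prefix, tools that only understand OAS 3.0 can ignore them, and the payload on the wire stays plain JSON.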

Considerations

Modularization

TODO

Potential Conflicts

(from https://github.com/json-ld/yaml-ld/issues/51):

JSON Schema includes the following $ keywords: $schema, $vocabulary, $defs, $ref, $id, $anchor, $comment, $dynamicRef, $dynamicAnchor

If we decide to use the same sigil for both kinds of keywords, we should look out for conflicts:

@id is a conflict with $id
@vocab is a near-conflict with $vocabulary (i.e. could be confusing)
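
The check can be mechanized. The sketch below strips the sigil from both keyword sets and looks for identical names and for names where one is a prefix of the other. The JSON Schema list is the one given above; the JSON-LD list is my reading of the JSON-LD 1.1 keyword set and should be verified against the spec. The function names and the prefix heuristic for "near-conflict" are assumptions.

```python
JSON_SCHEMA_KEYWORDS = [
    "$schema", "$vocabulary", "$defs", "$ref", "$id", "$anchor",
    "$comment", "$dynamicRef", "$dynamicAnchor",
]

# Assumed to match the JSON-LD 1.1 keyword list; check against the spec.
JSON_LD_KEYWORDS = [
    "@base", "@container", "@context", "@direction", "@graph", "@id",
    "@import", "@included", "@index", "@json", "@language", "@list",
    "@nest", "@none", "@prefix", "@propagate", "@protected", "@reverse",
    "@set", "@type", "@value", "@version", "@vocab",
]


def bare(keywords):
    """Strip the one-character sigil ($ or @) from each keyword."""
    return {keyword[1:] for keyword in keywords}


def exact_conflicts():
    """Names that would collide outright if both sides used one sigil."""
    return sorted(bare(JSON_SCHEMA_KEYWORDS) & bare(JSON_LD_KEYWORDS))


def near_conflicts():
    """Pairs (JSON-LD name, JSON Schema name) where one is a prefix of the other."""
    pairs = []
    for schema_name in sorted(bare(JSON_SCHEMA_KEYWORDS)):
        for ld_name in sorted(bare(JSON_LD_KEYWORDS)):
            if schema_name != ld_name and (
                schema_name.startswith(ld_name) or ld_name.startswith(schema_name)
            ):
                pairs.append((ld_name, schema_name))
    return pairs
```

Under these assumptions `exact_conflicts()` yields only `id` and `near_conflicts()` yields only the `vocab` / `vocabulary` pair, matching the two items listed above.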

But maybe there is no problem if these keywords are localized to the Context vs Schema parts?

After all, @id is already "overloaded" in JSON-LD:

"@container": "@id" # Node Identifier Indexing
"@id": "bart"             # Node identifier
"@id": {"@id": "bart", "age": 42}  # triple, for which RDF-star annotations will follow

anatoly-scherbakov commented 1 year ago

If we still choose $ keywords for the convenience context, we might add a note about potential overlap with JSON Schema. For use cases where JSON Schema is in play, users may well want to choose something entirely different for their keywords.

💖 for @id and 🥑 for @type, for instance :) This is Unicode, users aren't even limited by ASCII.

ioggstream commented 1 year ago

Wouldn't it be nice to "construct a smooth path" so you don't need to cross any boundaries, and can think more about your data model rather than the various modeling mechanisms?

I tried extensively to achieve a "Theory of Everything", but the point is that RDF is not designed to describe syntaxes, and JSON Schema is not designed to define semantics. In the API world, you define strict validation syntaxes for security reasons: this is not always the same thing as what you get with the generic rdfs:Class / rdfs:range.

Moreover, when creating e.g. cross-border services between different countries, you may use the same rdfs:Class with different datatypes / syntaxes. Since this is the actual reality of deployed services, the only interoperable way I found is to address syntax and semantics in isolation but in co-operation.

ioggstream commented 1 year ago

In general, I think that this topic goes beyond the goal of yaml-ld and should probably be addressed in JSON Schema and JSON-LD as a separate project (e.g. json-ld/restapi-ld-keywords).

OR13 commented 1 year ago

A single file that can define both security constraints and semantics is useful.

However, it may very well be outside the capabilities or interest of this group...

This is the reason I am engaged here.

I'm interested in OAS in YAML with LD annotations.

We have a solution, but it could be better.

I'm not sure YAML-LD is really trying to solve the same problem, it seems more focused on RDF and less on API security.

VladimirAlexiev commented 1 year ago

@ioggstream so you sound quite pessimistic?

The problem with saying "this is out of scope" is that there's significant overlap between

Schema to define the shape of data
Context to define the mapping to URLs and datatypes
Ontology to define the meaning of data (classes with their definitions, props with their definitions and domain/range)

If we don't address this UCR, how do you ensure that eg a prop URL is not misspelled between Schema, Context and Ontology?
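
Without a melded format, that check has to be done by a separate linting step. A minimal sketch of such a step follows; the function name `find_mismatches`, the input shapes, and the `https://example.org/vocab#` IRIs are all hypothetical.

```python
def find_mismatches(schema_properties, context, ontology_iris):
    """Cross-check three separately maintained artifacts.

    schema_properties: property names declared in the JSON Schema
    context:           JSON-LD term -> IRI mapping
    ontology_iris:     IRIs of properties defined in the ontology
    """
    problems = []
    for term in sorted(schema_properties):
        if term not in context:
            problems.append(f"schema property '{term}' has no context mapping")
    for term, iri in sorted(context.items()):
        if iri not in ontology_iris:
            problems.append(f"context term '{term}' maps to unknown IRI {iri}")
    return problems


# Example inputs; the context contains a deliberate typo ("brithDate").
schema_properties = {"name", "birthDate"}
context = {
    "name": "https://example.org/vocab#name",
    "birthDate": "https://example.org/vocab#brithDate",
}
ontology_iris = {
    "https://example.org/vocab#name",
    "https://example.org/vocab#birthDate",
}
```

Here `find_mismatches(schema_properties, context, ontology_iris)` reports the misspelled IRI. A single source for all three artifacts would make this class of error impossible rather than merely detectable.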

nissimsan commented 1 year ago

Moreover, when creating e.g. cross-border services between different countries, you may use the same rdfs:Class with different datatypes / syntaxes. Since this is the actual reality of deployed services, the only interoperable way I found is to address syntax and semantics in isolation but in co-operation.

This seems like a counterargument. Overcoming regional differences is why you would include concise semantics in your syntax. Different jurisdictions will always have different requirements, use cases and APIs (not to mention languages). But they can align on the underlying semantics.

ioggstream commented 1 year ago

@VladimirAlexiev This presentation I hope to present at the next APISpec explains my view on JSON-LD vs JSON Schema https://docs.google.com/presentation/d/175ZFBXkhaawtvD97lU7II9-G2R6_a6XAMo82_BaYr4c/edit#slide=id.g14b35179850_0_8

It's not a trivial explanation, though.

I think a solution should work for both JSON-LD and YAML-LD so I won't address it here.

cc: @OR13

gkellogg commented 1 year ago

Discussed at TPAC F2F.

General consensus that this is not really a good idea, in spite of the fact that it keeps coming up. In any case, the issue is bigger than just YAML-LD and JSON Schema.

Gregg Kellogg: This involved melding JSON schema with contexts

... There has been some desire to use context as a schema [paraphrase]

Manu Sporny: Every time we've discussed this, we've decided to keep them separate

... We don't want to commit to a schema language. There are different schema languages for different serialisations

Manu Sporny: In the JSON-LD world, even though schema and context play nicely together, I worry about other shape languages and if we commit to using schema for context, we might get into trouble

... So I don't see an argument for merging the two

Ivan Herman: I may be out of touch by now but the problem I have with JSON schema is that it's not stable. We offered those folks a W3C WG if they're ready.

... We were turned down as they were on version 7, and from one version to another they may not keep interop

... It's not a stable partner so we can't use it normatively

Orie Steel: There is also OAS, which builds on JSON Schema and defines APIs.... very useful.

Benjamin Young: I would echo the same things. It seems to be a recurring problem with YAML-LD. All these weird appendages that seem to scrape in other issues

Orie Steel: +1 To JSON Schema / OAS / YAML stuff being its own spec, not directly related to YAML-LD

... It might be cool, but it would be its own spec and not part of YAML

Orie Steel: We'll keep using JSON Schema as it is... it's working for us.

Benjamin Young: There would be high hills to climb through JSON-schema's lack of stability (not for lack of interest)

Benjamin Young: It's not YAML-LD related as such

Orie Steel: +1 To embedding LD in OAS not being YAML-LD related.

Gregg Kellogg: I'd say that we should close this as being out of scope

... schemas are closely related to JSON-LD. If there were a subgroup that wanted to pursue this, OK, we can create the repos

Ivan Herman: I'd have the more fundamental proposition. It comes back to the misconception - when people look at an RDF vocab, they see it as a constraint. It's not, it's a licence to infer

... A context is a mapping from JSON to the RDF

Orie Steel: I'm happy to continue to merge JSON Schema + JSON-LD in CG work :) ... as I said, it's been working fine for what we are doing.

... That's very different than a schema language that constrains what is and isn't valid.

Manu Sporny: +1 To what Ivan is saying

Ivan Herman: So even if schema were stable, it's not the right choice as it mixes up vocabs and context. Not the same

Timothée Haudebourg: +1 Totally agree with that

Pierre-Antoine Champin: Big +1 that this has nothing to do with YAML-LD

... but I must disagree with what Ivan said. There is indeed a misconception of what RDFS etc. do

... this misconception has been reduced now that we have SHACL/ShEx

... we do agree that JSON-LD contexts are not vocabs

Orie Steel: +1 To comfort defining vocabs through schemas... that's what we do.

Orie Steel: But that's not what YAML-LD is about...

Mike Prorock: +1 Orie

Orie Steel: It might be what OAS-LD would be about...

Pierre-Antoine Champin: It so happens that people are more comfortable sharing a voc as a constraint. I don't think there's a problem to map ...

Ivan Herman: That's not what I said. Is it possible to combine a JSON schema with SHACL? I'm not sure. Trying to combine those is likely to be difficult.

Ivan Herman: They sort of do the same thing - setting up constraints. But that's not the same as a JSON-LD context

Pierre-Antoine Champin: I agree with that

Pierre-Antoine Champin: The value I see in trying to bring together JSON schema and JSON-LD context - JSON schemas are used to define shared vocabs.

Pierre-Antoine Champin: On the Web, those terms can have an IRI. They may not be considered an RDF voc, there's low hanging fruit there

Pierre-Antoine Champin: It could make it easier for people to adopt JSON-LD

Mike Prorock: +1

Manu Sporny: I want to agree with PA - this keeps coming up. The traceability folks have created the ability to take a schema and use that as a context

Orie Steel: Example: https://w3c-ccg.github.io/traceability-vocab/openapi/components/schemas/credentials/BillOfLadingCertificate.yml ... --> ... https://github.com/w3c-ccg/traceability-vocab/blob/gh-pages/contexts/traceability-v1.jsonld#L250

... I think we're going to keep seeing it

Manu Sporny: We are seeing a pattern where people want to create a shape along with a JSON-LD context

Manu Sporny: It's a larger discussion than just YAML-LD. It's an ecosystem discussion

Orie Steel: And for humans: https://w3id.org/traceability#BillOfLadingCertificate

Manu Sporny: This is for the broader LD community

Mike Prorock: I think it is a broader ecosystem need. VCs using JSON-LD makes sense, but there's a need to constrain what the VC can contain

Phil Archer is scribing.

Phil Archer: One other thing that might come up in RCH is canonicalization against a shape

... take some data and some shape, and output a canonical form that excludes anything that's not in the shape

Gregg Kellogg: This sounds like framing

Gregg Kellogg: Maybe there's something adjacent to framing that can be used

... Might be useful for output formatting

... How do I order the properties of an object etc.

Gregg Kellogg: JSON objects are unordered

... Lots of discussions about this

Gregg Kellogg: I don't think there's any action we can take on this issue, except to point back to this conversation for future reference

Orie Steel: I feel like we have confirmed that "JSON Schema" and "OAS" are not related to YAML-LD.

Pierre-Antoine Champin: There's a comment from Orie that this is not a YAML issue specifically.
