json-schema-org / json-schema-vocabularies

Experimental vocabularies under consideration for standardization
Adding Semantic Annotations to JSON Schema #13

Open danielpeintner opened 7 years ago

danielpeintner commented 7 years ago

Hi,

We (as part of the Web of Things working group) are looking into how to type values when exchanging data. For now we reference JSON Schema.

For example, the output type of a certain value is defined as follows:

"outputData": {"valueType": { "type": "number" }}

In the example above "valueType" is essentially pointing to JSON Schema. One might wonder why { "type": "number" } is nested in "valueType". The reason is that we also have the requirement to semantically annotate a type...

"inputData": { "valueType": { "type": "integer" }, "actuator:unit": "actuator:ms" }

This requirement is why we are getting in contact with you. We also talked about it at our last Face-to-Face with @handrews, who seems open to extending JSON Schema with the possibility of adding semantic annotations directly next to a type, etc.

A naive and very simple proposal from our side would be to allow one or more fields for semantic annotations next to each "type" in JSON Schema (something similar exists in SAWSDL, where the ‘modelReference’ attribute points to a semantic concept).

We believe that other specifications would benefit from semantic annotations also (see for example Swagger OpenAPI Specification).

What do you think? Does it sound reasonable to look into that issue?

Thanks!

handrews commented 7 years ago

@danielpeintner thanks for filing this issue! IIRC your larger document is a JSON-LD document. Is this still true?

My biggest initial confusion is how you see JSON Schema sitting alongside the type features of JSON-LD. This is making it more difficult for me to see how to use semantics and schema together.

What limitations in JSON-LD's type indication features led you to JSON Schema?

Once I understand that, it will be easier to talk about the best way to combine these things. Some possible options include:

danielpeintner commented 7 years ago

Thanks for your reply!

What limitations in JSON-LD's type indication features led you to JSON Schema?

We use JSON-LD to describe the interactions (properties, actions and events) of a given "thing" along with some metadata.

For interactions we also need to describe the structure of the data that is expected as input or returned as a response. For simple types like strings, JSON-LD may be just fine. For composed types (e.g., a JSON object is expected with field "a" typed as int, "b" typed as float with a maximum value of 100, and an optional element "c" as String) we would like to use the expressive power of JSON Schema.

I hope this clarifies our thinking.

handrews commented 7 years ago

@danielpeintner yes, that focus on objects is a great help! I had been thinking too much about the scalar types and have had trouble figuring out how to manage the overlap or why anyone would want to. But I've also done a bit more with RDF and OWL in the meantime, and can see how for more complex structures JSON [Hyper-]Schema would fit this use case better.

vcharpenay commented 7 years ago

Here is an example from robotics. To control the movements of robots, they are often equipped with an Inertial Measurement Unit (IMU), which combines accelerometer and gyroscope to control six degrees of freedom and re-construct the whole movement in 3D. In applications involving robot swarms or industrial robots on a product line that must coordinate, devices might expose IMU data through a single Web resource.

The associated JSON Schema could look like this:

{
  "type": "object",
  "properties": {
    "prop1": { "$ref": "#/definitions/def1" },
    "prop2": { "$ref": "#/definitions/def1" }
  },
  "definitions": {
    "def1": {
      "type": "array",
      "minItems": 3,
      "maxItems": 3,
      "items": { "type": "number" }
    }
  }
}

Here, the data structure for acceleration and orientation is the same (a vector of 3 numbers). Annotation is required to distinguish between them and figure out e.g. whether prop1 is acceleration or orientation. For now, what we have in mind is a simple annotation like this:

...
  "properties": {
    "prop1": {
      "$ref": "#/definitions/def1",
      "modelReference": "http://example.org/vocab#Acceleration"
    },
    "prop2": {
      "$ref": "#/definitions/def1",
      "modelReference": "http://example.org/vocab#Orientation"
    }
  },
...

dlax commented 7 years ago

@vcharpenay Couldn't you keep JSON-LD declarations distinct from JSON Schema ones by relying on the JSON-LD context being referenced in an HTTP Link header (see https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld)?

GET http://example.org/imu/1 HTTP/1.1
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json
Link: <http://example.org/schemas/imu.json>; rel="describedby"
Link: <http://example.org/contexts/imu.jsonld>; rel="http://www.w3.org/ns/json-ld#context"

{
  "prop1": [1, 2, 0],
  "prop2": [1.1, 2, 9]
}

And http://example.org/contexts/imu.jsonld being:

{
   "@context":
   {
      "prop1": "http://example.org/vocab#Acceleration",
      "prop2": "http://example.org/vocab#Orientation"
   }
}

This way, one can interpret the JSON instance in both the context of validation (with JSON Schema) and semantics (with JSON-LD) without having to mix validation and semantics in the same description.
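
As a rough illustration of the client side of this, here is a minimal Python sketch that picks the schema and the JSON-LD context out of the two Link header values shown above. The helper name `links_by_rel` is hypothetical, and the regular expression only covers the simple `<target>; rel="..."` form used in this example, not the full Link header grammar.

```python
import re

# Matches one link-value of the simple form: <target>; rel="relation"
# (assumption: no other parameters, rel always double-quoted)
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def links_by_rel(header_values):
    """Map each link relation to its target URI (hypothetical helper)."""
    found = {}
    for value in header_values:
        for target, rel in LINK_RE.findall(value):
            found[rel] = target
    return found


# The two Link header values from the response above.
headers = [
    '<http://example.org/schemas/imu.json>; rel="describedby"',
    '<http://example.org/contexts/imu.jsonld>; '
    'rel="http://www.w3.org/ns/json-ld#context"',
]

links = links_by_rel(headers)
schema_uri = links["describedby"]  # used for validation
context_uri = links["http://www.w3.org/ns/json-ld#context"]  # used for semantics
print(schema_uri)
print(context_uri)
```

The same JSON body can then be handed to a validator with the first URI and to a JSON-LD processor with the second, without either document knowing about the other.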

Alternatively, we may allow @context keyword in JSON Schema and have it ignored for validation.

handrews commented 7 years ago

@dlax pretty much exactly what I was going to ask. JSON Schema validators SHOULD ignore unrecognized keywords, so the only reason to note this in the spec would be to reserve the keyword for this usage and encourage such integrations.

vcharpenay commented 7 years ago

The Web of Things includes other protocols than HTTP like CoAP or WebSocket. CoAP does not specify a "Link" Option (although one could define in a non-standard way) and WebSocket does not even have the notion of header, I think.

Yet the JSON-LD context could also be given in the payload. However, in that case, if the data is encoded in binary formats like EXI or CBOR, adding a context URI would significantly increase the size of the message. In the case of data streams, it would also introduce unnecessary redundancy.

This is why we believe metadata like JSON-LD mappings should rather be in the Thing Description.

vcharpenay commented 7 years ago

Alternatively, we may allow @context keyword in JSON Schema and have it ignored for validation.

How JSON-LD mappings should be defined is still an open question. What you suggest here might be sufficient indeed. However, as @handrews says, including it explicitly in the spec would encourage its use, especially if it comes with good tooling.

dlax commented 7 years ago

The Web of Things includes other protocols than HTTP like CoAP or WebSocket. CoAP does not specify a "Link" Option (although one could define in a non-standard way) and WebSocket does not even have the notion of header, I think.

Just wanted to mention that, in case the protocol lacks the link notion, you could include a Hyper-Schema link in your JSON Schema document:

{
  "type": "object",
  "properties": {
    "prop1": { "$ref": "#/definitions/def1" },
    "prop2": { "$ref": "#/definitions/def1" }
  },
  "definitions": {
    "def1": {
      "type": "array",
      "minItems": 3,
      "maxItems": 3,
      "items": { "type": "number" }
    }
  },
  "links": [
    {
      "rel": "http://www.w3.org/ns/json-ld#context",
      "href": "http://example.org/contexts/imu.jsonld"
    }
  ]
}

Might not be what you want, but I think it readily works.


Alternatively, we may allow @context keyword in JSON Schema and have it ignored for validation.

How JSON-LD mappings should be defined is still an open question. What you suggest here might be sufficient indeed. However, as @handrews says, including it explicitly in the spec would encourage its use, especially if it comes with good tooling.

I also support this. Also, it may be necessary to special-case @-properties so that validators ignore them, since (I think) validation would fail on instances with @-properties if additionalProperties is false in the JSON Schema.

sebastiankb commented 7 years ago

Based on this discussion so far, I would recommend allowing both approaches:
1) being self-contained, where the semantic tagging is done directly in JSON Schema (see the sample of @vcharpenay above with the modelReference key);
2) making references to an external JSON-LD document that makes the semantic declarations, as @dlax pointed out.

From the Web of Things perspective, the first approach makes sense when you have Things / Servients that are resource-constrained in terms of memory and processing capabilities (e.g., a simple temperature sensor). Processing steps such as downloading and doing semantic mapping can be omitted. The second approach makes sense if you have more complex scenarios such as highly structured JSON content and/or more powerful Things. What do you think?

dlax commented 7 years ago

Re-reading the comments, it seems to me that we have been discussing the idea of including semantic tagging in JSON Schema documents, but I actually think this would be better suited to JSON instances, since semantic tagging is orthogonal to validation in general.

About JSON-LD, what could be done in the JSON Schema specification is to have object member names starting with @ ignored by validation (without requiring "additionalProperties": false). That would mean that the same JSON instance could be readily interpreted both in the context of validation through JSON Schema and in the context of semantics through JSON-LD. It seems to me that such JSON instances would be close to the self-contained document @sebastiankb seems to be calling for.
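
To make the interaction with additionalProperties concrete, here is a small Python sketch. It is not a JSON Schema validator: it only looks at "properties" and "additionalProperties", and the `ignore_at_members` flag is a hypothetical stand-in for the rule proposed above, which no JSON Schema draft defines.

```python
def check_object(instance, schema, ignore_at_members=False):
    """Return the member names rejected by "additionalProperties": false.

    Illustration only: handles just "properties" and a boolean
    "additionalProperties"; everything else in the schema is ignored.
    """
    declared = schema.get("properties", {})
    if schema.get("additionalProperties", True) is not False:
        return []  # extra members are allowed, nothing is rejected
    rejected = []
    for name in instance:
        if name in declared:
            continue
        if ignore_at_members and name.startswith("@"):
            continue  # hypothetical rule: skip JSON-LD style @-members
        rejected.append(name)
    return rejected


schema = {
    "type": "object",
    "properties": {"prop1": {"type": "array"}, "prop2": {"type": "array"}},
    "additionalProperties": False,
}
instance = {
    "@context": "http://example.org/contexts/imu.jsonld",
    "prop1": [1, 2, 0],
    "prop2": [1.1, 2, 9],
}

strict = check_object(instance, schema)
relaxed = check_object(instance, schema, ignore_at_members=True)
print(strict)   # members rejected under current rules
print(relaxed)  # members rejected if @-members were skipped
```

Under today's behaviour the embedded "@context" member is what gets rejected by a closed schema; with the proposed special case the instance would pass unchanged.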

In fact, it's not clear to me if JSON-LD is suitable for your Web of Things applications or not. If it is (as I assumed from @vcharpenay's initial comment), the question is rather how to expose it in an "optimized" way on application side and how to have it play well with JSON Schema validation.

sebastiankb commented 7 years ago

but I actually think this would be better suited to JSON instances since semantic tagging is orthogonal to validation in general.

This does not really help us, since the Thing Description is a kind of advance information about what a Thing offers: what kind of data it provides and how it is encoded, what functions/actions it serves, and which protocol(s) it supports. Especially for the data exchange case, we need to understand what the structure looks like (-> JSON Schema) and the meaning of the content (e.g., ‘which key in the object represents the temperature value’).

what could be done in JSON Schema specification is to have object member names starting with @ ignored from validation

This sounds good to me. I will discuss in the breakout session which key term we should propose. Here are some first ideas:

What we should check is how JSON-LD parsers handle the @ prefix, since JSON-LD already reserves @ keys. Maybe it causes a parsing error if an unknown one suddenly occurs. @vcharpenay, can you have a look at that?

handrews commented 7 years ago

@vcharpenay wrote:

The Web of Things includes other protocols than HTTP like CoAP or WebSocket. CoAP does not specify a "Link" Option (although one could define in a non-standard way) and WebSocket does not even have the notion of header, I think.

Hyper-Schema allows defining links without needing to resort to headers. You do need the hyper-schema, of course, but it sidesteps the issue of protocol-dependent linking mechanisms entirely.

handrews commented 7 years ago

@sebastiankb I think your view and @dlax's can be aligned by considering the Thing Description as both a JSON Schema document and an instance document (in general, JSON Schemas are also instances, such as when they are validated by their meta-schemas).

The Thing Description is an instance document that includes both JSON Schema snippets and semantic annotations. If I am understanding everything correctly, the semantics and schema work independently. You are not attaching semantics to the validation schema snippets, you are attaching semantics alongside of but independent of the structural validation. Is this correct?

sebastiankb commented 7 years ago

Mainly, we use the Thing Description to describe a Thing’s interaction model, which can consist of one or multiple properties (careful: in Web of Things we use the same term as in JSON Schema, but with a different meaning), actions, and/or events. Each instance of an interaction defines an input and/or output. For this we would like to embed or refer to a JSON Schema definition to declare the payload data that is exchanged at runtime. At this point, we are no longer able to add semantics, since we would like to rely on pure JSON Schema declarations. This is OK when we only have to declare a single data value type. Typically, the semantics provided by the interaction definition are sufficient to understand what this single value is intended to mean (plus the type). It gets complicated when the input/output is based on an object / complex type with multiple entries. There it would be great to add semantic annotations.

In yesterday's breakout session in Osaka I gave an introduction and use case about this topic. Maybe it helps to understand why we want to have this extension. Please find here the slides

vcharpenay commented 7 years ago

The Thing Description is an instance document that includes both JSON Schema snippets and semantic annotations. If I am understanding everything correctly, the semantics and schema work independently. You are not attaching semantics to the validation schema snippets, you are attaching semantics alongside of but independent of the structural validation. Is this correct?

@handrews, we do want to attach semantics to the validation schema snippets. The Thing Description should be a document that would allow for both validation and semantic processing of JSON data. The former requires a JSON schema, the latter a JSON-LD context. We could of course provide them separately. But as @dlax put it:

the question is rather how to expose [the JSON-LD context] in an "optimized" way on application side and how to have it play well with JSON Schema validation.

This discussion is indeed about optimizing, in the sense of reducing the amount of information WoT developers should provide. For instance: starting from the "extended" schema I gave for the IMU, @dlax could design an adequate JSON-LD context without further knowledge about my application. This means machines could do that transformation as well, saving me the time I would have spent on modeling the JSON-LD context.
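
A minimal Python sketch of that transformation, assuming the annotation keyword is called "modelReference" as in the IMU example and that only top-level "properties" carry it. The function name `context_from_schema` is hypothetical; nested objects, arrays and $ref resolution are deliberately left out.

```python
def context_from_schema(schema, keyword="modelReference"):
    """Derive a flat JSON-LD context from annotated schema properties.

    Assumption: each annotated property maps directly to one term.
    """
    terms = {}
    for name, subschema in schema.get("properties", {}).items():
        if keyword in subschema:
            terms[name] = subschema[keyword]
    return {"@context": terms}


# The "extended" IMU schema from earlier in the thread.
imu_schema = {
    "type": "object",
    "properties": {
        "prop1": {
            "$ref": "#/definitions/def1",
            "modelReference": "http://example.org/vocab#Acceleration",
        },
        "prop2": {
            "$ref": "#/definitions/def1",
            "modelReference": "http://example.org/vocab#Orientation",
        },
    },
}

imu_context = context_from_schema(imu_schema)
print(imu_context)
```

For this input the result is the same term mapping as the hand-written imu.jsonld context shown earlier by @dlax.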

dlax commented 7 years ago

@sebastiankb I think your view and @dlax's can be aligned by considering the Thing Description as both a JSON Schema document and an instance document

@handrews I've been thinking about this since you suggested it. I initially thought it was nice, but am now a bit skeptical. Suppose one wants to add a JSON-LD @context to a JSON Schema; the only way I could come up with that is somewhat meaningful is:

{
  "type": "object",
  "properties": {
    "prop1": {
      "@context": "http://example.org/vocab#Acceleration",
      "type": "array",
      "items": {
        "type": "number"
      },
      "minItems": 3,
      "maxItems": 3
    },
    "prop2": {
      "@context": "http://example.org/vocab#Orientation",
      "type": "array",
      "items": {
        "type": "number"
      },
      "minItems": 3,
      "maxItems": 3
    }
  }
}

But I can't see how this could be useful, because the @contexts are not meant to describe the JSON Schema document's members but the JSON instance's. Also, one has to carry the structure of the JSON Schema onto the JSON instance to map a context to its members; maybe it's not a big deal in practice but it's a bit awkward (still this coupling between validation and semantics)...

Did you have something different in mind?

(In fact, I now wonder if the proposal of allowing @-members in JSON instances is actually a good idea...)

dlax commented 7 years ago

@vcharpenay Here's another proposal to convey semantics in JSON Schema by making use of Hyper-Schema links:

{
  "type": "object",
  "properties": {
    "prop1": {
      "type": "array",
      "items": {
        "type": "number"
      },
      "minItems": 3,
      "maxItems": 3,
      "links": [
        {
          "rel": "http://www.w3.org/ns/json-ld#context",
          "href": "http://example.org/vocab#Acceleration"
        }
      ]
    }
  }
}

By having a link description object directly attached to a sub-schema (and not to the global schema), it seems pretty close to the modelReference initially suggested. The source resource of each link is a member of the JSON instance (i.e. prop1 in example), so this is readily machine-resolvable. I think this is also pretty self-contained/optimized from the developer point of view.
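
For illustration, a short Python sketch of how a consumer might collect these per-property links. The function name `semantic_links` is hypothetical, and it only inspects link description objects directly under each entry of "properties"; it does not resolve URI templates or follow $ref.

```python
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def semantic_links(schema, rel=JSONLD_CONTEXT_REL):
    """Map each property name to the hrefs of its links with the given rel."""
    result = {}
    for name, subschema in schema.get("properties", {}).items():
        hrefs = [
            link["href"]
            for link in subschema.get("links", [])
            if link.get("rel") == rel
        ]
        if hrefs:
            result[name] = hrefs
    return result


# The schema from the comment above.
schema = {
    "type": "object",
    "properties": {
        "prop1": {
            "type": "array",
            "items": {"type": "number"},
            "minItems": 3,
            "maxItems": 3,
            "links": [
                {
                    "rel": "http://www.w3.org/ns/json-ld#context",
                    "href": "http://example.org/vocab#Acceleration",
                }
            ],
        }
    },
}

found = semantic_links(schema)
print(found)
```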

handrews commented 7 years ago

The Thing Description should be a document that would allow for both validation and semantic processing of JSON data. The former requires a JSON schema, the latter a JSON-LD context. We could of course provide them separately.

@vcharpenay I think we're just using slightly different meanings of "attach" and "separately" here :-) Let me try to come at this a different way:

The only interaction I see here is that you are using JSON Schema (in addition to its usual validation functionality) to determine which parts of the instance should be processed by a given bit of JSON-LD. In your example, you are putting your annotations under specific properties within a JSON Schema "properties" object, indicating which annotations apply to which properties. You could theoretically do the same thing with array elements, or use constructs like "oneOf" to conditionally apply different annotations depending on the instance's run-time structure.

This is also how JSON Hyper-Schema makes use of the JSON Schema validation keywords: http://json-schema.org/latest/json-schema-hypermedia.html#rfc.section.3.1

Is that correct? Are there any other interactions between JSON-LD and JSON Schema that are desired?

handrews commented 7 years ago

@dlax I've tried to sell the WoT folks on JSON Hyper-Schema, but so far they aren't biting :-)

dlax commented 7 years ago

@dlax I've tried to sell the WoT folks on JSON Hyper-Schema, but so far they aren't biting :-)

Hm, okay, didn't know that... On the other hand, from https://w3c.github.io/wot-thing-description/#interaction-patterns, it seems that they're already using hypermedia links in the Thing Description. So it's not obvious why semantics information couldn't be conveyed in the same manner.

vcharpenay commented 7 years ago

We've come to a (temporary) conclusion within our group and will try it in our next PlugFest (2017/07). I summarized it here: w3c/wot-thing-description#5.

We will most likely re-open this issue at a later time and invite you to a joint meeting for a detailed discussion. Is anyone of you, by chance, at the workshop organized by the IRTF T2T group in Prague (WISHI)?

chrispauley commented 6 years ago

+1 To adding JSON-LD vocabulary for semantics. JSON Hyper-Schema vocabulary is for hypermedia linking.

In our integration use cases we need to point to competencies defined in different taxonomies and we need to point to, or include, signed credentials. Semantics in an instance document would enable validating that we have a "correct one of those things."

-Chris Pauley Member of HR Open Standards and part of the Credentialing Ecosystem Mapping Project http://connectingcredentials.org/

surruk51 commented 5 years ago

I have a use case, (simply wanting to pass extra information about how data should be presented on a web page e.g. SELECT vs DATALIST, INPUT TYPE=TEXT vs TEXTAREA etc) and I am trying to see how I can use JSON-Schema to do this in a standards compliant way.

Do I use the JSON-LD context vocabulary to define additional terms, or is it better to create a superset of JSON Schema to meet my needs? Or has someone somewhere already buttoned down this particular use case in an accepted way that I have so far failed to find?

handrews commented 5 years ago

@surruk51 purely within JSON Schema (not JSON-LD), this is what the $vocabulary feature in the forthcoming draft is for. It allows declaring what sets of keywords are being used, and with what semantics, without that set of keywords necessarily needing to be in an RFC (although the ones in RFCs will be referenceable as vocabularies as well).

One particular use case is to be able to add a set of keywords to control web UI display. Another is to control code generation (e.g. is this $ref composition or inheritance?).

However, it will be a while before this is implemented at all, and then people will need to define and implement new vocabularies.

For this draft, vocabularies are just URIs associated with a specification. For example, the vocabulary that will probably have a URI like https://json-schema.org/draft/2019-02/vocab/validation will be defined as "section 6 of the JSON Schema Validation IETF draft specification" (see json-schema-org/json-schema-spec#697 for details of specific vocabs).

The following draft will hopefully define a machine-readable form of this, but that's not going to fit in for now.

In theory, you could make a JSON Schema vocabulary that references JSON-LD and use them together in that way (with regrettable overlap of the term "vocabulary"). But I haven't really sorted out how that would work in practice.

thawkins commented 4 years ago

Would this mechanism be suitable for annotating schemas with type information for fake data generators? The ability to create a set of test data directly from a schema definition would be very very cool.

I was thinking of faking this (no pun intended), by adding a property syntax to the description keyword, to allow designation of the type information needed by faker (fake data generation library).

handrews commented 4 years ago

@thawkins yes, it should be possible. Data generation and code generation are quite similar. In both cases, the validation constraint system can result in descriptions that are fine for validation but too ambiguous for proper generation (whether that's generation of code, UI, data, etc.) Additional vocabulary keywords can disambiguate those scenarios.

Note that the draft is no longer forthcoming but has in fact been published!

thawkins commented 4 years ago

This is one I was looking at, but it adds keywords ("faker") that pollute the schema definition, hence my search for annotations.

https://github.com/json-schema-faker/json-schema-faker

Another possible approach would be to use an escape keyword like "data-*" that would be ignored unless the processor understood the term. This is the same approach that HTML5 uses to add dynamic attributes to elements without breaking the HTML5 parser, i.e. I could use "data-faker".
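
A minimal Python sketch of what a processor could do with such keywords, assuming the "data-" prefix convention described above. Both the convention and the helper name `extension_keywords` are hypothetical; nothing in JSON Schema defines them.

```python
def extension_keywords(subschema, prefix="data-"):
    """Collect keywords carrying the (hypothetical) escape prefix.

    Returns a dict keyed by the keyword name with the prefix removed.
    """
    return {
        key[len(prefix):]: value
        for key, value in subschema.items()
        if key.startswith(prefix)
    }


name_schema = {"type": "string", "data-faker": "name.findName"}

hints = extension_keywords(name_schema)
print(hints)
```

A validator that does not know the prefix would simply treat "data-faker" as an unknown keyword and skip it, while a data generator could look up `hints["faker"]`.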

BigBlueHat commented 4 years ago

@thawkins as I understand it, the faker keyword would be added via a meta-schema containing a vocabulary definition: https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.D.2.p.1

Something like:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "https://example.com/meta/faker-vocab",
  "$recursiveAnchor": true,
  "$vocabulary": {
    "https://example.com/vocab/faker-vocab": true
  },
  "type": ["object", "boolean"],
  "properties": {
    "minDate": {
      "type": "string"
    }
  }
}

And then JSON Schemas using the faker keyword would need to include a $vocabulary object containing the identifier of that meta-schema:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "https://example.com/json-schema-faker-example",
  "$vocabulary": {
    "https://example.com/vocab/faker-vocab": true
  },
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "faker": "name.findName"
    }
  }
}

Others more in-the-know might have better info, though. 😸

handrews commented 4 years ago

@BigBlueHat that's pretty much correct, with one exception: $vocabulary always goes in the meta-schema. You can, of course, put it in a regular schema, but it's ignored there.

This may seem odd, but think of it this way: Everything in a meta-schema tells you something about the schemas described by the meta-schema. Meta-schemas don't tell you anything about themselves (unless they are their own meta-schema, at which point it gets confusing so let's skip it for now).

$vocabulary says "the thing that this schema describes is using the semantics defined by this JSON Schema vocabulary." Analogous to type saying "the thing that this schema describes conforms to this type, assuming it passes validation."

Since "using a JSON Schema vocabulary" has no meaning for anything other than JSON Schema, putting it in a non-meta-schema doesn't do anything useful. (I suppose someone could define another media type that references JSON Schema vocabularies but let's not borrow trouble).

...anyway... this is why you see $vocabulary in single-vocabulary meta-schemas like https://json-schema.org/draft/2019-09/meta/applicator and also see it re-stated in the general-purpose multi-vocabulary meta-schema https://json-schema.org/draft/2019-09/schema

So @thawkins would want to make a version of the multi-vocabulary meta-schema that adds the faker meta-schema to its allOf, and adds the faker vocabulary to its $vocabulary. You'll probably want the $vocabulary value in the multi-vocabulary meta-schema to be false, indicating that implementations that don't understand it can ignore it.


Now, all of this vocabulary stuff is meant to make it possible to re-use vocabularies by writing some sort of plugin. If your implementation is the only one that's going to encounter this vocabulary, it's still possible to just use extra keywords without doing any of this. The correct behavior for any implementation is still to ignore unknown keywords. (The true value for $vocabulary gives schema authors a way to tell an implementation to fail if it doesn't recognize the vocabulary, which wasn't ever possible before.)
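
The true/false behaviour described here can be sketched in a few lines of Python. This is an illustration of the rule only, not code from any JSON Schema implementation; `check_vocabularies` and `UnknownVocabularyError` are hypothetical names, and the faker vocabulary URI is the example one used earlier in this thread.

```python
class UnknownVocabularyError(Exception):
    """Raised when a required vocabulary is not understood."""


def check_vocabularies(meta_schema, understood):
    """Apply the $vocabulary rule to a meta-schema.

    A vocabulary marked true must be understood, otherwise processing
    fails. A vocabulary marked false may be ignored if unknown.
    Returns the list of ignored vocabulary URIs.
    """
    ignored = []
    for uri, required in meta_schema.get("$vocabulary", {}).items():
        if uri in understood:
            continue
        if required:
            raise UnknownVocabularyError(uri)
        ignored.append(uri)
    return ignored


CORE = "https://json-schema.org/draft/2019-09/vocab/core"
FAKER = "https://example.com/vocab/faker-vocab"  # example URI from the thread

# An implementation that knows only the core vocabulary.
understood = {CORE}

# Faker marked false: the implementation may carry on and ignore it.
optional_meta = {"$vocabulary": {CORE: True, FAKER: False}}
ignored = check_vocabularies(optional_meta, understood)
print(ignored)

# Faker marked true: the implementation must refuse to process.
required_meta = {"$vocabulary": {CORE: True, FAKER: True}}
try:
    check_vocabularies(required_meta, understood)
    failed = False
except UnknownVocabularyError:
    failed = True
print(failed)
```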

markmelville commented 1 year ago

Including semantics in JSON doesn't HAVE to look like JSON-LD, does it? What if we just add RDF-style keys?

{
  "type": "object",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://schema.org/Person",
  "properties": {
    "givenName": {
      "type": "string"
    },
    "dob": {
      "type": "string",
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#datatype": "http://schema.org/birthDate"
    }
  }
}

or with prefixes to make it more readable:

{
  "prefixes": {
    "rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema":"http://schema.org/"
  },
  "type": "object",
  "rdf:type": "schema:Person",
  "properties": {
    "givenName": {
      "type": "string"
    },
    "dob": {
      "type": "string",
      "rdf:datatype": "schema:birthDate"
    }
  }
}

rob-metalinkage commented 1 year ago

@markmelville this has some appeal if you added a normative JSON-LD context definition to make it and the rest of the schema resolve as a JSON-LD graph, so this information is accessible. About to post another perspective ..

rob-metalinkage commented 1 year ago

(This is cross-posted to the JSON-LD WG for comment - and self references this issue)

There have been various discussions around linking JSON schema with semantic information via JSON-LD. JSON-LD can be used to semantically annotate JSON data - but there appears to be no way to annotate the schema itself, [1] which limits the potential of OAS to expose useful semantic information about query and response objects.

We have been exploring this in the context of usage of JSON schema in OAS specifications - where the JSON schemas use the available $ref mechanism to create reusable schema building blocks. We have a workable approach to create reusable mappings from these sub-schemas to JSON-LD contexts and hence be able to semantically annotate both schemas as well as instances, but we'd like to double check we haven't missed an effort to join these dots.

In a nutshell, and following the spirit of the OAS recommendations to deprecate the use of examples within API specifications and add them as schema annotations…

It appears possible to create a 1:1 mapping of schemas and semantic models using an annotation in the schemas, which allows us to then compose, or re-use predefined, JSON-LD fragments. A composite JSON-LD context can then be created for the top level object like so:


{
  "@context": {
    "properties": {
      "@id": "https://purl.org/geojson/vocab#properties",
      "@context": {
        "hasResult": {
          "@id": "http://www.w3.org/ns/sosa/hasResult",
          "@context": {
            "distance": "http://example.com/vectorObservation/distance",
            "geopose": {
              "@id": "http://example.com/vectorObservation/geopose",
              "@context": {
                "angles": {
                  "@id": "http://example.com/geopose/angles",
                  "@context": {
                    "pitch": "http://example.com/geopose/pitch",
                    "yaw": "http://example.com/geopose/yaw",
                    "roll": "http://example.com/geopose/roll"
                  }
                },...

All that is required is to link the schema fragments with a JSON-LD context like so:

"$schema": https://json-schema.org/draft/2020-12/schema
description: 'GeoPose YPR angles'
'@modelReference': geopose.context.jsonld
type: object
properties:
  yaw:
    type: number
 ....

The context is lightweight and doesn't need to replicate type information if it can be derived from the schema.

@modelReference matches this discussion paper [2], which follows the nomenclature from SAWSDL [3]. However, we use the JSON-LD approach of an “@<annotation>” property rather than modifying the JSON schema properties. These could perhaps be alternatives, or we could adopt the modified schema approach if it's supported.

Note that neither @context nor $vocabulary meets the need to annotate the schema itself, as opposed to the instance or the meta-schema (the schema description language).

This single mechanism is sufficient to compose a JSON-LD context (like the first example) in the same way OAS can compose a specification from bundled components, and it could be a built-in capability of OAS (or an OAS profile).
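As a rough illustration of that composition step, here is a minimal Python sketch. It assumes schemas and contexts are already loaded into dictionaries; the file names, the `SCHEMAS` and `CONTEXTS` tables and the `compose_context` helper are all hypothetical and not part of any existing tool. It follows each `$ref`, looks up the context named by `@modelReference`, and nests it as a scoped `@context` under the referring property.

```python
import json

# Hypothetical stand-ins for schema and context files; all names are illustrative.
SCHEMAS = {
    "observation.schema": {
        "@modelReference": "observation.context.jsonld",
        "type": "object",
        "properties": {
            "distance": {"type": "number"},
            "geopose": {"$ref": "geopose.schema"},
        },
    },
    "geopose.schema": {
        "@modelReference": "geopose.context.jsonld",
        "type": "object",
        "properties": {
            "yaw": {"type": "number"},
            "pitch": {"type": "number"},
            "roll": {"type": "number"},
        },
    },
}

CONTEXTS = {
    "observation.context.jsonld": {
        "@context": {
            "distance": "http://example.com/vectorObservation/distance",
            "geopose": "http://example.com/vectorObservation/geopose",
        }
    },
    "geopose.context.jsonld": {
        "@context": {
            "yaw": "http://example.com/geopose/yaw",
            "pitch": "http://example.com/geopose/pitch",
            "roll": "http://example.com/geopose/roll",
        }
    },
}


def compose_context(schema_name):
    """Build a nested JSON-LD context for a schema and the schemas it references.

    Assumes the $ref graph is acyclic; a real implementation would need
    cycle detection and proper URI resolution.
    """
    schema = SCHEMAS[schema_name]
    ref = schema.get("@modelReference")
    # Copy so the stored context fragments are never mutated.
    context = dict(CONTEXTS[ref]["@context"]) if ref else {}
    for prop, subschema in schema.get("properties", {}).items():
        target = subschema.get("$ref")
        if target is None:
            continue
        nested = compose_context(target)
        term = context.get(prop)
        if not nested or term is None:
            continue  # nothing to nest, or the property has no term mapping
        # Expand a plain IRI mapping into a term definition with a scoped context.
        term = {"@id": term} if isinstance(term, str) else dict(term)
        term["@context"] = nested
        context[prop] = term
    return context


composed = {"@context": compose_context("observation.schema")}
print(json.dumps(composed, indent=2))
```

The result has the same shape as the first example: `geopose` becomes a term definition with an `@id` and a scoped `@context` holding the yaw, pitch and roll mappings.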

Likewise, JSON-LD parsers could potentially lift the context from a schema reference at run-time.

The approach isn't predicated on tooling support, but we recognise that others may already have equivalent capabilities in the OAS, JSON Schema or JSON-LD spaces, so we are reaching out for feedback.

[1] Adding Semantic Annotations to JSON Schema · Issue #13 · json-schema-org/json-schema-vocabularies

[2] JSON Schemas with Semantic Annotations Supporting Data Translation

[3] Semantic Annotations for WSDL and XML Schema

tviegut commented 1 year ago

@rob-metalinkage Thanks for this post. The timing of your post is ironic from my end. I'm an expert member of the IEC global standards body for power system standards. As part of a pending International Standard we have a similar requirement. We've historically utilized W3C SAWSDL references within profiles for published international standards, for traceability into the canonical model the definitions originate in. With the transition to JSON Schema, etc., this is an area we've had to focus on and evaluate approaches for. I'll take more time to review the details and see what overlap there may be.

jdesrosiers commented 1 year ago

This topic comes up from time to time. It looks like you came up with the same kind of approach that I usually recommend, which is to introduce a keyword that provides or references a JSON-LD @context object. As far as I know, no one has formalized and shared any particular solution.

I know @ioggstream was working on something like this a while ago. Maybe he has some thoughts to share.

rob-metalinkage commented 1 year ago

See the response here: https://github.com/json-ld/json-ld.org/issues/612#issuecomment-1432067376

ioggstream commented 1 year ago

Hi folks! @jdesrosiers @rob-metalinkage

I "summarized" the problems tied to the automatic generation of data schemas from ontologies in this doc.

@thawkins

"rdf:type": "schema:Person",

The solution proposed in the REST API Linked Data Keywords spec uses a similar approach, with a syntax that is friendly to OpenAPI 3.0 and to code-generation tools, which can have issues with : and . in field names.

  "type": "string",
  "rdf:datatype": "schema:birthDate"

Mixing JSON Schema and XML Schema can lead to interoperability problems because under the hood they have different data models (e.g. for floating point numbers, ...). This specific case could probably be tackled by specifying that a given JSON Schema string respects specific XML Schema constraints (@jdesrosiers hints welcome).

@rob-metalinkage

'@modelReference': geopose.context.jsonld

Referencing json-ld context and type is essentially what REST API Linked Data Keywords does. Instead of @modelReference it uses x-jsonld-context and x-jsonld-type to:

There's a pyscript demo here that generates RDF from an annotated JSON Schema. It's very basic, though.

context is lightweight and doesnt need to replicate type information if it can be derived from the schema.

That's correct. The JSON-LD type is a semantic type, since validation information should be inferred only from JSON Schema.
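To make the division of labour concrete, here is a small Python sketch of how the two keywords might be applied to an instance. It is only an illustration: the `person_schema` and the `to_jsonld` helper are hypothetical, it handles the top-level object only, and the exact processing rules should be taken from the REST API Linked Data Keywords draft rather than from this code.

```python
import json

# Hypothetical annotated schema: the x-jsonld-* keywords carry the semantics,
# while "type" and "format" remain ordinary JSON Schema validation keywords.
person_schema = {
    "type": "object",
    "x-jsonld-type": "Person",
    "x-jsonld-context": {"@vocab": "https://schema.org/"},
    "properties": {
        "givenName": {"type": "string"},
        "birthDate": {"type": "string", "format": "date"},
    },
}


def to_jsonld(instance, schema):
    """Return a copy of the instance with @context and @type taken from the schema.

    Assumption: only the top-level object is annotated; nested schemas
    are ignored in this sketch.
    """
    doc = {}
    if "x-jsonld-context" in schema:
        doc["@context"] = schema["x-jsonld-context"]
    if "x-jsonld-type" in schema:
        doc["@type"] = schema["x-jsonld-type"]
    doc.update(instance)
    return doc


doc = to_jsonld({"givenName": "Ada", "birthDate": "1815-12-10"}, person_schema)
print(json.dumps(doc, indent=2))
```

The output is a plain JSON-LD document that any JSON-LD processor can expand; the schema's validation keywords play no part in it.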

Feel free to comment on the documents / repos!

rob-metalinkage commented 1 year ago

@ioggstream we have implemented your suggestion to use x-jsonld-context and have been working on a complete CI/CT/CD approach to define reusable building blocks for APIs that couple schemas and contexts, support validation of examples, and provide conformance validators for implementations. Will post links shortly.