json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

Questions about the JSON-LD JSON Schema #612

Open handrews opened 6 years ago

handrews commented 6 years ago

Hi folks,

Some documents, such as the W3C Web of Things "Thing Description", use JSON-LD and JSON Schema in the same document.

Over at the JSON Schema project, we are working on a concept of Schema Vocabularies (I know, too many "vocabularies", sorry...) in order to facilitate this kind of usage. The main issue proposing vocabularies is json-schema-org/json-schema-spec#561.

In this view, JSON Schema is not just validation, but also Hyper-Schema, meta-data annotation, code generation hints, UI generation directives, and other things. In fact, most vocabularies will not directly affect validation (e.g. Hyper-Schema adds links, which never cause validation to fail).

Given that there is a JSON Schema for JSON-LD, I want to see if we can treat that schema as describing JSON-LD as a JSON Schema vocabulary. (@gkellogg this is where I've ended up based on the few discussions we had at the W3C WoT conference in Santa Clara about a year ago).

Before diving in, I have a few questions if anyone would be willing to help me understand how this schema works:

{
    "title": "Schema for JSON-LD",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "definitions": {...},
    "allOf": [
        { "$ref": "#/definitions/context" },
        { "$ref": "#/definitions/graph" },
        { "$ref": "#/definitions/common" }
    ],

    "type": ["object", "array"],
    "additionalProperties": true,
    "items": {
        "allOf": [
            { "$ref": "#/definitions/context" },
            { "$ref": "#/definitions/graph" },
            { "$ref": "#/definitions/common" }
        ]
    }
}

Thanks for any help you can offer!

handrews commented 6 years ago

I'm starting to write PRs for json-schema-org/json-schema-spec#561 (Basic vocabulary support), so if there is any feedback from the JSON-LD community on this concept, now would be a great time to comment :-)

I'm still just writing up the basics, but if anyone can reply to the questions I had above that would help me as I move on to specific uses as they relate to JSON-LD. I just want to make sure I'm reading things correctly and starting from an accurate description of JSON-LD when considering how to integrate it.

akuckartz commented 5 years ago

I have not yet looked at the questions in detail. But I suppose that answers for the yet unfinished JSON-LD 1.1 would be as helpful as for 1.0. Correct?

millercl commented 5 years ago

JSON-LD validation is possible using existing tools: jsonlint, jsonld, shacl. I didn't even look into feasibility with JSON Schema because it seemed tree-oriented, which is to say I assumed it would be incapable of interpreting the graph data model. Having jsonld reformat a document (compact, flatten, etc.) will expose whether it is valid JSON-LD or not. I would rather see SHACL-AF implemented in JavaScript.

handrews commented 5 years ago

@akuckartz I believe so

@millercl people using JSON-LD have come to the JSON Schema project a number of times in the past and asked about it. JSON Schema is not attempting to solve the same problems as most, possibly all, of those tools you mention. If you need those tools, use them; this question is not directed at your use case.

BigBlueHat commented 5 years ago

@handrews first, thanks for digging into this whole topic!

In the Web Annotation Working Group we used JSON Schemas to test the required tree structure for Web Annotation documents. Due to probably the same limitations you're looking to address, we ended up breaking up the schemas into several and used them in succession, because (as far as I recall) it wasn't possible to check all the variations from a single schema.

Also, the new JSON-LD Working Group is now in full swing! It'd be great to have your thoughts, questions, ideas, and/or concerns raised on the new mailing list or issues on the new repositories.

Thanks again for exploring this use case!

handrews commented 5 years ago

@BigBlueHat you're welcome! Is there a particular new repository where it would be good to have a new tracking issue for this? At the moment I have my hands full with JSON Schema and closely related projects. I'll respond if you @-mention me, but I can't keep up with an entire project's mailing list or notifications at this time. I also just started a new job, so maybe once I'm a bit more settled in that I'll be able to take on more.

BigBlueHat commented 5 years ago

Understand completely! The primary repos are https://github.com/w3c/json-ld-syntax/ and https://github.com/w3c/json-ld-api/ (for syntax and API respectively). The syntax one is likely the best place to start for specific requests/needs for the shape of the JSON-LD.

I'll try to keep an eye on this space also, so if you have specific needs (but not the time to file the issues), you can @ mention me. Between the two of us, we should be able to get both these specs headed toward each other a bit more. 😁

Thanks for all you do here! 🎩

marcadella commented 4 years ago

@handrews you were saying when you started this issue that

JSON Schema is not just validation, but also Hyper-Schema, meta-data annotation, code generation hints, UI generation directives, and other things

I've had a look at JSON Schema 2019-09 and of course there is validation and hyper-schema, but I don't think the spec offers any solution for the last 3 items (meta-data annotation, code generation hints, UI generation directives). Any hope to see those areas covered in the spec or should I look somewhere else?

handrews commented 4 years ago

@marcadella 2019-09 adds $vocabulary, a mechanism for defining and requiring 3rd-party re-usable keyword vocabularies. A vocabulary is associated with a URI, so an extensible implementation could support a plugin architecture matching a plugin to a vocabulary URI, and handing keyword processing to that plugin.

That is how we expect to see those other use cases solved. Preferably by someone other than the core JSON Schema spec team as we have our hands full as it is. OpenAPI is very interested in the code generation part in particular, and as soon as we have the next OpenAPI version out that aligns with the latest (probably 2020-06?) JSON Schema draft, the OpenAPI code generation tooling ecosystem will start working on that problem.
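
The plugin arrangement described in this comment could look roughly like the following Python sketch. It is only an illustration: `PluginRegistry`, `UnsupportedVocabulary`, `codegen_plugin`, and the `https://example.com/vocab/...` URIs are hypothetical names, not part of any JSON Schema implementation. It relies on the `$vocabulary` rule that a `true` value marks a vocabulary as required and a `false` value marks it as optional.

```python
from typing import Callable, Dict, List

# Hypothetical plugin signature: takes a keyword name and its value.
Plugin = Callable[[str, object], object]


class UnsupportedVocabulary(Exception):
    """Raised when a meta-schema requires a vocabulary with no registered plugin."""


class PluginRegistry:
    """Maps vocabulary URIs to the plugins that process their keywords."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, vocabulary_uri: str, plugin: Plugin) -> None:
        self._plugins[vocabulary_uri] = plugin

    def plugins_for(self, metaschema: dict) -> List[Plugin]:
        """Return the plugins for the vocabularies declared in `$vocabulary`."""
        active = []
        for uri, required in metaschema.get("$vocabulary", {}).items():
            plugin = self._plugins.get(uri)
            if plugin is not None:
                active.append(plugin)
            elif required:
                # A required vocabulary that is not understood: refuse to process.
                raise UnsupportedVocabulary(uri)
            # Optional vocabularies without a plugin are simply skipped.
        return active


def codegen_plugin(keyword: str, value: object) -> str:
    # Placeholder behaviour, for illustration only.
    return f"{keyword}={value!r}"


registry = PluginRegistry()
registry.register("https://example.com/vocab/codegen", codegen_plugin)

metaschema = {
    "$vocabulary": {
        "https://example.com/vocab/codegen": True,
        "https://example.com/vocab/ui-hints": False,  # optional, no plugin registered
    }
}
active_plugins = registry.plugins_for(metaschema)
```

Keying the registry on the vocabulary URI rather than on individual keyword names lets two vocabularies define same-named keywords without clashing, at the cost of each plugin having to know which keywords it owns.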

handrews commented 4 years ago

But we can't keep expanding the core and validation specs forever, we'll never finish them that way. Everyone always wants keywords for their increasingly specialized tasks. Hyper-Schema is kind of on hold while we focus on Core and Validation for the moment.

marcadella commented 4 years ago

@handrews Thanks for your answer. This plugin system seems very elegant and actually you could almost take the validation out of the core spec and implement it as a separate plugin, keeping the core as only a (powerful) plugin system, couldn't you?

But let's come back to your original issue: two years later, is there any plan on integrating JSON-LD as a plugin? Right now it is clumsy to work with both linked data and schema validation, as one needs to write a JSON-LD tree and a JSON Schema tree and link them somehow (I personally use the JSON Schema description annotation to link to the corresponding JSON-LD URI). This process is rather sub-optimal. Any work in that direction or an alternative you could point me to?

Update

There is JSON in RDF that kind of embeds JSON-LD into a JSON-schema, but isn't it a bit hackish?

handrews commented 4 years ago

@marcadella

This plugin system seems very elegant and actually you could almost take the validation out of the core spec and implement it as a separate plugin, keeping the core as only a (powerful) plugin system, couldn't you?

Technically, we did! 😁

There are two specification documents: Core and Validation (I'll capitalize them, vocabularies will be lower-case), containing six standard vocabularies. Only one, the "core" vocabulary, is mandatory. It contains the keywords needed to bootstrap schema processing (including figuring out which plugins are in use).

The other standard vocabularies are "applicator" (for applying subschemas- allOf, properties, etc.), "validation" (the things that produce true/false results, type, maxLength, etc.- applicator results are functions of their subschemas, they do not produce their own results), "meta-data" (title, default, readOnly, etc.), "format" (format by itself b/c it's a mess), and "content" (for embedding other media types in JSON strings). Hyper-Schema is a separate spec and also a separate vocabulary.

We only got the $vocabulary concept out in September, and (aside from the proof-of-concept implementation done while we worked on the spec), we're only now seeing implementations that support it start to be ready. So it's barely to the point where anyone can really start building vocabularies and writing extension modules.

As for plans- the JSON Schema organization is just focusing on getting Core and Validation done (including the extensibility mechanism, which needs some refinement and was left open for feedback in some areas rather than trying to nail it all on the first go).

We are inviting other folks to build vocabularies, so hopefully someone who is interested in both JSON-LD and JSON Schema can put together a project for that. We're available through GitHub and Slack (there's an open invite link on json-schema.org) to consult on that, but we can't take it on ourselves. Same with other high-demand vocabularies like code generation, which we expect will be driven by OpenAPI folks who do a lot of code gen tools.

BigBlueHat commented 4 years ago

There is JSON [Schema] in RDF that kind of embeds JSON-LD into a JSON-schema, but isn't it a bit hackish?

Not hackish at all. 😃 It's exactly how you'd come at expressing JSON Schema in a JSON-LD document. However, that's not exactly the same as integrating JSON Schema and JSON-LD (contexts) into a single document format, but it does lay the foundation for doing so. Sections 3.3 Defining a JSON-LD context for data instances and 3.4 Embedding schema definitions in data instances in the current JSON Schema in RDF spec have some good examples of how this integration looks (or could look) once finalized.

FWIW, the JSON-LD Community Group at the W3C is booting back up--while we hand off the JSON-LD WG activity--and it'd be a great place to flesh out the JSON-LD side of things...which we'd certainly want to do with the Web of Things folks (who are already dependent on these in-progress specs).

Would be great to have y'all join us there, and get this issue on an upcoming call--which we hope to begin this month and do bi-monthly (probably).

handrews commented 4 years ago

Thanks, @BigBlueHat . I'm staying fairly heads-down trying to get 2020-06 out (which may include some tweaks to the vocabulary system based on early feedback, and should be the draft to really focus on as it will go out in coordination with the next OpenAPI version, finally aligning the two projects). I'll tag @relequestual and @awwright to see if they might want to work on this coordination. I'm happy to join a call as-needed, though.

ioggstream commented 2 years ago

Hi @BigBlueHat

I was looking at 3.3 Defining a json-ld context... and I don't understand which is the namespace for type, description and all other json-schema keywords. Are they mapped using the context specified in example 1 ?

    '@context':
      jsonld: "http://www.w3.org/ns/json-ld#"
    'jsonld:context': "http://schema.org"
    type: object
    description: "Schema of a commercial product with GTIN and manufacturer"

If I want to define a schema as json-ld, shouldn't I use something like that?

'@context':
  json-schema: https://www.w3.org/2019/wot/json-schema# 
  jsonld: http://www.w3.org/ns/json-ld# 
  jsonld:iri:  { "@type": "@id" }
'@id': http://sche.ma/Person
'@type': json-schema:ObjectSchema
jsonld:context:
  jsonld:definitions:
  - '@type': jsonld:TermDefinition
    jsonld:term: family_name
    jsonld:iri: http://schema.org/familyName

json-schema:properties:
- '@id': _:b0
  '@type': json-schema:StringSchema
  json-schema:propertyName: family_name

If we just wanted to tie the json-schema property to schema.org/familyName, wouldn't it be enough to state something like

'@context':
  json-schema: https://www.w3.org/2019/wot/json-schema# 
  jsonld: http://www.w3.org/ns/json-ld# 
  jsonld:iri:  { "@type": "@id" }
'@id': http://sche.ma/Person
'@type': json-schema:ObjectSchema

json-schema:properties:
- '@id': _:b0
  '@type': json-schema:StringSchema
  json-schema:propertyName: family_name
  jsonld:iri: http://schema.org/familyName  # Add reference here

and then generate the @context?

rob-metalinkage commented 1 year ago

(this is cross posted on the relevant JSON-schema issue referenced inline - as this discussion seems to have arisen there in 2017 and remains unresolved)

There have been various discussions around linking JSON schema with semantic information via JSON-LD. JSON-LD can be used to semantically annotate JSON data - but there appears to be no way to annotate the schema itself, [1] which limits the potential of OAS to expose useful semantic information about query and response objects.

We have been exploring this in the context of usage of JSON schema in OAS specifications - where the JSON schemas use the available $ref mechanism to create reusable schema building blocks. We have a workable approach to create reusable mappings from these sub-schemas to JSON-LD contexts and hence be able to semantically annotate both schemas as well as instances, but we'd like to double check we haven't missed an effort to join these dots.

In a nutshell, and following the spirit of the OAS recommendations to deprecate the use of examples within API specifications and add them as schema annotations…

It appears possible to create a 1:1 mapping of schemas and semantic models using an annotation in the schemas, which allows us to then compose, or re-use predefined, JSON-LD fragments. A composite JSON-LD context can then be created for the top level object like so:


{
  "@context": {
    "properties": {
      "@id": "https://purl.org/geojson/vocab#properties",
      "@context": {
        "hasResult": {
          "@id": "http://www.w3.org/ns/sosa/hasResult",
          "@context": {
            "distance": "http://example.com/vectorObservation/distance",
            "geopose": {
              "@id": "http://example.com/vectorObservation/geopose",
              "@context": {
                "angles": {
                  "@id": "http://example.com/geopose/angles",
                  "@context": {
                    "pitch": "http://example.com/geopose/pitch",
                    "yaw": "http://example.com/geopose/yaw",
                    "roll": "http://example.com/geopose/roll"
                  }
                },...

All that is required is to link the schema fragments with a JSON-LD context like so:

"$schema": https://json-schema.org/draft/2020-12/schema
description: 'GeoPose YPR angles'
'@modelReference': geopose.context.jsonld
type: object
properties:
  yaw:
    type: number
 ....

The context is lightweight and doesn't need to replicate type information if it can be derived from the schema.

@modelReference matches this discussion paper [2], which follows the nomenclature from SAWSDL [3]. However, we use the JSON-LD approach of an “@<annotation>” property rather than modifying the JSON schema properties. These could perhaps be alternatives, or we could adopt the modified schema approach if it's supported.

Note that neither @context nor $vocabulary match the need to annotate a schema itself, not the instance or the meta-schema (schema description language).

This single mechanism is sufficient to compose a JSON-LD context (like the first example) in the same way OAS can compose a specification from bundled components - and could be built in capability of OAS (or an OAS profile).
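
One possible reading of that composition step, sketched in Python. The thread does not spell out the exact rules, so everything here is an assumption: `compose_context` is a hypothetical helper, the referenced context files are stood in for by an in-memory `FRAGMENTS` dict that maps property names to IRIs, `observation.context.jsonld` is an invented file name, and `$ref` resolution is left out entirely.

```python
# Hypothetical stand-ins for the context files named by "@modelReference".
FRAGMENTS = {
    "observation.context.jsonld": {
        "angles": "http://example.com/geopose/angles",
    },
    "geopose.context.jsonld": {
        "yaw": "http://example.com/geopose/yaw",
        "pitch": "http://example.com/geopose/pitch",
        "roll": "http://example.com/geopose/roll",
    },
}


def compose_context(schema, fragments):
    """Build a (possibly nested) JSON-LD context from a schema's annotations."""
    # Terms known for this schema level; empty if there is no reference.
    terms = fragments.get(schema.get("@modelReference"), {})
    context = {}
    for name, subschema in schema.get("properties", {}).items():
        iri = terms.get(name)
        if iri is None:
            continue  # no semantic mapping known for this property
        nested = compose_context(subschema, fragments)
        # Properties with mapped children get a property-scoped context.
        context[name] = {"@id": iri, "@context": nested} if nested else iri
    return context


schema = {
    "@modelReference": "observation.context.jsonld",
    "type": "object",
    "properties": {
        "angles": {
            "@modelReference": "geopose.context.jsonld",
            "type": "object",
            "properties": {
                "yaw": {"type": "number"},
                "pitch": {"type": "number"},
                "roll": {"type": "number"},
            },
        }
    },
}

composite = {"@context": compose_context(schema, FRAGMENTS)}
```

Under these assumptions, `composite` nests the yaw/pitch/roll terms inside the `angles` term definition, in the same shape as the composite context shown earlier.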

Likewise, JSON-LD parsers could potentially lift the context from a schema reference at run-time.

The approach doesn't predicate tooling support, but we recognise that others may already have equivalent capabilities in the OAS, JSON-schema or JSON-LD spaces, so we reach out for feedback.

[1] Adding Semantic Annotations to JSON Schema · Issue #13 · json-schema-org/json-schema-vocabularies

[2] JSON Schemas with Semantic Annotations Supporting Data Translation

[3] Semantic Annotations for WSDL and XML Schema

gkellogg commented 1 year ago

If there are changes to JSON-LD that might better facilitate this, consider adding issues to https://github.com/w3c/json-ld-syntax and/or https://github.com/w3c/json-ld-api. There's likely to be a JSON-LD 1.2 in the next couple of years.

handrews commented 1 year ago

@rob-metalinkage Is @modelReference just a regular JSON Schema annotation keyword defined to be useful for linking to JSON-LD? I did a quick implementation of this as a JSON Schema vocabulary in the Python jschon package as follows:

Vocabulary meta-schema

The presence of 2020-12 in the $id is just noting that this is intended to work with the vocabulary system in draft 2020-12 (it would also work in 2019-09, though). Note: It would probably make more sense to give the vocab meta-schema and dialect meta-schema current months, but I was adapting this quickly from other things I had lying around.

{
    "title": "JSON Schema 2020-12 Semantic Web Model Reference Vocabulary",
    "$id": "https://example.com/reference/meta/2020-12/modelref",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true, 
        "https://example.com/reference/vocab/2020-12/modelref": true
    },    
    "$dynamicAnchor": "meta",

    "properties": {
        "@modelReference": {
            "type": "string"
        }     
    }
}

Dialect meta-schema

This just adds the above vocabulary to the standard 2020-12 vocabularies:

{
    "title": "JSON Schema 2020-12 extended with @modelRef annotations",
    "$id": "https://example.com/reference/dialect/2020-12/modelref",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true, 
        "https://json-schema.org/draft/2020-12/vocab/applicator": true,
        "https://json-schema.org/draft/2020-12/vocab/unevaluated": true,
        "https://json-schema.org/draft/2020-12/vocab/validation": true,
        "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
        "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
        "https://json-schema.org/draft/2020-12/vocab/content": true,
        "https://example.com/reference/vocab/2020-12/modelref": true
    },  
    "$dynamicAnchor": "meta",
    "type": ["object", "boolean"],
    "allOf": [
        {"$ref": "https://json-schema.org/draft/2020-12/meta/core"},
        {"$ref": "https://json-schema.org/draft/2020-12/meta/applicator"},
        {"$ref": "https://json-schema.org/draft/2020-12/meta/unevaluated"},
        {"$ref": "https://json-schema.org/draft/2020-12/meta/validation"},
        {"$ref": "https://json-schema.org/draft/2020-12/meta/meta-data"},
        {"$ref": "https://json-schema.org/draft/2020-12/meta/format-annotation"},
        {"$ref": "https://json-schema.org/draft/2020-12/meta/content"},
        {"$ref": "https://example.com/reference/meta/2020-12/modelref"}
    ]
}

Sample schema using the dialect

{
    "$schema": "https://example.com/reference/dialect/2020-12/modelref",
    "$id": "https://example.com/schemas/ypr",
    "description": "GeoPose YPR angles",

    "@modelReference": "geopose.context.jsonld",
    "type": "object",
    "properties": {
        "yaw": {
            "type": "number"
        },
        "pitch": {
            "type": "number"
        },
        "roll": {
            "type": "number"
        }
    }   
}       

Sample instance

{
    "yaw": 10, 
    "pitch": 20,
    "roll": 5
} 

Annotation output in the proposed draft-next "list" format:

This new format is not yet implemented in jschon, but it's a nicer format to read, and handles/names the locations more intuitively.

Note that the empty string "" is the plain-string JSON Pointer for the entire document. So this says that the entire instance is annotated with "@modelReference": "geopose.context.jsonld".

{
  "valid": true,
  "details": [
    {
      "valid": true,
      "schemaLocation": "https://example.com/schemas/ypr#",
      "instanceLocation": "",
      "evaluationPath": "",
      "annotations": {
        "description": "GeoPose YPR angles",
        "@modelReference": "geopose.context.jsonld",
        "properties": [
          "yaw",
          "pitch",
          "roll"
        ]
      }
    }
  ]
}
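
The note about the empty-string pointer can be checked with a small resolver. This is a minimal sketch of JSON Pointer (RFC 6901) evaluation in Python; `resolve_pointer` is a hypothetical helper written for this example, not a jschon API.

```python
def resolve_pointer(document, pointer):
    """Resolve a plain-string JSON Pointer (RFC 6901) against parsed JSON."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


instance = {"yaw": 10, "pitch": 20, "roll": 5}
whole = resolve_pointer(instance, "")    # the entire instance
yaw = resolve_pointer(instance, "/yaw")  # the value of the "yaw" member
```

Here `whole` is the instance itself, which is why an annotation reported at `"instanceLocation": ""` applies to the entire document rather than to any single member.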

Annotation output in 2020-12 "basic" format

{
  "valid": true,
  "annotations": [
    {
      "instanceLocation": "",
      "keywordLocation": "/description",
      "absoluteKeywordLocation": "https://example.com/schemas/ypr#/description",
      "annotation": "GeoPose YPR angles"
    },
    {
      "instanceLocation": "",
      "keywordLocation": "/@modelReference",
      "absoluteKeywordLocation": "https://example.com/schemas/ypr#/%40modelReference",
      "annotation": "geopose.context.jsonld"
    },
    {
      "instanceLocation": "",
      "keywordLocation": "/properties",
      "absoluteKeywordLocation": "https://example.com/schemas/ypr#/properties",
      "annotation": [
        "yaw",
        "pitch",
        "roll"
      ]
    }
  ]
}
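
A tool consuming the "basic" output would typically regroup it by instance location. The Python sketch below shows one way to do that; `annotations_by_location` is a hypothetical helper, and it assumes the output has the flat `annotations` array shape shown above.

```python
from collections import defaultdict

# A trimmed copy of the "basic" output above, used as sample input.
basic_output = {
    "valid": True,
    "annotations": [
        {
            "instanceLocation": "",
            "keywordLocation": "/description",
            "annotation": "GeoPose YPR angles",
        },
        {
            "instanceLocation": "",
            "keywordLocation": "/@modelReference",
            "annotation": "geopose.context.jsonld",
        },
    ],
}


def annotations_by_location(output, keyword):
    """Map each instance location to the values annotated by `keyword`."""
    found = defaultdict(list)
    for unit in output.get("annotations", []):
        # keywordLocation is a JSON Pointer whose last token is the keyword.
        token = unit["keywordLocation"].rsplit("/", 1)[-1]
        token = token.replace("~1", "/").replace("~0", "~")
        if token == keyword:
            found[unit["instanceLocation"]].append(unit["annotation"])
    return dict(found)


model_refs = annotations_by_location(basic_output, "@modelReference")
```

With the sample input, `model_refs` has a single entry under the key `""`, meaning the whole instance is associated with `geopose.context.jsonld`.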

handrews commented 1 year ago

I am actually currently using JSON Schema annotations to build an RDF graph from JSON instance data by annotating the data with the appropriate IRIs or other information. That project, which is using jschon and the Python rdflib, is not yet public but will be at some point reasonably soon.

TallTed commented 1 year ago

@rob-metalinkage -- Please revisit your https://github.com/json-ld/json-ld.org/issues/612#issuecomment-1431076708 and make sure each instance of @context is code fenced, like `@context`, so that that GitHub user is not forced to read about this conversation, in which they're not otherwise participating.

rob-metalinkage commented 1 year ago

Thanks @handrews - I think you are saying the approach would work, but I'm afraid I got a little lost in the nature and purpose of the annotations you are suggesting - is this exploiting the @modelReference to fetch annotations during a schema validation?

ioggstream commented 1 year ago

Hi @handrews, seems similar to https://gist.github.com/ioggstream/8e858509a3ca535c5af230986aeefaf7

you can look at the implementation based on REST API Linked Data Keywords for generating JSON-LD using this spec: https://github.com/ioggstream/draft-polli-restapi-ld-keywords/blob/gh-pages/oasld.py#L128 that uses two keywords:

PS (we used x- instead of @ to avoid breaking code-generation tools and to support OAS3.0, which is the most used implementation in dev/tooling; moreover, using @ could create issues if JSON-LD decides to define an @ keyword with the same name)

rob-metalinkage commented 1 year ago

thanks @ioggstream - am agnostic about the property name, and it seems to have consistent usage here. Just asking for help to converge on a sensible choice we can all live with and hopefully make visible as the Best, or the One, way to do this.

handrews commented 1 year ago

OK, there is a lot going on here. I don't yet entirely understand either @rob-metalinkage's or @ioggstream's proposals, much less how to reconcile them. I read through the Web Annotations spec, and while that makes a lot of sense to me, I don't see how it fits with either of your proposals.

A further source of likely confusion is that "annotation" is being used in different contexts here (and "annotation collection", which is a list of annotations in Web Annotations and a process of collecting annotations in JSON Schema).

I would convert these JSON Schema annotations:

{
  "valid": true,
  "details": [
    {
      "valid": true,
      "schemaLocation": "https://example.com/schemas/ypr#",
      "instanceLocation": "",
      "evaluationPath": "",
      "annotations": {
        "description": "GeoPose YPR angles",
        "@modelReference": "geopose.context.jsonld",
        "properties": [
          "yaw",
          "pitch",
          "roll"
        ]
      }
    }
  ]
}

to these Web Annotations, using auto-generated UUIDs for the annotation id, and using https://example.com/ypr as the arbitrarily-assigned URI of the instance document. Note that the schemaLocation, extended with the annotation keyword involved, becomes the body URI, while the instance location becomes the target URI. Keep in mind that the empty string (and therefore the empty fragment) is the JSON Pointer for the whole document. [EDIT: I left out the "properties" annotation because it is used for internal keyword communication, and usually is not of interest to end users.]:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.com/collections/1",
  "type": "AnnotationCollection",
  "label": "Annotations from JSON Schema evaluation",
  "total": 2,
  "first": {
    "id": "https://example.com/collections/1/pages/1",
    "type": "AnnotationPage",
    "startIndex": 0,
    "items": [
      {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "urn:uuid:9b4678c4-7200-4f34-a1dc-c42a21878d8e",
        "type": "Annotation",
        "body": "https://example.com/schemas/ypr#/description",
        "target": "https://example.com/ypr#"
      },
      {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "urn:uuid:d6765790-9c15-4ccb-a765-9f51a94c9624",
        "type": "Annotation",
        "body": "https://example.com/schemas/ypr#/%40modelReference",
        "target": "https://example.com/ypr#"
      }
    ]
  }
}

Alternatively, with the annotation body text inlined:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.com/collections/1",
  "type": "AnnotationCollection",
  "label": "Annotations from JSON Schema evaluation",
  "total": 2,
  "first": {
    "id": "https://example.com/collections/1/pages/1",
    "type": "AnnotationPage",
    "startIndex": 0,
    "items": [
      {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "urn:uuid:19c64499-b74a-42fe-bc4f-775dcb948623",
        "type": "Annotation",
        "bodyValue": "GeoPose YPR angles",
        "target": "https://example.com/ypr#"
      },
      {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "urn:uuid:fce3dd30-d11a-4897-a0c6-92ba464812d0",
        "type": "Annotation",
        "bodyValue": "geopose.context.jsonld",
        "target": "https://example.com/ypr#"
      }
    ]
  }
}
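
The conversion described above could be automated along these lines. This is a rough Python sketch, not an existing tool: `to_web_annotations` and `INTERNAL_KEYWORDS` are hypothetical names, the input is assumed to be the draft-next "list" output, and treating only `properties` as an internal keyword is an assumption taken from the EDIT note.

```python
import json
import uuid
from urllib.parse import quote

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
# Keywords whose annotations are internal bookkeeping (assumed set).
INTERNAL_KEYWORDS = {"properties"}


def to_web_annotations(output, instance_uri, collection_id, inline=False):
    """Turn "list"-format JSON Schema output into a Web Annotation collection."""
    items = []
    for unit in output.get("details", []):
        schema_location = unit["schemaLocation"]
        if "#" not in schema_location:
            schema_location += "#"
        # The instance location (a JSON Pointer) becomes the target fragment.
        target = f"{instance_uri}#{quote(unit['instanceLocation'], safe='/~')}"
        for keyword, value in unit.get("annotations", {}).items():
            if keyword in INTERNAL_KEYWORDS:
                continue
            item = {
                "@context": ANNO_CONTEXT,
                "id": f"urn:uuid:{uuid.uuid4()}",
                "type": "Annotation",
            }
            if inline:
                # bodyValue must be a string, so serialize anything else.
                item["bodyValue"] = value if isinstance(value, str) else json.dumps(value)
            else:
                # Escape the keyword as a pointer token, then percent-encode it.
                token = keyword.replace("~", "~0").replace("/", "~1")
                item["body"] = f"{schema_location}/{quote(token, safe='~')}"
            item["target"] = target
            items.append(item)
    return {
        "@context": ANNO_CONTEXT,
        "id": collection_id,
        "type": "AnnotationCollection",
        "label": "Annotations from JSON Schema evaluation",
        "total": len(items),
        "first": {
            "id": f"{collection_id}/pages/1",
            "type": "AnnotationPage",
            "startIndex": 0,
            "items": items,
        },
    }


list_output = {
    "valid": True,
    "details": [
        {
            "valid": True,
            "schemaLocation": "https://example.com/schemas/ypr#",
            "instanceLocation": "",
            "evaluationPath": "",
            "annotations": {
                "description": "GeoPose YPR angles",
                "@modelReference": "geopose.context.jsonld",
                "properties": ["yaw", "pitch", "roll"],
            },
        }
    ],
}

collection = to_web_annotations(
    list_output, "https://example.com/ypr", "https://example.com/collections/1"
)
```

Passing `inline=True` produces the `bodyValue` variant instead of body URIs. The annotation ids are random UUIDs, so they differ on every run.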

handrews commented 1 year ago

BTW I don't mean that I can't understand @rob-metalinkage and @ioggstream's proposals, just that there's a lot of assumed context and I have not had time to do all of the research involved. That paper that @rob-metalinkage referenced about JSON Schema has so much to do with XML and WSDL and existing tool ecosystems that I can't make heads or tails of it despite being a co-author of the JSON Schema spec.

A simpler explanation of the desired outcomes would be very helpful. JSON Schema can associate arbitrary data with an instance, which is a well-defined process in draft 2019-09 and later. So these proposals that talk about adding an annotation feature to JSON Schema don't make much sense to me. It already has one and has had one for more than 3 years now. And that process can easily be retrofitted to draft-07 or earlier as it does not impact validation outcomes.

Any process that is trying to enhance an instance with data from a schema, including with the purpose of transforming that instance from its original form to another (e.g. JSON-LD), should be built on that well-defined JSON Schema annotation process. That includes the case where the desire is to annotate a schema from a meta-schema.

rob-metalinkage commented 1 year ago

OK, let me state the starting point - a desire to tell a data consumer what all the elements of a schema actually mean - by binding them to a URI which can dereference to a semantic model. From the perspective of the schema and instance data, what it dereferences to may not matter, but it may allow us to do other things - such as access annotations to embed in the schema, for example.

The goal is thus two-fold: 1) to be able to process a schema, which references a bunch of other schemas, in order to generate a JSON-LD context document that allows JSON data conforming to the schema to be read as RDF conforming to the semantic model being referenced; 2) to allow tooling to use the schema to explain the elements in the absence of specific data instances.

I'm guessing the schema annotations allow the second (though I'm not familiar with the tooling available), so being able to dereference the reference to the semantic model and populate the schema annotations with a script works fine.

Note that both the schema and semantic models may exist already - so JSON-LD contexts alone do not allow serialization into a specific JSON schema, and I see no tool support for this. Likewise, potentially annotations in a schema could be used to generate an RDFS, OWL and/or SHACL semantic model, but that's not necessarily useful if we are looking to implement existing semantic models in OpenAPI using predictable schema...

Whilst I can see that the annotations approach can allow a separate artefact to make statements to bind the schema to the context, it seems very verbose from the perspective of just creating a context to document instance data conforming to a schema. AFAICT it could be assembled starting with the simpler schema element referencing the model - presuming it's the form some tooling can use and is therefore worthwhile. I can't see maintaining a separate annotations document for every schema being attractive for schema publishers, when we already have all the information in the schema and model, and only need the context reference to enable one to be built if needed.

Which brings us back to a canonical reference ... and I think the issue is the suggested naming of the property to carry it. No problems settling on x-jsonld-context.

handrews commented 1 year ago

@rob-metalinkage thanks for taking the time to explain in more detail. I'm afraid I'm still a bit confused but I think I can ask better questions now :-)

a desire to tell a data consumer what all the elements of a schema actually mean

If I understand this correctly, a more precise way to say this would be: A desire to tell a data consumer what data elements that are validated by a schema mean. The schema conveys the meaning, but the instance is what actually has meaning. In our example, the schema carries the information "@modelReference": "geopose.context.jsonld", but it is the instance {"yaw": 10, "pitch": 20, "roll": 5} that actually has the meaning conveyed by the "@modelReference" value.

Does this sound accurate to you?

Note that both the schema and semantic models may exist already - so JSON-LD contexts alone do not allow serialization into a specific JSON schema, and I see no tool support for this.

I'm confused about what is intended to be serialized here (the existing semantic models?) and what it means to serialize something "into" a JSON Schema. Do you mean adding fields such as "@modelReference" to the schema? I would not refer to that as serialization, which might be the source of confusion.

Likewise, potentially annotations in a schema could be used to generate an RDFS, OWL and/or SHACL semantic model, but that's not necessarily useful if we are looking to implement existing semantic models in OpenAPI using predictable schema...

I don't understand why this is not useful, or what advantages the alternative proposes. Possibly because I am still unclear on what the alternative output is. You show this as output:

{
 "@context": {
   "properties": {
     "@id": "https://purl.org/geojson/vocab#properties",
     "@context": {
       "hasResult": {
         "@id": "http://www.w3.org/ns/sosa/hasResult",
         "@context": {
           "distance": "http://example.com/vectorObservation/distance",
           "geopose": {
             "@id": "http://example.com/vectorObservation/geopose",
             "@context": {
               "angles": {
                 "@id": "http://example.com/geopose/angles",
                 "@context": {
                   "pitch": "http://example.com/geopose/pitch",
                   "yaw": "http://example.com/geopose/yaw",
                   "roll": "http://example.com/geopose/roll"
                 }
               },...

But I don't understand how this gets produced from "geopose.context.jsonld". Is "geopose.context.jsonld" a URI fragment or suffix of some sort? How would a tool know what to do with it? I don't see anything in your example schema that tells me how this is used. Perhaps this is obvious to someone more familiar with JSON-LD?

I also don't understand where the general structure with the "hasResult", "distance", and "angles" fields are coming from as they don't appear in your example schema at all. So it's hard for me to show how existing JSON Schema behavior can support what you want, because I don't understand how you produced it, even manually.
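For readers less familiar with JSON-LD, the visible part of the quoted output can be read as nested scoped contexts: each term's "@context" adds definitions that apply beneath that term. The sketch below only illustrates that reading. It uses just the portion quoted above (the rest is elided), the function name `expand_path` is hypothetical, and it is a simplification rather than the JSON-LD expansion algorithm. It does not explain how the output was generated from the schema.

```python
# Only the portion of the context that is visible in the quoted output.
context = {
    "properties": {
        "@id": "https://purl.org/geojson/vocab#properties",
        "@context": {
            "hasResult": {
                "@id": "http://www.w3.org/ns/sosa/hasResult",
                "@context": {
                    "distance": "http://example.com/vectorObservation/distance",
                    "geopose": {
                        "@id": "http://example.com/vectorObservation/geopose",
                        "@context": {
                            "angles": {
                                "@id": "http://example.com/geopose/angles",
                                "@context": {
                                    "pitch": "http://example.com/geopose/pitch",
                                    "yaw": "http://example.com/geopose/yaw",
                                    "roll": "http://example.com/geopose/roll",
                                },
                            }
                        },
                    },
                },
            }
        },
    }
}


def expand_path(keys, root_context):
    """Map a chain of JSON keys to IRIs, layering scoped contexts as we descend.

    Simplified: handles only plain string terms and {"@id", "@context"} terms.
    """
    active = dict(root_context)
    iris = []
    for key in keys:
        term = active[key]
        if isinstance(term, str):
            iris.append(term)
        else:
            iris.append(term["@id"])
            # Terms defined in the scoped context become visible below this key.
            active = {**active, **term.get("@context", {})}
    return iris


path_iris = expand_path(["properties", "hasResult", "geopose", "angles", "yaw"], context)
print(path_iris[-1])
```

Read this way, "yaw" only acquires an IRI once the enclosing "angles" term has been entered, which is why the nesting of the context mirrors the nesting of the instance.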

I cant see maintaining a separate annotations document for every schema being attractive for schema publishers

There is no separate document to maintain.

I have shown you output from evaluating the instance against the schema. The output is not the end result, it's the thing you use to create the end result. Which I could show more clearly if I understand how your example output was constructed from your example schema. The output format is important for building tools. I am assuming that you don't want to re-implement JSON Schema validation, so you need some way to leverage that process.

JSON Schema doesn't know what you want to do with the "@modelReference" annotation, so it tells you "here, I matched this annotation keyword "@modelReference", which had the value "geopose.context.jsonld", to the instance location "" (empty JSON Pointer for the root of the document) from schema location "https://example.com/schemas/ypr#" (empty fragment, meaning the root schema object), which I reached through evaluation path "" (empty JSON Pointer meaning the root of evaluation, a.k.a. the first schema location applied to the instance)." Your tool for doing whatever you want to do would make use of that information - presumably you are doing something like that somewhere?
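As a rough sketch of what "making use of that information" could look like, the snippet below groups annotation values by instance location. The record shape and its field names ("keyword", "value", "instanceLocation", "schemaLocation", "evaluationPath") are assumptions made for illustration, not the exact output format of any particular JSON Schema implementation, and `model_references` is a hypothetical helper.

```python
# Hypothetical annotation records carrying the four pieces of information
# described above; the field names are illustrative assumptions.
annotations = [
    {
        "keyword": "@modelReference",
        "value": "geopose.context.jsonld",
        "instanceLocation": "",  # empty JSON Pointer: root of the instance
        "schemaLocation": "https://example.com/schemas/ypr#",
        "evaluationPath": "",  # empty JSON Pointer: root of evaluation
    },
]


def model_references(records, keyword="@modelReference"):
    """Group the values of one annotation keyword by instance location."""
    by_location = {}
    for record in records:
        if record["keyword"] == keyword:
            by_location.setdefault(record["instanceLocation"], []).append(record["value"])
    return by_location


refs = model_references(annotations)
print(refs)
```

A downstream tool could then look up each instance location in `refs` and decide what to do with the referenced model, without re-implementing validation itself.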

Since you mentioned Web Annotations, I thought you were trying to produce those, so I showed what that would look like if I converted the evaluation output to a Web Annotation Collection. But I don't see any mentions of Web Annotations in your most recent comment. Are they relevant here? [I realize now that they were mentioned by some other document you mentioned - I was confused.]

handrews commented 1 year ago

@ioggstream I have read through your draft in more detail - it's quite interesting and relevant to some things I have been doing, but I will follow up with you elsewhere on that.

handrews commented 1 year ago

@rob-metalinkage I now realize that you didn't mention Web Annotations, but that was from another document that you mentioned. So I apologize for the confusion there! You can disregard that part.

davaya commented 1 year ago

@handrews:

If I understand this correctly, a more precise way to say this would be: A desire to tell a data consumer what data elements that are validated by a schema mean. The schema conveys the meaning, but the instance is what actually has meaning.

As I see it the schema conveys syntax, but if you want meaning you need an ontology. My goal is not to "process a JSON schema to generate a JSON-LD context document"; my goal is to enable validation of JSON instances without requiring instances to carry RDF data or developers to know anything about RDF, while still providing a loose coupling between logical (RDF) and physical (JSON Schema) data models for applications where semantics is useful.

"definitions": {
  "Coordinate": {
    "type": "array",
    "modelReference": "https://w3id.org/arco/ontology/location/Coordinates",
    "additionalItems": false,
    "items": [
      {"$ref": "#/definitions/Latitude"},
      {"$ref": "#/definitions/Longitude"}
    ],
    "description": "A GPS (WGS84) coordinate."
  },
  "Latitude": {
    "type": "number",
    "minimum": -90.0,
    "maximum": 90.0,
    "description": "The angular distance of a place north or south of the earth's equator"
  },
}

validates this short and to-the-point instance.

[38.88949, -77.03529]

The JSON schema reveals type and property names, but an RDF reference allows retrieval of more collateral information about Coordinates in general and this coordinate in particular.
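To make that concrete, here is a minimal sketch that checks the instance and picks up the modelReference link along the way. It is a hand-rolled walker covering only the handful of keywords used in this example, not a JSON Schema implementation. The "Longitude" definition is not shown above, so the one below is an assumption, and `resolve` and `check` are hypothetical helper names.

```python
schema = {
    "definitions": {
        "Coordinate": {
            "type": "array",
            "modelReference": "https://w3id.org/arco/ontology/location/Coordinates",
            "additionalItems": False,
            "items": [
                {"$ref": "#/definitions/Latitude"},
                {"$ref": "#/definitions/Longitude"},
            ],
        },
        "Latitude": {"type": "number", "minimum": -90.0, "maximum": 90.0},
        # Assumption: Longitude is not shown in the example; this range is a guess.
        "Longitude": {"type": "number", "minimum": -180.0, "maximum": 180.0},
    }
}


def resolve(node, root):
    """Follow local "#/definitions/Name" references until a real schema is reached."""
    while "$ref" in node:
        name = node["$ref"].rsplit("/", 1)[-1]
        node = root["definitions"][name]
    return node


def check(instance, node, root, path="", found=None):
    """Return (valid, annotations) for the few keywords used in this example."""
    found = {} if found is None else found
    node = resolve(node, root)
    kind = node.get("type")
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(instance, bool) or not isinstance(instance, (int, float)):
            return False, found
        if instance < node.get("minimum", instance) or instance > node.get("maximum", instance):
            return False, found
    elif kind == "array":
        if not isinstance(instance, list):
            return False, found
        items = node.get("items", [])  # tuple form only, as in the example
        if node.get("additionalItems") is False and len(instance) > len(items):
            return False, found
        for index, (value, sub) in enumerate(zip(instance, items)):
            ok, _ = check(value, sub, root, f"{path}/{index}", found)
            if not ok:
                return False, found
    # Record the link only for locations that passed validation.
    if "modelReference" in node:
        found[path] = node["modelReference"]
    return True, found


valid, refs = check([38.88949, -77.03529], schema["definitions"]["Coordinate"], schema)
print(valid, refs)
```

The instance stays a bare two-element array. The link to the ontology lives only in the schema and is attached to the instance location (here the root, "") as a side effect of validation.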

In this example modelReference is not an @-keyword to avoid any possible confusion with JSON-LD, it's just intended to be a generic annotation. I'm looking for the least-intrusive convention to represent such links (I would have called it rdfReference if there were no precedent, since there are many kinds of models).