Adding semantic context to APIs / Schemas

pchampin commented 2 years ago

I recently discovered this white paper by @ioggstream. It investigates different ways in which OpenAPI specifications or JSON Schemas can be semantically enriched with JSON-LD contexts.

OpenAPI and JSON Schema are widely used and provide good syntactic interoperability. JSON-LD, on the other hand, is designed to provide semantic interoperability. Making it easier to integrate the latter with the former would help improve JSON-LD's adoption, and improve the overall interoperability of the resulting APIs and schemas.

As I understand it, @ioggstream's white paper is an effort led by the Italian Department for Digital Transformation. But I expect that other organizations would be interested in this. @msporny, are you aware of similar efforts to bridge the gap between JSON Schema and JSON-LD?

Bringing this discussion to an existing or a dedicated community group could help federate multiple stakeholders with a shared interest in "semantically enriched schemas", and incubate a possible future REC.

VladimirAlexiev commented 2 years ago

@OR13 and @nissimsan widely use JSON Schema and JSON-LD in https://github.com/w3c-ccg/traceability-vocab

nissimsan commented 2 years ago

Thanks @VladimirAlexiev!

That's right, this is how we define the trace-vocab schemas (which are subsequently used to issue Verifiable Credentials). We inject the term definitions into schemas using a $linkedData element, see for example here.

Here's the implementation.

ioggstream commented 2 years ago

Hi @nissimsan. I saw your work, and I think that our use cases do not completely overlap. I'd be happy to get your review of the white paper; meanwhile, I will try to clarify the differences. I should say that I work in an open environment with thousands of providers and consumers that I cannot govern strictly. Instead, I have to provide rules that do not interfere too much with their current ways of working and their tools.

Different Goals

IIUC, your work is linked-data oriented: you express the syntax of a JSON-LD object, including the "@context", using JSON Schema. This means that your objects are natively based on linked data. See this section of the analysis.
Our goal, instead, is to provide semantic information for already existing OpenAPI 3.0 documents (JSON Schema draft 4) written by different organizations. These documents are not based on JSON-LD, and only a subset of fields will actually be mappable.

Issue 1 - providing information using standard fields

In our first attempts at these semantic mappings, we overloaded the externalDocs OAS3 property to reference the semantic asset, e.g.

    TaxCode:
      description: A physical person tax code.
      example: RSSMRA75L01H501A
      externalDocs:
        url: https://w3id.org/italia/onto/CPV/taxCode

This approach is similar to your way of overloading $comment. Applying this strategy to third-party schemas will eventually end in conflicts, because API providers are free to use externalDocs in other ways. At the same time, an implementation or tool is not required to interpret a standard field in a specific way.

Issue 2 - Providing semantic information inside each field

We tried to provide semantic info on a per-field basis, e.g.

properties:
  givenName:
    type: string
    x-jsonld-language: "en"
    x-jsonld-id: "https://w3id.org/italia/onto/CPV/givenName"

But in this case the @context becomes a byproduct of the JSON Schema: to cover all JSON-LD features such as "@vocab", "@base", "@language", ... the [SOAS] vocabulary has to be expanded significantly or integrated with [JLD], which can be a lot of work [twitter thread].

My impression, then, is that this approach is simply too costly to implement in a wide ecosystem. In a closed environment, where every schema is produced by a single source, it can be viable though.
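A minimal sketch of why the @context ends up as a byproduct in the per-field approach. The helper `context_from_properties` is hypothetical and not part of any tool mentioned here; it only covers term-to-IRI mappings and a per-term language, and every further JSON-LD feature ("@vocab", "@base", ...) would need its own keyword and handling.

```python
def context_from_properties(schema: dict) -> dict:
    """Build a JSON-LD @context from per-property x-jsonld-* annotations."""
    context = {}
    for name, prop in schema.get("properties", {}).items():
        iri = prop.get("x-jsonld-id")
        if iri is None:
            continue  # property carries no semantic annotation
        term = {"@id": iri}
        language = prop.get("x-jsonld-language")
        if language is not None:
            term["@language"] = language
        # Use the compact string form when only the IRI is known.
        context[name] = term if len(term) > 1 else iri
    return context


schema = {
    "properties": {
        "givenName": {
            "type": "string",
            "x-jsonld-language": "en",
            "x-jsonld-id": "https://w3id.org/italia/onto/CPV/givenName",
        },
        "age": {"type": "integer"},  # hypothetical unannotated property
    }
}
context = context_from_properties(schema)
```

Here the context is fully determined by the schema, so it is only as reliable as the annotations of every schema involved.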

OR13 commented 2 years ago

@ioggstream I'm not sure I am grokking your key argument.

Are you saying don't overload OAS with LD syntax beyond URIs?

ioggstream commented 2 years ago

@OR13 there's not a single key argument. We tried different approaches and summarized our experience in the document.

The principal arguments for our PoC, which is based on two custom OAS 3.0 keywords, x-jsonld-type and x-jsonld-context, are:

1. Since we need to tell other people how to annotate their APIs, we cannot use standard keys, because they might already use them and we can't force them to modify core parts of their specs. Moreover, tools (stoplight, swagger-ui, ...) already interpret those fields in some way, so our tools would have to override the standard tooling behavior in every organization (Italy has ~20k organizations, at least 10k of which provide APIs to other organizations).
2. We do not annotate every single property, because assembling a JSON-LD context reliably when you do not govern all the schemas can be hard. In the future (some years) this might change (e.g. as we tighten our API ecosystem).
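A minimal sketch of how the two keywords could be consumed, assuming that x-jsonld-context carries a JSON-LD context and x-jsonld-type a type IRI, both attached to the whole schema rather than to each property. The helper `as_jsonld`, the `Person` IRI, and the field names are illustrative assumptions, not taken from the PoC.

```python
def as_jsonld(instance: dict, schema: dict) -> dict:
    """Return a copy of a plain JSON instance with @context and @type added."""
    doc = {}
    if "x-jsonld-context" in schema:
        doc["@context"] = schema["x-jsonld-context"]
    if "x-jsonld-type" in schema:
        doc["@type"] = schema["x-jsonld-type"]
    # The instance is assumed to be plain JSON without its own @context/@type.
    doc.update(instance)
    return doc


person_schema = {
    "type": "object",
    # Illustrative type IRI (assumption).
    "x-jsonld-type": "https://w3id.org/italia/onto/CPV/Person",
    # The provider writes the context once, for the whole schema.
    "x-jsonld-context": {
        "tax_code": "https://w3id.org/italia/onto/CPV/taxCode",
    },
    "properties": {
        "tax_code": {"type": "string"},
        "internal_id": {"type": "integer"},  # deliberately left unmapped
    },
}

instance = {"tax_code": "RSSMRA75L01H501A", "internal_id": 42}
document = as_jsonld(instance, person_schema)
```

The schema and the instance stay untouched for existing tooling, and keys without a term definition in the context (here `internal_id`) are simply dropped when the document is processed as JSON-LD, which matches the goal of mapping only a subset of fields.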

OR13 commented 2 years ago

ahh ok.

1. totally agree, we stopped using $comment because of that.
2. also agree, we do the same... just because you can define all terms in OAS and JSON-LD doesn't mean you should.

ioggstream commented 2 years ago

@OR13 Questions

1. About $linkedData & co: why do you use JSON Schema features that are not compatible with OAS3 (e.g. $-starting strings, const)? I understand const is nice though :)
2. About term: I wanted to do something like that, but the issue I had was:

...
components:
  schemas:
    TaxCode:
      type: string
      $linkedData:
        "@id": "https://w3id.org/italia/onto/CPV/taxCode"
        "term": "taxCode"
    Contract:
      # ...
      properties:
        employer_tax_code: { $ref: "#/components/schemas/TaxCode" }  # <-- Beware! TaxCode.$linkedData.term == 'taxCode'
        employee_tax_code: { $ref: "#/components/schemas/TaxCode" }  # <-- same as above

Then I thought that it would be better if each data provider wrote down its own @context, perhaps generating it at design time with specific tools.
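The issue can be reproduced with a short sketch. The helpers `resolve` and `build_context` are hypothetical and only mimic a naive context generator; they are not the traceability-vocab implementation.

```python
TAX_CODE_IRI = "https://w3id.org/italia/onto/CPV/taxCode"

spec = {
    "components": {
        "schemas": {
            "TaxCode": {
                "type": "string",
                "$linkedData": {"@id": TAX_CODE_IRI, "term": "taxCode"},
            },
            "Contract": {
                "properties": {
                    "employer_tax_code": {"$ref": "#/components/schemas/TaxCode"},
                    "employee_tax_code": {"$ref": "#/components/schemas/TaxCode"},
                }
            },
        }
    }
}


def resolve(spec: dict, ref: str) -> dict:
    """Follow a local reference such as '#/components/schemas/TaxCode'."""
    node = spec
    for part in ref[2:].split("/"):  # drop the leading '#/'
        node = node[part]
    return node


def build_context(spec: dict, schema: dict, key_by_property: bool) -> dict:
    """Collect term definitions, keyed by $linkedData.term or by property name."""
    context = {}
    for name, prop in schema["properties"].items():
        target = resolve(spec, prop["$ref"]) if "$ref" in prop else prop
        linked = target.get("$linkedData")
        if linked is None:
            continue
        key = name if key_by_property else linked["term"]
        context[key] = linked["@id"]
    return context


contract = spec["components"]["schemas"]["Contract"]
by_term = build_context(spec, contract, key_by_property=False)
by_property = build_context(spec, contract, key_by_property=True)
```

Keyed by term, the context contains only `taxCode`, which matches neither JSON key of a Contract instance. Keyed by property name, both keys are mapped, but to the same IRI, so the employer/employee distinction is still not expressed.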

OR13 commented 2 years ago

Regarding 1, we invented the syntax originally thinking in terms of JSON Schema, not OAS.

Models are defined using the Schema Object, which is an extended subset of JSON Schema Specification Wright Draft 00.

https://swagger.io/specification/

I suspect that if we had put OAS higher in terms of conformance than JSON Schema, we likely would have used the x- extension pattern, but there are folks in the LD community who dislike that... we're certainly open to changing the syntax, especially if there is better interop or support for off-the-shelf OAS tooling.

Regarding 2, excellent point.

This comes down to the trade-off between reusing schemas and vocabs and being more specific... there are costs associated with "reusing over refining / re-defining"... I think there are probably a large number of cases where we are maybe not handling those trade-offs as well as we could... the naive solution is to inline your schemas and use fewer refs.
