JSON Schema/Hyper-Schema compatibility goals?

handrews commented 6 years ago

Hi folks- I feel like by scattering various comments related to JSON Schema and especially Hyper-Schema around a bunch of issues, it's coming across as numerous requests for small and not particularly compelling changes. So I'm filing this to see if we can decide on a goal for compatibility, and then I can make comments with the goal in mind and stop pushing anything that's not a goal.

And if we decide that there are no goals worth actively pursuing, that's fine, too, and will ensure that I'm not wasting your time.

What's "compatibility"?

There are several sorts of compatibility we could shoot for, for either JSON Schema Validation or JSON Hyper-Schema, or both.

In all cases, it is simple to ignore TD keywords, JSON-LD or otherwise, that are not part of JSON Schema's standard schema vocabularies, or to define them as extension schema vocabularies.

Full compatibility would mean that TD authors could use the same schemas both as part of the TD and on their own with off-the-shelf JSON Schema implementations. This is what we're trying to get to with OpenAPI, as many users ask for it.

Specifically, this would mean that everywhere the TD uses a JSON Schema keyword in a schema object, the keyword has the same behavior as it does in regular JSON Schema. The TD may forbid some usage, but if the usage is allowed, the behavior is the same.

A JSON Schema implementation could be given a TD, recognize which parts are schema objects, and produce the correct results from those schema objects.

Automatic translation compatibility would mean that some minor details conflict, but we work out some sort of keyword modifier or renaming mechanism (for instance, a way for JSON Schema to recognize that the default of readOnly is flipped from its standard usage).

A JSON Schema implementation would be able to recognize these differences and handle them without special treatment.

This is pretty hand-wavy, and I don't have a clear rule in mind yet for what conflicts could be handled or how. But I have several ideas if we want to explore this.

External translation compatibility would mean that it would be possible to write a script to map back and forth between the TD schema usage and JSON Schema, but this would not be part of either the JSON Schema spec or the TD spec.

This is the situation OpenAPI is in right now. It's not great, and it's a recurring complaint in the community, but I don't know how much of the community cares, so don't take this as an apocalyptic problem. It definitely frustrates people who know both technologies, though.

No particular compatibility is also an option, where you just take some ideas from JSON [Hyper-]Schema but aren't concerned about ensuring any of the above work.
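As a rough illustration of external translation compatibility, here is a minimal Python sketch of a script that maps between TD data schemas and plain JSON Schema. It assumes that an omitted writable means read-only in the TD (the flipped readOnly default mentioned above), that forms and observable are the only TD-only keywords, and that only properties needs recursing into; the function names are hypothetical.

```python
import copy

# Assumption: these keywords exist only in the TD and have no JSON Schema meaning.
TD_ONLY_KEYWORDS = {"forms", "observable"}

# Assumption: an omitted "writable" means read-only in the TD, whereas an
# omitted "readOnly" means writable in JSON Schema.
TD_WRITABLE_DEFAULT = False


def td_to_json_schema(td_schema: dict) -> dict:
    """Translate a TD data schema into JSON Schema (recurses into "properties" only)."""
    out = {}
    for key, value in td_schema.items():
        if key in TD_ONLY_KEYWORDS or key == "writable":
            continue  # dropped: no JSON Schema equivalent, or handled below
        if key == "properties":
            out[key] = {name: td_to_json_schema(sub) for name, sub in value.items()}
        else:
            out[key] = copy.deepcopy(value)
    out["readOnly"] = not td_schema.get("writable", TD_WRITABLE_DEFAULT)
    return out


def json_schema_to_td(schema: dict) -> dict:
    """Translate JSON Schema back into a TD data schema; TD-only keywords cannot be restored."""
    out = {}
    for key, value in schema.items():
        if key == "readOnly":
            continue  # replaced by "writable" below
        if key == "properties":
            out[key] = {name: json_schema_to_td(sub) for name, sub in value.items()}
        else:
            out[key] = copy.deepcopy(value)
    out["writable"] = not schema.get("readOnly", False)
    return out
```

Note that the round trip is lossy: forms and observable cannot be recovered from the JSON Schema side.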

Why have Validation compatibility?

I think this is pretty clear- JSON Schema is widely used and deployed, and is part of other API systems. Full compatibility might be possible, and if not, I suspect automatic translation compatibility of some sort wouldn't be that hard. It would also be a good test case for the schema vocabulary concept.

With schema vocabularies, saying "we are using all assertions, but our own annotations" would be straightforward.

Assertions are things like minimum, that are evaluated to a boolean result with just the schema and instance.

Annotations are things like readOnly, which an application needs to interpret (in this case, because something is only writeable if you can write the instance to some kind of storage).
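To make the distinction concrete, here is a minimal Python sketch that handles only minimum, maximum, readOnly and default: the assertion check needs nothing but the schema and the instance, while what readOnly means for a write is left to application logic. The helper names and the have_write_access flag are hypothetical.

```python
def check_assertions(schema: dict, instance) -> bool:
    """Evaluate the numeric assertions in the schema; assumes a numeric instance."""
    if "minimum" in schema and instance < schema["minimum"]:
        return False
    if "maximum" in schema and instance > schema["maximum"]:
        return False
    return True


def collect_annotations(schema: dict) -> dict:
    """Gather annotation keywords; interpreting them is up to the application."""
    return {key: schema[key] for key in ("readOnly", "default") if key in schema}


def may_write(schema: dict, instance, have_write_access: bool) -> bool:
    """Hypothetical application logic combining both kinds of keyword."""
    if not check_assertions(schema, instance):
        return False
    read_only = collect_annotations(schema).get("readOnly", False)
    # Only the application knows whether it can actually write to storage.
    return have_write_access and not read_only
```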

Why have Hyper-Schema compatibility?

This is admittedly less clear. We now have one and possibly two implementations of Hyper-Schema draft-07 underway (JavaScript and possibly Go). This is as a client and not just for documentation generation, which puts draft-07's adoption as a hypermedia system ahead of draft-04 already :-) But it's still unproven.

On the other hand, it's pretty easy for us to change Hyper-Schema to make it more suitable for integration with the TD as long as it does not break other integration use cases. So if Hyper-Schema is almost but not quite suitable, we might be able to change it if that helps the TD.

My main reason for continuing to push this is that I see the same issues coming up for you that we spent the last year discussing and resolving. We put a lot of thought into those solutions, so we think they're worth consideration.

But we don't have real-world experience as you do with the plugfests, so if you have already shown that our solutions actually aren't suitable, that would be helpful for us.

Actions are of particular interest, as Hyper-Schema does not provide a direct equivalent. However, I'm (reluctantly) coming to the conclusion that it probably needs to. If the rest of Hyper-Schema ends up being compatible with the TD, that's a strong motivation for us to align with your actions approach.

Moving forward

For Validation (potentially including annotations like readOnly and default), if there's consensus on what level of compatibility is desired, I can probably figure out the gap and make a proposal pretty easily.

For Hyper-Schema, since both sides are somewhat in flux, it's a little less clear. If there is any interest in evaluating this, I'd like to work with someone on the TD side to figure out the differences both in terms of simple syntax (e.g. mediaType vs targetMediaType) and at a higher level (links vs forms; actions) and report back on options. We can then decide if we want to pursue it further.

If there's no interest in Hyper-Schema, I'd like to get clear on that so I can stop taking up space in issues bringing it up when it's not helpful.

handrews commented 6 years ago

Paging @mkovatsc @benfrancis

benfrancis commented 6 years ago

@handrews I'm just replying to let you know that I'm not ignoring this, I just don't have a very good answer to your questions yet. I'm not yet sure of the best way to incorporate parts of JSON Schema into the Thing Description format. There are clearly areas of overlap and potential for re-use, but I'm not sure what level of compatibility is the best approach.

Your input is certainly welcome from my point of view as you've obviously spent a lot of time looking at these shared problems!

mkovatsc commented 6 years ago

@handrews this is an overdue update after our resolution during the F2F about WoT Thing Description. There are two major changes to (hopefully) enable full compatibility:

1. Base the TD on JSON-LD 1.1, which allows object notation instead of the tedious arrays of JSON-LD 1.0. The object key becomes the identifier we previously put into name.
2. Make Properties recursive, meaning they can have a properties field, and each Property has the schema fields directly, not in a schema sub-field.

By full compatibility I also mean that each entry of the top-level properties structure can be fed into a JSON Schema validator to validate payloads going to and coming from the Property interactions. The same goes for the inputs and outputs of Action interactions and for Events (here we still need to nail down whether they also have an output field or should be flatter, like Properties).
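A minimal Python sketch of that usage, assuming the layout described above: each top-level properties entry is itself a schema, actions entries may carry input and output schemas (these field names are an assumption), and events are skipped while their shape is undecided. The function name and label format are hypothetical. TD-only keywords such as forms are left in place, since JSON Schema implementations are expected to ignore keywords they do not recognize.

```python
from typing import Iterator, Tuple


def payload_schemas(td: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (label, schema) pairs, each of which could be passed to a JSON Schema validator.

    Assumed layout: td["properties"][name] is a schema; td["actions"][name]
    may hold "input" and/or "output" schemas. Events are not handled.
    """
    for name, prop in td.get("properties", {}).items():
        yield f"properties/{name}", prop
    for name, action in td.get("actions", {}).items():
        for field in ("input", "output"):
            if field in action:
                yield f"actions/{name}/{field}", action[field]
```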

One additional entry not in the default JSON Schema vocabulary is forms, which appears when a property is an Interaction Property and not only a property of an object type. Here we need to check whether we want to align with JSON Hyper-Schema.

The other additional entries are writable (the inverse of readOnly; still under discussion) and observable.

Example:

{
  /* ... Thing metadata ... */
  "properties": {
    "status": {
      "writable": false,
      "type": "object",
      "properties": {
        "battery": {
          "type": "number",
          "minimum": 0.0,
          "maximum": 100.0,
          "forms": [{ /* note that this sub-property is also an Interaction that can be accessed individually */
            "href": "/things/lamp/properties/status/batt",
            "mediaType": "application/json"
          }]
        },
        "rssi": {
          "type": "number",
          "minimum": 0.0,
          "maximum": 1.0
          /* this is a pure sub-property that cannot be accessed (not an Interaction) */
        },
        "level": {
          "type": "integer",
          "minimum": 0,
          "maximum": 100
          /* this is a pure sub-property that cannot be accessed (not an Interaction) */
        }
      },
      "forms": [{ /* this field makes the property (also) an Interaction Property */
        "href": "/things/lamp/properties/status",
        "mediaType": "application/json"
      }]
    }
  }
}
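Because the forms field is what turns a (sub-)property into an Interaction Property, a consumer can find every individually accessible property with a simple recursive walk. Here is a minimal Python sketch over the properties object of the example above; the helper name and the slash-separated path format are hypothetical.

```python
def interaction_paths(properties: dict, prefix: str = "") -> list:
    """Return slash-separated paths of (sub-)properties that carry "forms".

    A property with "forms" is treated as an Interaction Property; one without
    it is a pure data sub-property that is only reachable through its parent.
    """
    paths = []
    for name, schema in properties.items():
        path = f"{prefix}{name}"
        if "forms" in schema:
            paths.append(path)
        # Nested properties may themselves be Interaction Properties.
        paths.extend(interaction_paths(schema.get("properties", {}), path + "/"))
    return paths
```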

Please refer to https://w3c.github.io/wot-thing-description/proposals/simplified-td/#minimal-simplified-td for more explanations and examples. We also updated the Editors' Draft, but it still contains a few mistakes that might be confusing.

@handrews What do you think about this?

If we want to go for full compatibility, the challenge is now how to keep the drafts in sync -- I am not even 100% sure this is in sync with JSON Schema draft-07.

mkovatsc commented 6 years ago

> For Hyper-Schema, since both sides are somewhat in flux, it's a little less clear. If there is any interest in evaluating this, I'd like to work with someone on the TD side to figure out the differences both in terms of simple syntax (e.g. mediaType vs targetMediaType) and at a higher level (links vs forms; actions) and report back on options.

I would love to have only one solution here and fully align. It would be great to understand the differences and in particular why they are there.

handrews commented 6 years ago

@mkovatsc I will write something up. It's a bit complex, as I'm working on a similar write-up about Hyper-Schema for OpenAPI (which will help- their case is simpler) and am also taking an extended break from the tech industry starting this weekend. While I am not abandoning JSON Schema, and particularly Hyper-Schema, after an initial burst to complete the backlog of work for draft-08 it will probably go on the back burner for me. But I will add to this issue and keep an eye on it and on anything else really focused on JSON Schema. I may not be all that likely to notice mentions elsewhere, though.

handrews commented 6 years ago

@benfrancis @mkovatsc I've finally had enough time since leaving my job to get back in the mood for tech work. So let me try to give some basics of Hyper-Schema to see if we are close to alignment there, and also give some thoughts on forms.

The only keywords in Hyper-Schema are base and links, which can appear in any root schema or subschema. Other keywords that used to be in the Hyper-Schema spec (readOnly and keywords for encoding other media types in JSON Strings) were moved to the validation spec, to keep Hyper-Schema focused on RFC 8288-style web linking. links takes a list of Link Description Objects (LDOs).

Looking at your link property, it's not quite in line with Hyper-Schema draft-07. While the Hyper-Schema LDO is a serialization of 8288's conceptual model (section 2), it offers much more functionality than the HTTP Link header serialization (section 3), while at least currently missing a few fields that are less relevant to APIs. Here's an overview of the differences:

Link URIs

href and anchor have the same name and purpose in HTTP and Hyper-Schema, but may be RFC 6570 URI Templates (of any level). If the templates resolve to URI-references, the reference is resolved against the nearest base (which is also a URI Template, and may be a reference that resolves against the next-nearest base, etc.). If there is no explicit base, the base URI is the effective request URI of the instance.
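A minimal Python sketch of that resolution order, limited to level-1 ({var}) template expressions; higher template levels, the full RFC 6570 variable-name syntax, and where the variable values come from are out of scope. The function names are hypothetical; urljoin performs the RFC 3986 reference resolution.

```python
import re
from urllib.parse import quote, urljoin

# Level-1 expression "{name}"; RFC 6570 varnames may also contain "." and
# pct-encoded triplets, which this sketch does not accept.
_LEVEL1_EXPR = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_level1(template: str, variables: dict) -> str:
    """Expand level-1 expressions, percent-encoding all but unreserved characters."""
    def _expand(match):
        value = variables.get(match.group(1))
        # Undefined variables expand to the empty string, as in RFC 6570.
        return "" if value is None else quote(str(value), safe="")
    return _LEVEL1_EXPR.sub(_expand, template)


def resolve_link(href: str, bases: list, request_uri: str, variables: dict) -> str:
    """Resolve href against a chain of (possibly templated, possibly relative) bases.

    bases is ordered nearest first; the chain bottoms out at the effective
    request URI of the instance, which is all that is used when bases is empty.
    """
    base_uri = request_uri
    for base in reversed(bases):  # resolve the farthest base first
        base_uri = urljoin(base_uri, expand_level1(base, variables))
    return urljoin(base_uri, expand_level1(href, variables))
```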

Hyper-Schema also offers anchorPointer for JSON-like media types (notably application/json) which do not allow fragments.
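For reference, resolving an RFC 6901 JSON Pointer against an instance takes only a few lines. This Python sketch handles absolute pointers only (not Relative JSON Pointers) and is lenient about array indices with leading zeros; the function name is hypothetical.

```python
def resolve_json_pointer(document, pointer: str):
    """Resolve an absolute RFC 6901 JSON Pointer such as "/items/0/name"."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" -> "/" first, then "~0" -> "~" (RFC 6901, section 4).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current
```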

There are numerous other keywords for controlling how the template variables are filled out, either from the instance data (templatePointers), from client input (hrefSchema), or some combination, as well as for indicating whether a link is usable if a particular variable cannot be resolved (templateRequired).
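A simplified Python sketch of filling template variables from instance data and then applying templateRequired. It assumes each variable defaults to the instance property of the same name, that templatePointers values are absolute JSON Pointers (Relative JSON Pointers and client input via hrefSchema are not handled), and that the caller already knows the template's variable names; the helper names are hypothetical.

```python
_MISSING = object()


def _lookup(instance, pointer: str):
    """Minimal absolute JSON Pointer lookup; returns _MISSING if the path is absent."""
    if pointer and not pointer.startswith("/"):
        raise ValueError(f"only absolute JSON Pointers are supported: {pointer!r}")
    current = instance
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        try:
            current = current[int(token)] if isinstance(current, list) else current[token]
        except (KeyError, IndexError, ValueError, TypeError):
            return _MISSING
    return current


def template_data(ldo: dict, instance, variable_names: list):
    """Return (values, usable) for the given template variables."""
    pointers = ldo.get("templatePointers", {})
    values = {}
    for name in variable_names:
        # Default: the instance property with the same name (escaped as a pointer token).
        default = "/" + name.replace("~", "~0").replace("/", "~1")
        value = _lookup(instance, pointers.get(name, default))
        if value is not _MISSING:
            values[name] = value
    # The link is only usable if every templateRequired variable was resolved.
    usable = all(name in values for name in ldo.get("templateRequired", []))
    return values, usable
```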

Link relation type

rel has the same meaning, but you can't put multiple link relation types in a single string. Right now you can only have one relation type per LDO, but we'll probably allow an array of relation types. We just haven't gotten around to that yet.
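Under the one-relation-type-per-LDO rule, an HTTP Link header entry that lists several space-separated relation types would map to several LDOs. A minimal Python sketch; the function name is hypothetical, and the input is assumed to be an already-parsed link represented as a dict.

```python
def split_rel(link: dict) -> list:
    """Split a link whose rel holds several space-separated relation types
    into one LDO-style dict per relation type, copying the other fields."""
    return [{**link, "rel": rel_type} for rel_type in link["rel"].split()]
```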

rev is not supported. RFC 8288 notes that it is deprecated, so we're just not allowing it. Use anchor and href to reverse the link direction if it's somehow impossible to provide a suitable rel.

Target attributes

title is identical, but we don't (yet) support title*. We may want to take a different approach to languages simply because JSON is more flexible than HTTP header serialization. Hyper-Schema also supports description for providing a longer block of explanatory text. It's mostly used for generating documentation rather than at runtime.

hreflang is not supported, nor do we get asked about it much if at all. I haven't given it any real thought.

type is targetMediaType in Hyper-Schema. This is because there are at least two media types relevant to JSON Hyper-Schema. targetMediaType is the hinted media type of the target resource, matching targetSchema. There is also submissionMediaType, matching submissionSchema, describing what kind of representation the resource can accept for processing (For HTTP, this is just the media type and schema for an HTTP POST request). I think your link specification has mediaType here, so 8288, TD, and Hyper-Schema all differ on this one!

media is not supported, and I had completely forgotten it existed. It hasn't ever been relevant to APIs that I've worked with. The media keyword that used to be in the Hyper-Schema spec is unrelated (it is now contentMediaType in the validation spec, for encoding documents in JSON strings). I don't see us adding RFC 8288/HTML's media to Hyper-Schema unless someone finds a really compelling use case for why Hyper-Schema should care about display devices.
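The target-attribute differences above can be summarized as a small mapping. This Python sketch assumes an already-parsed RFC 8288 link as a dict and returns the LDO plus the attributes it could not carry over (such as title*, hreflang, media and rev); the function name is hypothetical.

```python
def http_link_to_ldo(attrs: dict):
    """Map RFC 8288 link attributes onto draft-07 LDO keywords.

    Returns (ldo, unsupported) so that dropped attributes are reported
    rather than silently lost.
    """
    renames = {"href": "href", "anchor": "anchor", "rel": "rel",
               "title": "title", "type": "targetMediaType"}
    ldo, unsupported = {}, []
    for key, value in attrs.items():
        if key in renames:
            ldo[renames[key]] = value
        else:
            unsupported.append(key)
    return ldo, unsupported
```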

Protocol headers

Hyper-Schema also offers a way to provide protocol-specific hints for response headers (e.g. Allow and Accept-Patch in HTTP) and to provide a schema for particularly interesting request headers (so you can do things like advertise what preferences can be requested with Prefer). Hyper-Schema links are not expected to document these exhaustively, only the ones of particular interest to clients, either as an optimization to avoid a HEAD or OPTIONS request, or to indicate what protocol features are available for requests.
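As an illustration of the OPTIONS-avoidance case, here is a minimal Python sketch that reads an Allow hint from an LDO. It assumes draft-07's targetHints keyword with lowercase HTTP header names as keys and a list of header-field strings as each value (e.g. {"allow": ["GET, PUT"]}); the function name is hypothetical.

```python
def allowed_methods(ldo: dict):
    """Return the set of methods hinted by targetHints["allow"], or None if absent."""
    hints = ldo.get("targetHints", {})
    if "allow" not in hints:
        return None  # no hint: the client falls back to OPTIONS or just tries
    methods = set()
    for field_value in hints["allow"]:
        methods.update(m.strip().upper() for m in field_value.split(",") if m.strip())
    return methods
```

Since these are only hints, a client that skips the OPTIONS round trip still has to handle whatever the server actually returns.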

Any thoughts on these differences? I think this has actually at least touched on every Hyper-Schema keyword. I'll make a separate comment about forms, as this is one long enough already and I need to read up on them in the context of the TD.

handrews commented 6 years ago

Note that I don't think that the TD necessarily would have to support every feature of Hyper-Schema. But it would be nice if they did not conflict in the areas where they overlap.

handrews commented 6 years ago

Just noticed this in a "forms" example:

/* encType would be for the request body opposed to mediaType, which is for target */

This is exactly why we replaced encType and mediaType with submissionMediaType and targetMediaType, respectively. In our view, these names from 1990s HTML forms are no longer intuitive, particularly because Hyper-Schema does not maintain a strict analogy with HTML forms- it offers a superset, leveraging URI Templates which weren't invented until much later.

I can see keeping them if you're directly mimicking HTML, but that did not work well for Hyper-Schema as interactive forms and automated APIs are different use cases. Interactive forms abstract away the details from the human users (GET vs POST, URI query string vs request body), while APIs are already aware of these details and require flexibility rather than abstraction.
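The renaming itself is mechanical. A minimal Python sketch that maps the HTML-style form fields from that example onto the draft-07 LDO names; the function name is hypothetical, and a complete LDO would also need a rel, which this sketch does not supply.

```python
def form_to_ldo(form: dict) -> dict:
    """Rename HTML-style form fields to draft-07 Hyper-Schema LDO keywords.

    encType (request body) -> submissionMediaType
    mediaType (target)     -> targetMediaType
    All other fields are copied unchanged.
    """
    renames = {"encType": "submissionMediaType", "mediaType": "targetMediaType"}
    return {renames.get(key, key): value for key, value in form.items()}
```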

sebastiankb commented 5 years ago

Is this issue still relevant? @handrews, do you see any overlaps or conflicts with Hyper-Schema?

handrews commented 5 years ago

@sebastiankb I'll have to catch up on recent changes here to answer that entirely. Any keywords I mentioned above are still the same in Hyper-Schema (rel now supports arrays with the same specification as space-separated rel values in HTTP's Link header, but that's all that's changed as best I can remember offhand).

It's also relevant how much you are interested in pursuing this. I don't feel the need to push it hard, particularly as Hyper-Schema remains something of a moving target as well.

mkovatsc commented 5 years ago

Let's focus on JSON Schema compatibility. I would say this is given, as TD data schemas can be used directly as input for JSON Schema validator implementations. We have not yet modeled all of your terms, but adding vocabulary terms is easy, also at later stages :)

mkovatsc commented 5 years ago

@handrews we now have a frozen CR transition request version in the master.

This would be a good point in time to assess the current compatibility.

sebastiankb commented 4 years ago

As we are in the process of finalizing the current charter, I would like to propose that this discussion be continued in the new charter, since JSON Schema is and will be one of the core components of the TD.

handrews commented 4 years ago

@sebastiankb makes sense. Note that draft 2019-09 (formerly known as draft-08) has been published: http://json-schema.org/specification.html

This is a major update resolving some long-standing difficult questions, and formalizing an extensibility mechanism. OpenAPI is planning to use this draft in their 3.1 specification (in part because the extensibility mechanism can handle their extensions nicely, and it would be a good base for a code generation extension vocabulary).

We hope that there is only one more significant draft (mostly to fill in some gaps that we left because we felt we needed feedback on what we have before we could finish a few things) before moving to a more formal standardization process. @Relequestual is taking point on the actual standardization process. That draft should happen within the next year, preferably within the next six months- the various personal and project-related reasons for the year-long delay of this draft shouldn't recur.

w3c / wot-thing-description