w3c / wot-thing-description

Web of Things (WoT) Thing Description
http://w3c.github.io/wot-thing-description/

Form relation type values (rel) #179

Closed mkovatsc closed 5 years ago

mkovatsc commented 6 years ago

Something that got lost in the current Draft is a clean definition of the values that can be put into the rel fields of forms. They had been collected in the Binding Templates and should move to the TD.

In the Scripting API, we changed to "get" and "set" for Properties, as this is the "traditional way" for properties (a programming abstraction).

@mjkoster @zolkis Should we align? In the API, I definitely prefer get() and set(). run() is supposed to change to invoke().

mkovatsc commented 6 years ago

Something more to fix: Form relation types should follow the same rules as link relation types:

https://tools.ietf.org/html/rfc8288#section-2.1

The naming constraint is LOALPHA *( LOALPHA / DIGIT / "." / "-" ) for registered types and URI for unregistered types ("Extension Relation Types").

They "MUST be compared character by character in a case-insensitive fashion". Thus, the current camel case does not make much sense.
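As a rough illustration of these two rules, the following is a minimal Python sketch. The function names `is_registered_rel` and `same_rel` are hypothetical and not part of any WoT or IETF library; only the grammar and the comparison rule come from RFC 8288.

```python
import re

# reg-rel-type from RFC 8288, section 2.1:
#   LOALPHA *( LOALPHA / DIGIT / "." / "-" )
REG_REL_TYPE = re.compile(r"[a-z][a-z0-9.\-]*")


def is_registered_rel(value: str) -> bool:
    """Return True if value is syntactically valid as a registered relation type name."""
    return REG_REL_TYPE.fullmatch(value) is not None


def same_rel(a: str, b: str) -> bool:
    """Compare two relation type names case-insensitively, as RFC 8288 requires."""
    return a.lower() == b.lower()
```

Under this sketch, `is_registered_rel("invokeAction")` is false because of the uppercase letter, while `same_rel("invokeAction", "invokeaction")` is true, which is why camel case carries no information here.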

To use short terms, we would need to register the terms with IANA: https://tools.ietf.org/html/rfc8288#section-2.1.1.1

Alternatively, we can use Extension Relation Types, which use URIs. Within a TD, we could use JSON-LD capabilities to map the short terms to full URIs.

Yet for raw JSON processing, we would need pre-processing to expand the short terms to the full URIs. There is a similar issue with the links field, in particular if we want compatibility with draft-ietf-core-links-json.
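Such a pre-processing step could look roughly like the sketch below. The namespace `http://example.org/wot#` and the helpers `expand_rel` and `expand_forms` are placeholders chosen for illustration; this thread does not define a vocabulary URI.

```python
from urllib.parse import urlparse

# Hypothetical mapping from short terms to extension relation type URIs.
# The namespace is a placeholder, not a URI defined by the TD specification.
TERM_TO_URI = {
    "readproperty": "http://example.org/wot#readproperty",
    "writeproperty": "http://example.org/wot#writeproperty",
    "invokeaction": "http://example.org/wot#invokeaction",
}


def expand_rel(rel: str) -> str:
    """Expand a short rel term to a full URI; values that already have a scheme pass through."""
    if urlparse(rel).scheme:
        return rel
    key = rel.lower()  # relation types compare case-insensitively
    if key not in TERM_TO_URI:
        raise ValueError(f"unknown relation term: {rel!r}")
    return TERM_TO_URI[key]


def expand_forms(forms: list[dict]) -> list[dict]:
    """Return copies of the form objects with their rel values expanded."""
    return [{**form, "rel": expand_rel(form["rel"])} for form in forms]
```

A JSON-LD processor would do this expansion through the context; a plain JSON consumer would need a table like `TERM_TO_URI` shipped with it.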

handrews commented 6 years ago

@mkovatsc I'm confused: In a past issue, you stated that rel for forms is not the same as rel for links, and that's why it's OK to put verbs like "invokeAction" in it. But you are referencing RFC 8288 which is about links. And particularly about the link relation registry.

What is the intended relation of rel for forms and rel for links? I still find this an extremely confusing overload of the term, but (across multiple projects) I seem to be one of very few who thinks so. For operations on top of links in Hyper-Schema, I am tentatively using intent (exact name subject to change) to name the operation, as it is distinct from the underlying relationship.

mkovatsc commented 6 years ago

@handrews The main intent of the previous post was to make them similar to link relation types, as they fulfill a similar role and should also be useable outside JSON-LD documents, where a term is not automatically replaced with a full URI.

See https://github.com/w3c/wot-thing-description/issues/108#issuecomment-371788340 for related work in Coral. The original name pick in the overall line of work was "form relation type", reusing the same attribute name.

As there is no preceding standard, we could go for a more descriptive attribute name that does not cause confusion. While discussing forms, it has become central that links describe relations and forms describe operations. intent is a candidate, maybe a bit long? @ektrah comments?

mjkoster commented 6 years ago

The naming constraint is LOALPHA *( LOALPHA / DIGIT / "." / "-" ) for registered types and URI for unregistered types ("Extension Relation Types").

Yes, good point. We should follow the syntax and namespace rules so "rel" in forms can be processed reusing "rel" processing in links. We do need to be clear about where a URI is needed, and it's probably a good plan to register some form relation types with IANA.

@handrews My understanding is that "rel" in forms also describes a relation between the context and target resources, and is different from links in that it describes performing some "action" on the context resource by invoking a protocol method on the target resource. So in the case of a TD, readproperty is something you perform on the context of the form (which is a property interaction) by doing a GET method on the target resource. Likewise, invokeaction is something you perform on the Action interaction by doing a POST on the target resource (href).

I would say that, for links and forms, "rel" is how application descriptions are mapped onto resources and methods using semantic identifiers. The other components of a link or form are the context identifier "anchor", and optional target attributes, where "method" is an example that applies to forms.

ektrah commented 6 years ago

The idea of the term “form relation type” was indeed that it gives the semantics of a connection between resources, where that connection is by the means of a form instead of a link. I’m open to a better term, though, if that’s found too confusing.

An earlier draft proposed to create an IANA registry for form relation types mirroring the link relation type registry created by RFC 5988. However, I’m more and more convinced that the process of registering every single relation type that might be useful is too cumbersome and a roadblock. (Imagine the Semantic Web had to have all its RDF predicates registered in a central database!) I’m therefore looking at alternative ways now.

handrews commented 6 years ago

@mkovatsc intent is a very tentative name, it could end up being op (for operation) or any number of other things.

@ektrah

The idea of the term “form relation type” was indeed that it gives the semantics of a connection between resources, where that connection is by the means of a form instead of a link. I’m open to a better term, though, if that’s found too confusing.

My objection here is that things like "invokeAction" are not relationships. And throughout the past five or so years that I've been working primarily on APIs, getting people to understand that rel means relationship and that rel values should therefore express relationships is one of the biggest challenges I face with new teams.

Even if you say that this is a "form relation type" and therefore it's different... why call it relation when you are going to use verb phrases as values?

@mkovatsc

While discussing forms, it has become central that links describe relations and forms describe operations.

Yes, this is exactly where I've ended up with Hyper-Schema. And verb phrases make perfect sense as operation names. And the term "operation" is popular with OpenAPI, which is another major user of JSON Schema.

I have taken the approach of talking about links (RFC 5988/8288, expressing a relation between resources, optionally with some interaction hints such as likely target media type) and operations (a specific way of making requests and understanding responses across a link).

I have avoided the term "form" because historically in JSON Schema it was identified directly with HTML forms and caused tremendous amounts of confusion and frustration. I personally tend to think of forms as the presentation aspect of soliciting input for an operation, but I'm not trying to sell anyone else on that definition.

So... back to links and operations:

There is always an RFC 8288-style link involved. In JSON Hyper-Schema, this is serialized as the Link Description Object (LDO), and as of draft-07 it explicitly correlates with RFC 8288. In a pure sense, the existence of a link does not necessarily tell you anything about what you can and can't do with it. Of course, target hints can make that more clear in various ways.

Operations describe a specific usage of a link. The (still totally vague and almost entirely in my head) proposal would be to add a list of Operation Description Objects (ODOs) to Hyper-Schema's LDO. ODOs might actually be an extension spec so that people who only want links are not forced to implement ODO support. Note that an operation should be sufficiently detailed that a UI could use it to generate a web form, possibly using JavaScript to execute a PUT or a DELETE, or some other form of data input interface.

My expectation is that you would use both the link's rel and the operation's intent (or whatever it's called) to determine what to do. If you have a link with a rel of tag:example.com,2018:fan with an operation of tag:example.com,2018:power-on, the link tells you what you have, and the operation tells you what to do with it. This avoids things that I've seen like tag:example.com,2018:turn-on-fan, tag:example.com,2018:turn-on-refrigerator, tag:example.com,2018:turn-on-microwave, etc.
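A sketch of how a client might combine the two values is given below. The field names `operations`, `intent`, and `method`, the `href` values, and the helper `find_operation` are all assumptions made for illustration, since the ODO proposal is not yet specified anywhere.

```python
# Hypothetical link objects: rel says what the target is, and each operation's
# intent says what can be done with it. Field names are illustrative only.
LINKS = [
    {
        "rel": "tag:example.com,2018:fan",
        "href": "/fan",
        "operations": [{"intent": "tag:example.com,2018:power-on", "method": "POST"}],
    },
    {
        "rel": "tag:example.com,2018:refrigerator",
        "href": "/fridge",
        "operations": [{"intent": "tag:example.com,2018:power-on", "method": "POST"}],
    },
]


def find_operation(links, rel, intent):
    """Return (href, method) for the first link matching rel that offers intent, else None."""
    for link in links:
        if link["rel"].lower() != rel.lower():
            continue
        for op in link.get("operations", []):
            if op["intent"].lower() == intent.lower():
                return link["href"], op["method"]
    return None
```

The same `power-on` identifier is reused for both devices, so no compound term per device type is needed.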

There are two separate layers, or maybe constraints here: What is the thing (expressed in relation to your current context) and what can you do with it? With semantic web technologies, you can get more information from the data itself, so this may not make as much sense. Hyper-Schema, of course, has to be usable with plain JSON.

If you do use both the link rel and the operation intent together in this way, to avoid an explosion of compound phrases like "turn-on-refrigerator", then it becomes very important not to use rel for your operation/form. Because then you have two rels that you use together to mean different things, and hopefully we can all agree that that would be confusing. I realize that is probably not what you have here, and the approach I'm considering with Hyper-Schema may not fit at all. But that's where I am with it right now.

Mostly, I'm very frustrated by seeing more and more people just decide that it's OK to use verbs for rel. That's just not what it means, grammatically or based on RFC 8288.

mjkoster commented 6 years ago

I would suggest that a form relation does describe a relation between two resources. It indicates that the resource pointed to by the form (href) is related to the context resource by what it does to the context.

For example, the WoT form relation for performing a GET on a WoT Property interaction is "getproperty". The form relation construct indicates that to "getproperty" of the context resource (the WoT Property interaction), you interact with the resource pointed to by href.

The reason for specializing the form relation to the target resource is so we can accommodate different patterns for the methods to apply to the resources, for example using POST instead of PUT to "setproperty", which some existing APIs require, and we use the form construct to adapt to.

We don't propose to have either "refrigerator" or "turnon" as relation types. The only new relation types are those listed above, so there would be no explosion of types due to our use of "rel".

In a WoT TD using iot.schema.org annotation, refrigerator is a feature of interest type and turnon is an action type, so a WoT Thing Description would include this example with a form that indicates to invoke the specified action (iot:turnOnAction) you POST a conforming input representation to the location at href.

The resource at href is related to the context ("turnon", which is an action) by being how you invoke the action, or "rel": "invokeaction".

"actions": {
  "turnon": {
    "@type": "iot:turnonAction",
    "iot:isActionOf": {"iot:equipmentType": "iot:refrigerator"},
    "input": {
      "type": "boolean",
      "const": true
    },
    "forms": [
      {
        "href": "/example/actions/turnon",
        "rel": "invokeaction",
        "http:methodName": "POST",
        "mediaType": "application/json"
      }
    ]
  }
}

handrews commented 6 years ago

@mjkoster 90% of that I am totally on board with. It's not quite the same as how I'm looking at Hyper-Schema but that's not surprising as we're operating under different constraints.

But I have a lot of trouble with this:

It indicates that the resource pointed to by the form (href) is related to the context resource by what it does to the context.... The form relation construct indicates that to "getproperty" of the context resource (the WoT Property interaction), you interact with the resource pointed to by href.

How is 'to "getProperty"' any sort of relationship? It's an activity. It's a verb phrase. Grammatically, that is simply not a relationship. And I know, this is engineering, not English, but if you must use rel why not use "gettableProperty" which at least makes grammatical sense?

But really, I don't buy that there is a sensible relationship here at all, and calling it rel just further muddies a concept that people already struggle with by having it mean two very different things. Using "getProperty" as the value of a field called operation or op or something of that sort is intuitive and grammatically correct. And there is no danger of confusing it with RFC 8288 link relations. Perhaps I'm in the minority or even alone in finding this confusing, but I do find it very confusing.

mjkoster commented 6 years ago

Yes, forms are specifications for verbs. That's precisely how they differ from links. The verb described by each form is expressed in its relation type.

We are describing, with the form control, how a particular resource, indicated by href, is used to do something (verb) with respect to (relation) the context of the link (action, event, property in WoT).

So in a sense we are encoding both relation and verb, where we want to describe the relation of this resource to its context in terms of what it does (invoke, read, write, subscribe...), which is the main semantic connection between the resource described by a form and its context.

I do see the point, that you would like to narrow the definition of rel because people are already having trouble with it. I fear that adding another term, which is not a target attribute, to describe what a resource does to its context to differentiate from what a resource means to its context, is introducing another new term for what I see as a broader definition of an existing concept.

handrews commented 6 years ago

@mkovatsc [EDIT: oops, I meant @mjkoster] Yeah our so-far intractable difference of opinion is that you see forms as a broadening of the concept of links, and I see them as a very different thing that (for lack of a better phrase) sits on top of links.

The term "operation" has a good bit of precedent, though. OpenAPI uses it, and OpenAPI is quite popular, and increasingly so. I started using the term "operation" in JSON Hyper-Schema in the same sense as it is used in OpenAPI because it helped people understand the difference between something like OpenAPI (which gives you explicit request/response info) and Hyper-Schema as it currently exists (which is purely an RFC 8288 Link serialization, albeit with rather elaborate target hints in the form of schemas). This was a huge pain point in discussions of Hyper-Schema up through the very last weeks of work on draft-06. Starting with draft-06, keeping links and operations separate seems to have made Hyper-Schema a lot more accessible based on community reactions. I can't prove that the operations vs links terminology was a major part of that, but there certainly seems to be less confusion now.

"Operation" is also just generally intuitive- it is something that you do, which is exactly what we're talking about. I think I've made my points by now, so I'll stop and see if anyone else finds this helpful. If not, I won't continue further. But I will be sad because in my experience this terminology will create more work for me when trying to explain it to people.

mkovatsc commented 6 years ago

you see forms as a broadening of the concept of links, and I see them as a very different thing that (for lack of a better phrase) sits on top of links

Actually, I do not see them as an extension, but as a related concept at the same level. Hence, I do not see the relation aspect you describe with fan vs power-on. Like @mjkoster, I believe the 'fan' aspect must be part of the context on which the operation acts. I would definitely not go for power-on-fan designs.

My obvious alternative to operation would be action. Usually, the response is an action result, not a representation. Unfortunately, this attribute is used for a URI in HTML forms. A serious conflict?

I would call this sibling concept of links still forms: even when there is no UI, no human user, clients still have to fill out the hypermedia control given by the server, e.g., to select the semantically correct payload or parameters. HTML forms are simply a human-centric representation of this hypermedia control.

I like the idea of making URI Templates a central part of forms.

handrews commented 6 years ago

@mkovatsc

Actually, I do not see them as an extension, but as a related concept at the same level.

This actually may not be a conflict, as I'm not all that happy with the "sits on top of" phrase but haven't figured out a better way to articulate it. "Related concept at the same level" is much closer to how I see it than "broadening of the concept of links".

I would definitely not go for power-on-fan designs.

I think we're all against those :-) My LDO + ODO is similar to your link + action + form (I think?) but I did not remember how the TD's actions fit in until @mjkoster gave that example.

My obvious alternative to operation would be action. Usually, the response is an action result, not a representation. Unfortunately, this attribute is used for a URI in HTML forms. A serious conflict?

In my experience, anything that looks like it comes from HTML forms will cause people to assume it works like HTML, and things go downhill from there. HTML is so pervasive that it carries a lot of intellectual inertia with it.

I would call this sibling concept of links still forms: even when there is no UI, no human user, clients still have to fill out the hypermedia control given by the server, e.g., to select the semantically correct payload or parameters. HTML forms are simply a human-centric representation of this hypermedia control.

I agree with this reasoning. It just got very confusing during Hyper-Schema's draft-06 discussions to have to continually make a distinction between "forms" as a generic concept and "forms" as in HTML, so we came up with another name. But that is certainly not something that I think is essential. Given the choice, I'd rather avoid confusion around rel but keep the term forms, and depending on how this goes I may reconsider terminology in Hyper-Schema. One problem we had was lack of intuitively recognizable non-HTML examples of "forms" that we could easily point to.

I like the idea of making URI Templates a central part of forms.

Hyper-Schema is specifically designed to allow using any combination of URI Templates and/or request bodies and/or protocol headers, and providing equivalent levels of functionality for all mechanisms. An application could either present these all separately, or present a single UI form (or function call, or whatever type of interface) that hides the complexity from the caller/user. I think this is one of the most significant changes between draft-04 and draft-07 of Hyper-Schema, and a key difference between flexible APIs and relatively simple human-user-oriented HTML forms.

benfrancis commented 6 years ago

@mjkoster wrote:

The resource at href is related to the context ("turnon", which is an action) by being how you invoke the action, or "rel": "invokeaction".

"actions": {
  "turnon": {
    "@type": "iot:turnonAction",
    "iot:isActionOf": {"iot:equipmentType": "iot:refrigerator"},
    "input": {
      "type": "boolean",
      "const": true
    },
    "forms": [
      {
        "href": "/example/actions/turnon",
        "rel": "invokeaction",
        "http:methodName": "POST",
        "mediaType": "application/json"
      }
    ]
  }
}

I'd be interested to see a more complete example which includes getting and setting properties, requesting, canceling, listing and getting the status of action requests and getting lists of events.

Presumably with this notation, you'd need separate form objects to get and set a property using the same URL for example? e.g.

"properties": {
  "on": {
    "type": "boolean",
    "forms": [
      {
        "href": "/example/properties/on",
        "rel": "getProperty",
        "http:methodName": "GET",
        "mediaType": "application/json"
      },
      {
        "href": "/example/properties/on",
        "rel": "setProperty",
        "http:methodName": "PUT",
        "mediaType": "application/json"
      }
    ]
  }
}

I agree with @handrews that this doesn't seem like a "relation" between two resources, but rather describes an operation that can be carried out on a resource.

@handrews wrote:

I have taken the approach of talking about links (RFC 5988/8288, expressing a relation between resources, optionally with some interaction hints such as likely target media type) and operations (a specific way of making requests and understanding responses across a link). ... Operations describe a specific usage of a link.

This makes much more sense.

I have avoided the term "form" because historically in JSON Schema it was identified directly with HTML forms and caused tremendous amounts of confusion and frustration. I personally tend to think of forms as the presentation aspect of soliciting input for an operation

I strongly agree. A "form" could arguably make sense as a way to describe an action input, but it doesn't make sense for property resources and it definitely doesn't make sense for events.

The combination of "links" which describe relationships between resources (e.g. rel=property or rel=propertyOf) and "operations" which describe CRUD(N) style operations which can be carried out on that linked resource would make much more sense. Perhaps "operations" could be a member of a link object which enumerates the types of operations that can be performed on the linked resource, like the "ODOs" @handrews was talking about.

As I said in https://github.com/w3c/wot-thing-description/issues/151#issuecomment-411399991 the question is then whether we really want an OpenAPI-style description of an API inside the Thing Description (where clients must automatically adapt based on the content of that declarative protocol binding) or whether that's better suited to a concrete protocol binding specification (where clients just implement a standard API). I can see the former working for REST-style APIs using HTTP and CoAP, but the latter would definitely be more suited to WebSockets.

mkovatsc commented 6 years ago

"OpenAPI style" corresponds to concrete protocol bindings: code has to be generated specifically for the API.

TD and Hyper-Schema use a hypermedia-driven style where the server tells the client how to formulate requests. Of course clients have to be more agile to generate any kind of requests given a base protocol stack (HTTP, CoAP, MQTT, ...), but this is the point of it all.

mkovatsc commented 6 years ago

My LDO + ODO is similar to your link + action + form (I think?)

No, LDO + ODO are similar to links + forms. These work at the resource level. You can find this in CoRAL and a predecessor "CoRE-HAL", something that extended HAL with _forms to cover more than browsing.

Action (as well as Property and Event) is an interaction pattern of WoT TD, one abstraction level higher. Interaction patterns describe what resources provide: e.g., some resources simply return a document, some return a representation of a collection using links to sub-resources, some start sending events (cf. SSE using GET), some accept documents, some accept specific RPCs. These are all different patterns and WoT tries to narrow these patterns down to 3, a number that comes from prior work, in particular the COMPOSE project, which found that these three are a good sweet spot to be precise, flexible, and narrow.

How the resource is implemented to fulfill each pattern is often very different, and hence told by the forms. A Property could be read by a simple GET. In LP-WAN devices using YANG, it could be a FETCH with a specific payload to read a Property. Similar for writing: Some use PUT, some use POST, some use PATCH. This is true for IoT, but also (or even more) for cloud services.
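A minimal sketch of that idea: a generic client picks the form whose rel matches the intended operation and takes the method from the form instead of hard-coding it. The function `build_request`, the base URL default, and the two sample Things are hypothetical; the keys `href`, `rel`, `http:methodName`, and `mediaType` follow the examples earlier in this thread.

```python
from urllib.parse import urljoin


def build_request(forms, rel, base="http://thing.example.com/", payload=None):
    """Build a request description from the first form whose rel matches."""
    for form in forms:
        if form["rel"].lower() == rel.lower():
            return {
                "method": form["http:methodName"],
                "url": urljoin(base, form["href"]),
                "content_type": form.get("mediaType", "application/json"),
                "body": payload,
            }
    raise LookupError(f"no form with rel {rel!r}")


# Two hypothetical Things exposing the same Property with different write methods.
thing_a = [{"href": "/example/properties/on", "rel": "setProperty",
            "http:methodName": "PUT", "mediaType": "application/json"}]
thing_b = [{"href": "/example/properties/on", "rel": "setProperty",
            "http:methodName": "POST", "mediaType": "application/json"}]
```

Calling `build_request(thing_a, "setproperty", payload=True)` and the same for `thing_b` runs identical client code, yet yields a PUT for one Thing and a POST for the other.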

benfrancis commented 6 years ago

@mkovatsc:

"OpenAPI style" corresponds to concrete protocol bindings: code has to be generated specifically for the API. TD and Hyper-Schema use a hypermedia-driven style where the server tells the client how to formulate requests.

I'm not sure OpenAPI is so different from what you're describing.

According to the spec: "The OpenAPI Specification (OAS) defines a standard, language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection."

To be clear, when I use the term "concrete protocol binding" what I mean is a specification of a standard that all Web of Things servers and clients implement (e.g. a standard REST API or webthing WebSocket sub-protocol), so that any WoT client can talk to any WoT server in the same way that any browser can render any web page. i.e. no code has to be generated specifically for a particular web thing's API.

I'm open to the idea that a Thing Description could describe a REST-style API declaratively with a bit more flexibility, but I'd like to see it in action to prove that ad-hoc interoperability is actually possible.

Of course clients have to be more agile to generate any kind of requests given a base protocol stack (HTTP, CoAP, MQTT, ...), but this is the point of it all.

Do you have an example implementation of a WoT client which can automatically talk to any web thing using any protocol based on declarative protocol bindings alone, without requiring any custom code? I don't understand how this is possible.

mkovatsc commented 6 years ago

This actually may not be a conflict, as I'm not all that happy with the "sits on top of"

I saw a bit of a conflict, when you just extended the LDO to also contain operation descriptions. With the separate ODO, this is solved and really aligned.

This "sits on top of" is still a bit in there, as you expect them to work together: you need a link to tell it is about fan and an operation to tell how to power-on. Yet consider this common design:

I sure think there should be a link to http://thing.example.com/fan. There you should get hypermedia controls that tell the client how this fan can be used.

I don't think you would get a link with rel=fan to http://thing.example.com/fan/power from there. The client is already in the context of that fan ("rendering the hypermedia controls from http://thing.example.com/fan in its application state engine"). From this context, it only needs the operation power-on, which you can provide with an ODO alone -- or, in our case, with a form alone.

You see that there are two resources, http://thing.example.com/fan and http://thing.example.com/fan/power. And they are somehow related, as http://thing.example.com/fan/power implements an operation for http://thing.example.com/fan.

When http://thing.example.com/fan/power does more, there can also be a link from http://thing.example.com/fan to http://thing.example.com/fan/power, which could say that the latter represents the power status of the fan. But this is different from the operation!

Yes, that sounds nice for reading Properties. Unfortunately, the IoT is cruel and in the real world we find cases where not even following a link works to read the Property of a device (e.g., LP-WAN). And it fully breaks down when it comes to writing Properties, where a link is no longer enough (the same has always been true on the big Web for sending representations to the server). Thus, the TD only uses forms for Properties, not both (an alternative would be a link plus a form when writable, which is longer...).

mkovatsc commented 6 years ago

To be clear, when I use the term "concrete protocol binding" what I mean is a specification of a standard that all Web of Things servers and clients implement (e.g. a standard REST API or webthing WebSocket sub-protocol), so that any WoT client can talk to any WoT server in the same way that any browser can render any web page. i.e. no code has to be generated specifically for a particular web thing's API.

I fully understand your concrete protocol binding. It is a concrete, new standard that new implementations would have to follow. At the moment, you are lobbying for the Mozilla specification that you have been working on and ignore the requirements of all the WG participants. It helps nothing with all the existing standards already out there and currently emerging.

Do you have an example implementation of a WoT client which can automatically talk to any web thing using any protocol based on declarative protocol bindings alone, without requiring any custom code? I don't understand how this is possible.

Yes, we successfully tested this with a number of existing products at the PlugFests. I did emphasize that you have to agree on a common base protocol at the transfer layer (HTTP, CoAP, MQTT, etc.). Furthermore, the implementation needs to understand all the vocabulary defined in the TD spec.

With this, we have for instance a working WoT Client that can connect to OCF, IKEA Tradfri, LWM2M, and some custom CoAP servers without altering the code and of course without any if statements for exactly these platforms. With the HTTP binding, it can similarly connect to Philips Hue, Panasonic services, Fujitsu services, Siemens services, the Oracle IoT Cloud Service, and other Web services or for instance the ServerlessNabaztag project, which uses HTTP but definitely no REST. Overall, they all have slightly different APIs, but we can adapt to these differences using the TD vocabulary. (edit: sorry, Hue goes into the HTTP list..)

Our TDs can also provide the metadata to make MQTT interoperable -- something quite fundamental, as the protocol itself does not even have standard metadata fields or any other cross-vendor standard.

You can have a look at the code at https://github.com/eclipse/thingweb.node-wot to ensure there is no custom code for the listed platforms.

benfrancis commented 6 years ago

@mkovatsc wrote:

Let's say http://thing.example.com/fan resolves to the Thing Description:

{
  "name": "My Fan",
  "description": "A web connected fan",
  "properties": {
    "power": {
      "type": "boolean",
      "links": [{
         "href": "http://thing.example.com/fan/power"
       }]
    },
    "speed": {
      "type": "number",
      "links": [{
         "href": "http://thing.example.com/fan/speed"
       }]
    }
  }
}

With a concrete HTTP binding it could be assumed that you can read a property with a GET and write a property with a PUT, e.g.

GET http://thing.example.com/fan/power

200 OK
{
  "power": false
}

PUT http://thing.example.com/fan/power
{
  "power": true
}

200 OK
{
  "power": true
}

If you want a declarative protocol binding maybe it could be something like:

{
  "name": "My Fan",
  "description": "A web connected fan",
  "properties": {
    "power": {
      "type": "boolean",
      "links": [{
         "href": "http://thing.example.com/fan/power",
         "rel": "property",
         "mediaType": "application/json",
         "operations": {
            "writeProperty": "PUT",
            "readProperty": "GET"
          }
       }]
    },
    "speed": {
      "type": "number",
      "links": [{
         "href": "http://thing.example.com/fan/speed",
         "rel": "property",
         "mediaType": "application/json",
         "operations": {
            "writeProperty": "PUT",
            "readProperty": "GET"
          }
       }]
    }
  }
}
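To make the contrast between the two variants concrete, here is a small sketch of how a client could resolve the HTTP method for either one. The helper `method_for` and the fallback behaviour (use the link's own `operations` entry if present, otherwise the assumed GET/PUT defaults) are assumptions for illustration, not something either proposal specifies.

```python
# Defaults a concrete HTTP binding might assume (GET to read, PUT to write).
DEFAULT_OPERATIONS = {"readProperty": "GET", "writeProperty": "PUT"}


def method_for(link: dict, operation: str) -> str:
    """Return the HTTP method for an operation, preferring the link's own declaration."""
    declared = link.get("operations", {})
    if operation in declared:
        return declared[operation]
    if operation in DEFAULT_OPERATIONS:
        return DEFAULT_OPERATIONS[operation]
    raise KeyError(f"operation {operation!r} is neither declared nor defaulted")
```

With a link that carries no `operations` member, `method_for(link, "writeProperty")` falls back to PUT; a link that declares `"writeProperty": "POST"` overrides it.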

Unfortunately, the IoT is cruel and in the real world we find cases where not even following a link works to read the Property of a device (e.g., LP-WAN).

So use a gateway to bridge the LP-WAN (e.g. LoRa or Sigfox) to the Web of Things so that any WoT client can talk to it without the hardware and software requirements of implementing those protocols. For me, abstracting away these differences is the point of the Web of Things.

benfrancis commented 6 years ago

@mkovatsc wrote:

It is a concrete, new standard that new implementations would have to follow.

Which surely is what a Working Group should be defining.

You can have a look at the code at https://github.com/eclipse/thingweb.node-wot to ensure there is no custom code for the listed platforms.

Thank you, I will take a look.

Yes, we successfully tested this with a number of existing products at the PlugFests. I did emphasize that you have to agree on a common base protocol at the transfer layer (HTTP, CoAP, MQTT, etc.).

It is the "etc." that concerns me, because it extends beyond web protocols and Internet protocols to basically anything. I just think that scope is too large to deliver meaningful interoperability.

mkovatsc commented 6 years ago

It is the "etc." that concerns me, because it extends beyond web protocols and Internet protocols to basically anything. I just think that scope is too large to deliver meaningful interoperability.

It just shows the expressive power of the concepts. Nobody is planning to expose WoT Things over Modbus or whatever; this is plain stupid. Yet if you look around, there are many emerging IoT standards that converged on HTTP, CoAP, and MQTT -- and AMQP, but they all add this noise, these small differences within the same protocol that make them incompatible again. This we can solve.

Moreover, there are already a lot of such "etc. Things" out in the world that need to be integrated. Being able to describe them with the same abstraction helps a lot with the integration, onboarding, and management of devices, on gateways and in the cloud. It is much easier to look at a uniform TD of a device than going through thick prose specifications.

With the descriptive power, W3C WoT can overcome both -- similar to the Web, which in its early days also needed many plug-ins. Over time, we also hope that this need for plug-ins/non-default descriptions will go away. But for this, such cross-vendor, cross-ecosystem interoperability first has to be seen, valued, and studied in practice, so that we can figure out meaningful defaults that cover most cases.

handrews commented 6 years ago

@mkovatsc the http://thing.example.com/fan, http://thing.example.com/fan/power, http://thing.example.com/fan/speed example is tremendously helpful, thanks!

I had not considered these use cases for Hyper-Schema, because up to and including the current draft of Hyper-Schema only supported links directly. The schema LDO keywords (hrefSchema, targetSchema, submissionSchema, headerSchema) can be used to generate forms, but they basically describe all possible inputs, rather than inputs for specific operations. So in current Hyper-Schema you would have to have a link to http://thing.example.com/fan/power and come up with some value for rel.

But introducing operations does bring up this use case. Let me think on this a bit- I think I might be able to get LDOs + ODOs to handle this reasonably well.

mjkoster commented 6 years ago

Yes, we successfully tested this with a number of existing products at the PlugFests. I did emphasize that you have to agree on a common base protocol at the transfer layer (HTTP, CoAP, MQTT, etc.). Furthermore, the implementation needs to understand all the vocabulary defined in the TD spec.

I think this has not been a hard problem to solve for a number of reasons:

The vocabulary in the TD specification aligns quite well with the way protocol settings and header options work in the protocol implementations I am familiar with (Python http module, Python/twisted http and CoAP servers, nodejs http, node-coap, ARM mbed CoAP client, mosquitto mqtt client ...).

The number of common protocols is relatively small going forward. The number of options needed is relatively small, and the protocol adapter needs to set them to some defaults anyway. Not all implementations would necessarily support all protocols.

If a concrete API binding is needed for simple clients, a gateway could easily be built using the protocol bindings as an internal adaptation layer. For example, we are having good results with early experiments in generating TDs from the formal descriptions of APIs provided by OCF. This, combined with the default exposed thing protocol binding to HTTP, would provide both a concrete API when needed, and a consistent, reusable, maintainable, protocol adaptation layer. As already mentioned, we have people working on doing this in a number of areas already.

Action (as well as Property and Event) is an interaction pattern of WoT TD, one abstraction level higher. Interaction patterns describe what resources provide: e.g., some resources simply return a document, some return a representation of a collection using links to sub-resources, some start sending events (cf. SSE using GET), some accept documents, some accept specific RPCs. These are all different patterns and WoT tries to narrow these patterns down to 3, a number that comes from prior work, in particular the COMPOSE project, that found that these three are a good sweet spot to be precise, flexible, and narrow.

Here is a big difference between TD and OpenAPI. A TD with semantic annotation explains the interaction, describes the representation, and informs the protocol usage.

"Related concept at the same level" is much closer to how I see it than "broadening of the concept of links".

Nice.

mjkoster commented 6 years ago

@handrews where can I go to get a document that describes your latest thinking about the LDO and ODO? There are some similarities in how we are thinking about handling URI templates that we may want to align more closely. I see a couple of other opportunities also on reviewing your -01 hyperschema draft

handrews commented 6 years ago

@mjkoster hmmm... I should definitely write that up. I took a couple of months off this summer and just started a new job this week so things have been a bit chaotic. The new job will involve work on JSON Hyper-Schema and APIs so once I get ramped up I'll have more time to work on this. I'll file the ODO idea over in the JSON Schema project as soon as I get a chance and then link here- up until now I've just kicked ideas around on the JSON Schema slack workspace.

benfrancis commented 6 years ago

@mjkoster wrote:

The vocabulary in the TD specification aligns quite well with the way protocol settings and header options work in the protocol implementations I am familiar with (Python http module, Python/twisted http and CoAP servers, nodejs http, node-coap, ARM mbed CoAP client, mosquitto mqtt client ...).

OK, so it works for HTTP, CoAP and MQTT (if you add a URI scheme to MQTT). That sounds completely feasible.

The number of common protocols is relatively small going forward. The number of options needed is relatively small, and the protocol adapter needs to set them to some defaults anyway.

This does not match our experience at Mozilla, even just looking at smart home applications.

Not all implementations would necessarily support all protocols.

Imagine a world where websites used 20 different protocols instead of just one and different web browsers supported different subsets of those 20 protocols. Does that not concern you?

If a concrete API binding is needed for simple clients, a gateway could easily be built using the protocol bindings as an internal adaptation layer. For example, we are having good results with early experiments in generating TDs from the formal descriptions of APIs provided by OCF. This, combined with the default exposed thing protocol binding to HTTP, would provide both a concrete API when needed, and a consistent, reusable, maintainable, protocol adaptation layer. As already mentioned, we have people working on doing this in a number of areas already.

This actually sounds exactly like what our gateway software does too. We have pluggable adapters which convert various different IoT and smart home protocols (currently around 20) to a uniform web API using HTTP & WebSockets. This is a very manual process and has so far included Zigbee, Z-Wave and X10 (none of which have URI schemes BTW) and various proprietary smart home protocols over WiFi and Bluetooth (e.g. HomeKit, Broadlink, Eufy, LIFX, Logitech Harmony, Netatmo, Philips Hue, Sonos, TP-Link, Wemo, Yeelight).

It appears the point of contention is which part should be standardised - the adapter layer or the concrete protocol layer. Imagine if rather than standardising on HTTP we had standardised the browser plugin system instead, in order to allow browsers to support 20 different hypertext protocols. I imagine the web of pages would be much more fragmented than it is today.

To be clear, I'm not suggesting ignoring the diverse requirements of different IoT platforms. I'm saying that in order to accommodate those but still enable web-style ad-hoc interoperability where any WoT client can talk to any WoT device, it is essential to standardise on a very small set of web protocols which every WoT client supports, as an abstraction on top of those diverse IoT protocols. This is what defines the Web of Things as distinct from the Internet of Things. Otherwise I can't imagine how we will end up with anything other than an Internet of Things which continues to be extremely fragmented.

I'm wondering if there might be a middle ground here which is less strict than the concrete HTTP protocol binding Mozilla have currently implemented, but flexible enough to adapt to any IoT API built on web protocols (i.e. HTTP and CoAP), e.g. OCF, HomeKit, Philips Hue, IKEA Tradfri etc. and countless existing IoT cloud services?

That could potentially be achieved with a combination of "links" and "operations" as @handrews described. It may not even be that different from the current approach using "forms" and "rel", just using slightly different terminology and constrained to only web protocols (because it's the Web of Things).

What do you think?

handrews commented 6 years ago

@benfrancis

Imagine a world where websites used 20 different protocols instead of just one and different web browsers supported different subsets of those 20 protocols. Does that not concern you?

I think this would shake out naturally in a short period of time, unless there were compelling reasons for people to support all 20 protocols. There's nothing that really forces web browsers to use HTTP, and in fact web browsers have pretty much always supported varying behavior by URI scheme.

In fact, there are plugins today to allow browsers to use CoAP:

https://addons.mozilla.org/en-US/firefox/addon/copper-270430/

which I see is implemented by @mkovatsc, and also that Mozilla broke the protocol plugin interface which... argh. That kind of proves my point, though: browsers (or other systems) that allow protocol extensibility are better as it lowers the barrier to entry for legitimately broadly useful new protocols.

benfrancis commented 6 years ago

@handrews wrote:

I think this would shake out naturally in a short period of time, unless there were compelling reasons for people to support all 20 protocols.

This doesn't appear to have happened to IoT protocols in the decade since the Web of Things community was created. Isn't that why we're all here?

In fact, there are plugins today to allow browsers to use CoAP: https://addons.mozilla.org/en-US/firefox/addon/copper-270430/ which I see is implemented by @mkovatsc, and also that Mozilla broke the protocol plugin interface which... argh.

I'm not quite sure what to make of the irony that the reason this add-on doesn't work any more is that Mozilla implemented the Web Extensions API standard for browser add-ons and it turned out not to be flexible enough for this use case.

That kind of proves my point, though: browsers (or other systems) that allow protocol extensibility are better as it lowers the barrier to entry for legitimately broadly useful new protocols.

I agree that the door should be left open for a successor to HTTP & CoAP which is better suited to IoT use cases. In fact I would encourage it. I don't agree that the "Web of Things" should be limited to trying to describe the existing range of non-web IoT protocols rather than defining an actual standard web protocol for IoT to allow for web-style ad-hoc interoperability. A web of things built on the lessons learned from the web of pages we have today.

I'd genuinely like to hear whether people are interested in the middle ground I tried to describe in https://github.com/w3c/wot-thing-description/issues/179#issuecomment-412064226 .

mkovatsc commented 6 years ago

I'd genuinely like to hear whether people are interested in the middle ground I tried to describe in #179 (comment) .

This is exactly what we have been offering for a long time and have been hoping for constructive feedback.

Hyperschema's LDO and ODO are not far apart and I think the discussion with @handrews is going well to fully understand the subject.

mkovatsc commented 6 years ago

I'm not quite sure what to make of the irony that the reason this add-on doesn't work any more is that Mozilla implemented the Web Extensions API standard for browser add-ons and it turned out not to be flexible enough for this use case.

The problem is that Firefox add-ons cannot implement any custom protocol handlers anymore. Goodbye FireFTP etc. as well.

benfrancis commented 6 years ago

@benfrancis wrote:

I'm wondering if there might be a middle ground here which is less strict than the concrete HTTP protocol binding Mozilla have currently implemented, but flexible enough to adapt to any IoT API built on web protocols (i.e. HTTP and CoAP), e.g. OCF, HomeKit, Philips Hue, IKEA Tradfri etc. and countless existing IoT cloud services? That could potentially be achieved with a combination of "links" and "operations" as @handrews described. It may not even be that different from the current approach using "forms" and "rel", just using slightly different terminology and constrained to only web protocols (because it's the Web of Things).

@mkovatsc wrote:

This is exactly what we have been offering for a long time and have been hoping for constructive feedback.

I don't think it is, because it isn't constrained to web protocols which means that WoT clients would have to implement non-web protocols in order to talk to some WoT devices with what would be considered valid Thing Descriptions. This risks adding unnecessary complexity and preventing web-style ad-hoc interoperability.

But if we put that point aside (which I think is due to a disagreement over the definition of the Web of Things and the best way to enable interoperability), then perhaps there is a way forward regardless.

A Thing Description built on "links", "link relations" and "operations" seems more workable than "forms" and "form relations", especially for properties and events. The constraints you described in https://github.com/w3c/wot-thing-description/issues/151#issuecomment-412145006 requiring URIs and media types will help here.

@handrews I'd really like to see your ODO proposal written down to understand it better, because I think that could help.

We established defaults (GET to read, PUT to write, POST to invoke, ...) to apply to TDs, so that Things following these defaults are much easier to describe. Here we are also happy to get more input, as it is quite hard to set defaults right in an emerging field.

Very happy to provide input there.

mkovatsc commented 6 years ago

A link expresses a relation between the context or an anchor resource and the target resource identified by its URI. When the content of the target resource is required, a link expects the target URI to be dereferenceable, i.e., a representation can be fetched using GET. I hope we can all agree on this.

What if the target URI is not dereferenceable? This is the case for resources that only accept POST requests, for instance, to switch the power of a fan or to subscribe to an Event using Webhooks. In this case, what is the relation between the context and the target resource? Does it make sense to have a link to this non-dereferenceable URI?

mkovatsc commented 6 years ago

Assume you find a link to a resource that manages a collection. You follow the link and get a representation of this collection. The representation has links to items in this collection.

Following HATEOAS, the representation should also carry hypermedia controls about how to add an item to the collection. Does it make sense to have self-links only to attach these controls? Note that the client already has a representation of the collection resource as the current context / application state.

handrews commented 6 years ago

@mkovatsc I don't think that the existence of a link implies that the target resource supports HTTP GET or any other analogous operation. A link simply expresses a relationship. The target resource may be write-only or may only be invokable with its own semantics (POST).

POST-only API resources are fairly common, and I've accessed them via links just like any other resource. Choosing a sensible value of rel may be a bit challenging as you noted in your "fan/power" example, but I've always found that to be a solvable problem in practice, and I will make a point to address it when I write up the ODO idea (which I will try to do by early next week).

As for collections, that is a very standard pattern and there are "collection" and "item" link relations for it. If you are on a collection and want to use a "self" link, there are a couple of approaches I have seen. Although the awkwardness of those is one of many motivations for ODOs.

Right now, in Hyper-Schema, I ripped off Mike Amundsen's application/vnd.collection+json approach of associating specific POST behavior with resources that are the target of a "collection" link or the context of an "item" link. It's a bit awkward, though. If you have a collection resource, you should be able to notice that it supports "item" links (even if it is empty, you have the Hyper-Schema and can see that it would have item links), and then you can look at submissionSchema (which describes the POST request format) on the "self" link. As I said, a bit convoluted.

It more or less works, but I'm not thrilled with it and hope to improve the situation with ODOs, while still building on "collection" and "item" links.
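To make the two-step lookup concrete, here is a minimal sketch. The hyper-schema is a made-up example with all links flattened to the top level for brevity, and find_create_schema is a hypothetical helper; only the rel, href and submissionSchema keywords are taken from Hyper-Schema.

```python
def find_create_schema(hyper_schema):
    """Return the schema for POSTing a new item, or None.

    Step 1: the resource must advertise "item" links, which marks it
            as a collection even when it is currently empty.
    Step 2: the POST request format is then read from submissionSchema
            on the "self" link.
    """
    links = hyper_schema.get("links", [])
    if not any(link.get("rel") == "item" for link in links):
        return None
    for link in links:
        if link.get("rel") == "self":
            return link.get("submissionSchema")
    return None


# Hypothetical collection of fans; the field names are illustrative only.
collection_schema = {
    "type": "array",
    "links": [
        {
            "rel": "self",
            "href": "fans",
            "submissionSchema": {"type": "object", "required": ["name"]},
        },
        {"rel": "item", "href": "fans/{id}"},
    ],
}

print(find_create_schema(collection_schema))
```

The indirection is visible in the code: nothing on the "self" link itself says "this is how you add an item", the client has to infer it from the presence of "item" links.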

mkovatsc commented 6 years ago

@handrews I know, the existence of a link does not even imply the resource exists (we also do some Linked Data in WoT...). Overall, the concept for collections, using media type hints to tell how to modify resource state via representations, etc. is all clear. This is why I brought them up. So far, they only work with a priori knowledge about the controls, for instance by attaching them to the media type or declaring a special relation for the application.

But how to provide hypermedia controls inline with representations in a reusable way? We both look for an answer for this.

To me, needing a self-link on top, just to express what operations are possible, feels unnecessary. It gets weirder when the operation resource does not even have a representation and the relation is basically irrelevant for the context, as it is always something like operatedby.

Thus, we arrived at this "related concept at the same level": forms that do not need links to exist.

If you think we need links as a basis, and this is also the main requirement for @benfrancis, then it would be good to understand what the link is needed for -- other than holding the URI and other than "because so far we always only had links" ;)

Looking forward to your ODO description!

ektrah commented 6 years ago

Good discussion. Could we collect the examples (power on a fan, modify a collection) in a GitHub repo somewhere and show how they would be expressed with W3C TD hypermedia controls, Hyper-Schema, CoRAL, etc.?

mjkoster commented 6 years ago

@ektrah good point!

If I were to construct a hypermedia state machine for the three cases of

I believe I could construct one bubble diagram and relabel the bubbles and arcs, modulo the fact that TD also provides for setting protocol (header) options.

Moreover, the hyperschema "submission" and TD "input" are the same concept, added for cases where what the client sends and what it receives are different.

So I guess we are arguing mostly about naming preferences ...

mjkoster commented 6 years ago

@benfrancis When I say converging on a few protocols, I am specifically referring to CoAP, HTTP, and MQTT. I believe that we are mainly focused on standardizing bindings to these protocols. There are also sub-protocols in common use that we would include, for example SSE, which is also commonly used in browsers. And of course websockets, which is only a "web" protocol because it's a back channel for browsers ;-) note: we can also elevate wss: to interoperable web protocol status by adding the metadata in TD...

There has already been, and will be in the near future more of, consolidation from several SDOs into these protocols. Further, the formats are converging on CBOR and JSON. I don't see the divergence hinted at. But I am looking at a trend, not just history. Thus we don't include X-10...

This leaves it to the simple adaptations we are providing in TD, which are data structure and data type (in the DataSchema part) and transfer layer control like PUT vs. POST and header options (provided in the forms part, also approximated by some similar constructs like "operation")

I'm wondering if there might be a middle ground here which is less strict than the concrete HTTP protocol binding Mozilla have currently implemented, but flexible enough to adapt to any IoT API built on web protocols (i.e. HTTP and CoAP), e.g. OCF, HomeKit, Philips Hue, IKEA Tradfri etc. and countless existing IoT cloud services?

Yes, I think that is the spirit of it

handrews commented 6 years ago

@mjkoster

Moreover, the hyperschema "submission" and TD "input" are the same concept, added for cases where what the client sends and what it receives are different.

Almost but not quite. Hyper-Schema does not think in terms of responses. targetSchema is expected to describe the response to an HTTP GET, but that is an indirect result of targetSchema describing the representation of the target resource, plus HTTP defining the GET response as containing a representation of the target resource.

HTTP says little about the response to PUT (it could be an updated representation, but that is indicated via Content-Location being the same as the effective request URI- it is not inherent in the semantics of PUT). So in Hyper-Schema you would use targetSchema for the PUT request format, as the PUT request is a representation of the target resource. You would not use submissionSchema for PUT, even though the response may not conform to the same format as the request. submissionSchema and submissionMediaType are for non-representation data as used in requests, which in HTTP happens to mean POST.

Note that PATCH is currently awkward in Hyper-Schema, as it is deterministically related to the representation data, but is not necessarily directly compatible with that format, e.g. JSON Patch's list of operations format.
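The rule of thumb above can be written down as a small selector. This is only a sketch of the reasoning: request_body_schema is a hypothetical helper and the example LDO is invented; targetSchema and submissionSchema are the Hyper-Schema keywords being discussed.

```python
def request_body_schema(ldo, method):
    """Pick the LDO keyword that describes an HTTP request body.

    PUT sends a representation of the target resource, so targetSchema
    applies. POST sends non-representation data, so submissionSchema
    applies. Other methods return None here; PATCH in particular is
    left undecided because its body is related to, but not in the
    format of, the representation.
    """
    method = method.upper()
    if method == "PUT":
        return ldo.get("targetSchema")
    if method == "POST":
        return ldo.get("submissionSchema")
    return None


# Invented link description object for illustration.
power_ldo = {
    "rel": "property",
    "href": "fan/power",
    "targetSchema": {"type": "boolean"},
    "submissionSchema": {"type": "object"},
}

print(request_body_schema(power_ldo, "PUT"))   # the representation schema
print(request_body_schema(power_ldo, "POST"))  # the submission schema
```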

handrews commented 6 years ago

Hyper-Schema assumes that each response will link to its own schema, as it is primarily intended for runtime use. There are definitely some interesting questions about how ODOs would treat responses. There are also ideas around an additional API description vocabulary, for design-time information, which would group hyper-schemas into a more OpenAPI-ish document. Although we may prefer to figure out a way to integrate OpenAPI and Hyper-Schema rather than re-inventing that wheel, now that OpenAPI is looking to support JSON Schema more directly instead of only allowing its customized variant.

benfrancis commented 6 years ago

@mjkoster wrote:

When I say converging on a few protocols, I am specifically referring to CoAP, HTTP, and MQTT. I believe that we are mainly focused on standardizing bindings to these protocols.

That sounds positive*. Is that now a shared assumption of everyone in the Working Group? I have previously heard suggestions that a valid use case of a Thing Description would be to describe non-IP protocol bindings like Zigbee or Bluetooth directly, which would be a much larger scope.

I don't see the divergence hinted at. But I am looking at a trend, not just history. Thus we don't include X-10...

I definitely see convergence around HTTP+WS (and MQTT) when it comes to IoT cloud services. But if the scope of a Thing Description extends beyond the Internet to non-IP protocols then that's a very different picture. We have come across many commercial smart home devices using Zigbee, Z-Wave, Bluetooth, Thread+Weave, HomeKit and various proprietary protocols over WiFi. If those are considered outside the scope of what a Thing Description should describe (but in scope for what a WoT gateway might bridge to WoT) then that's great.

note: we can also elevate wss: to interoperable web protocol status by adding the metadata in TD...

I'm interested in how well this works without a concrete sub-protocol specification. How is the message format defined? The WebSocket examples I can see from the last PlugFest just set a mediaType of application/json, don't specify a sub-protocol and leave the message format undefined. How does a client know how to format a message to set a property, request an action, subscribe to an event, and be notified of property changes, action status or events?

You could argue the same for an HTTP binding, where specifying application/json media type alone (even if also providing protocol methods and headers) is not enough information to formulate a request. Is the expectation that the Thing Description declaratively defines all that JSON formatting (e.g. to read a property, write a property, request an action, get an action status, cancel an action request, receive a log of events), or perhaps that new MIME types will be created?

Regarding form relations vs. link relations, I think a good example is that some of the SmartThings examples from the last PlugFest use an action to set a property with the action having the same form URL as the property it is setting, but with a different method. In that case would it not make more sense to have a single property link relation with multiple operations (i.e. a GET and a PUT) instead?

* I still maintain that MQTT being considered a "web protocol" is even more of a stretch than with WebSockets.

handrews commented 6 years ago

@mkovatsc @mjkoster @benfrancis I am still working on the ODO write-up. I just started a new job and that's taking most of my time at the moment. They will support me spending some time on JSON Schema on company time, but first I have to get up to speed on the product and all that.

To add a bit more info on my current thinking, the most common case that I encounter involves multiple operations on the same templated link. This is why I am inclined to nest the ODOs inside the related LDO. As someone noted, fully leveraging the power of URI Templates can get a bit complicated, so rel, href, anchor, templatePointers, and templateRequired are often re-used across a set of operations.

Regardless of whether a link relation is required, a stand-alone ODO would still need an href and the supporting template resolution keywords. So I suspect that, at least in my first formal proposal, I'll keep links, and add ops within the LDO:

Off the top of my head, a POST-only resource could look like this.

{
    "title": "A fan with links to invocable controls",
    "type": "object",
    "properties": {"id": {"type": "integer", "minimum": 1}},
    "links": [{
        "rel": "invokable",
        "href": "fans/{id}/power",
        "targetSchema": false,
        "ops": [{
            "intent": "power-on",
            "requestSchema": {...},
            "method": "POST"
        }]
    }]
}

I left out responses on purpose as I'm still wrestling with what I want to do there. I rather like Hyper-Schema's philosophy that responses should link their own schemas, as it is very flexible, but there are competing requirements from various people.
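For what it's worth, here is how a client might consume that example: find the link by rel, pick the op by intent, and resolve the templated href from the instance data. This is a sketch under assumptions. resolve_op and expand_simple_template are hypothetical names, the expansion handles only plain {name} expressions (a small subset of URI Templates), and the elided requestSchema is left as an empty placeholder.

```python
import re
from urllib.parse import quote


def expand_simple_template(template, variables):
    """Expand plain `{name}` expressions only, percent-encoding the values."""
    def substitute(match):
        return quote(str(variables[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


def resolve_op(schema, instance, rel, intent):
    """Return (method, resolved_href) for the op with this intent, or None."""
    for link in schema.get("links", []):
        if link.get("rel") != rel:
            continue
        for op in link.get("ops", []):
            if op.get("intent") == intent:
                return op["method"], expand_simple_template(link["href"], instance)
    return None


fan_schema = {
    "title": "A fan with links to invocable controls",
    "type": "object",
    "properties": {"id": {"type": "integer", "minimum": 1}},
    "links": [{
        "rel": "invokable",
        "href": "fans/{id}/power",
        "targetSchema": False,  # no representation, so nothing to GET
        "ops": [{
            "intent": "power-on",
            "requestSchema": {},  # placeholder; elided in the example
            "method": "POST",
        }],
    }],
}

print(resolve_op(fan_schema, {"id": 7}, "invokable", "power-on"))
```

Note that rel, href and the template variables are resolved once per link and shared by every op nested under it, which is the main argument for nesting ODOs inside the LDO.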

handrews commented 6 years ago

I edited the above example to have "targetSchema": false indicating that the target resource does not have a representation, so there's no point in GET-ing it.

benfrancis commented 6 years ago

To add a bit more info on my current thinking, the most common case that I encounter involves multiple operations on the same templated link. This is why I am inclined to nest the ODOs inside the related LDO.

Sounds good.

I'm trying to imagine how this might translate to a Thing Description. Here's a slightly more complex example which covers properties, actions and events. I've tried to re-use elements from your example and the current Editor's Draft of the Thing Description specification.

{
  "name": "Web Fan",
  "description": "A web connected fan",
  "properties": {
    "power": {
      "type": "boolean",
      "links": [{
        "rel": "property",
        "href": "/fans/1/properties/power",
        "operations": [
          {
            "intent": "readProperty",
            "http:methodName": "GET"
          },
          {
            "intent": "writeProperty",
            "http:methodName": "PUT"
          }
        ]
      }]
    }
  },
  "actions": {
    "turn": {
      "input": {
        "type": "number",
        "unit": "degree"
      },
      "links": [{
        "href": "/fans/1/actions/turn",
        "operations": [{
          "intent": "invokeAction",
          "http:methodName": "POST"
        }]
      }]
    }
  },
  "events": {
    "overheated": {
      "links": [{
        "rel": "event",
        "href": "/fans/1/events/overheated",
        "operations": [{
          "intent": "subscribeEvent",
          "http:methodName": "GET",
          "subProtocol": "LongPoll"
        }]
      }]
    }
  }
}

This approach of using links + operations seems to make a lot more sense than the current forms approach, particularly for properties and events. I'm not completely opposed to using forms for actions as that at least makes some sense.

You could add operations to links in the top level links member to do things like get all properties at once, set multiple properties at once or get a list of all action requests.
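A sketch of how a client could walk a description shaped like this: locate the interaction, find the operation whose intent matches, and resolve the relative href against the URL the Thing Description was fetched from. The helper name find_operation and the base URL are assumptions for illustration, and the description is trimmed to two interactions.

```python
from urllib.parse import urljoin


def find_operation(td, td_url, kind, name, intent):
    """Return (http_method, absolute_url) for one interaction's operation.

    kind is "properties", "actions" or "events"; td_url is where the
    description was retrieved from and serves as the base for hrefs.
    """
    interaction = td[kind][name]
    for link in interaction.get("links", []):
        for op in link.get("operations", []):
            if op.get("intent") == intent:
                return op["http:methodName"], urljoin(td_url, link["href"])
    raise LookupError(f"{kind}/{name} has no operation with intent {intent!r}")


# Trimmed version of the "Web Fan" description.
web_fan = {
    "name": "Web Fan",
    "properties": {
        "power": {
            "type": "boolean",
            "links": [{
                "rel": "property",
                "href": "/fans/1/properties/power",
                "operations": [
                    {"intent": "readProperty", "http:methodName": "GET"},
                    {"intent": "writeProperty", "http:methodName": "PUT"},
                ],
            }],
        }
    },
    "actions": {
        "turn": {
            "input": {"type": "number", "unit": "degree"},
            "links": [{
                "href": "/fans/1/actions/turn",
                "operations": [
                    {"intent": "invokeAction", "http:methodName": "POST"},
                ],
            }],
        }
    },
}

TD_URL = "http://thing.example.com/fans/1"  # hypothetical location of the TD

print(find_operation(web_fan, TD_URL, "properties", "power", "writeProperty"))
print(find_operation(web_fan, TD_URL, "actions", "turn", "invokeAction"))
```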

Some things which are not clear to me:

handrews commented 6 years ago

@benfrancis (or anyone), why is there an http:methodName instead of just method or methodName? I can already tell that the protocol is HTTP from the resolved URI, and I don't want to have to look that up to figure out what keyword to look at to find the protocol method.

I haven't followed the subProtocol stuff so I'm going to ignore that for the moment (but I do need to come back to that and understand it).

How to describe the data payload format of all the requests and responses (would this be specified using MIME types? or is the intention to go full blown OpenAPI and provide data schemas and headers for all the payloads with a list of possible responses?)

My current thinking is this:

If the dual-level LDO/ODO thing feels a bit complex, it's partially because I'm still working through this, but partially because there are different use cases being addressed simultaneously.

What I'm looking at right now is a proposed API that can, in "advanced mode", just GET resources, display the whole thing to a human to edit however they want, and PUT it back. All you need for that is an LDO with targetSchema (and maybe targetMediaType).

It also has a guided workflow mode, which is much closer to what WoT needs, and this is where ODOs are used: they specify, as much as possible, everything needed for automated clients to follow the workflow. The clients may be moving information from resource to resource, rather than ever asking humans for input (although they might do that, too).

So ODOs basically highlight "interesting" specific requests. Just because a request is not covered by an ODO does not mean that it's necessarily invalid. If you only show a PATCH operation that touches one of three fields, it MAY still be possible to use PATCH on the other two, it's just that the hyper-schema does not ascribe any particular meaning to doing so.


You'll notice I haven't mentioned responses: that's because I haven't figured out what I want there at all. From a runtime perspective, I like Hyper-Schema's current approach, which is that responses link their own schemas, and we don't try to lay out all the possibilities in advance. That's great for runtime, because (particularly when designing a thorough set of error handling workflows), there may be many possible responses and identifying all of those in advance may be difficult or impossible.

On the other hand, if you want to generate human-oriented documentation, it is useful to describe possible/likely responses. One thing I am considering is leaving responses out of the Hyper-Schema vocabulary, but putting them in the (proposed but as yet very nebulous) API Documentation vocabulary.

My question is: what would a run-time user-agent do with response schemas on the operations? Would it examine the schemas and make different decisions about what operations to attempt based on expected responses? I can't see a use case there. At run-time, I only care what response actually happens, not what other responses could happen.

Identifying and describing responses feels like a design-time / human-oriented-documentation thing. But I'm open to suggestion.

How to define how to get the status of an individual long-running action request

I would assume you'd get (in HTTP) a 202 and follow the Location link. Those responses would link their own schemas, so that is how you would know how to use the response from doing a GET on the Location URI for the polling resource.
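The polling flow assumed here can be sketched as a small decision function. This is only an illustration: `nextStep`, its return shape and the example URLs are made up for this sketch, and it looks at nothing but the status code and the Location header.

```javascript
// Hypothetical helper: decide what a client does next after requesting an action.
// `response` is a plain object { status, headers } with lowercase header names.
function nextStep(requestUrl, response) {
  const location = response.headers["location"];
  if (response.status === 202 && location) {
    // Accepted: the action is still running, so poll the resource named by
    // Location (resolved against the request URL, since it may be relative).
    return { kind: "poll", url: new URL(location, requestUrl).href };
  }
  if (response.status >= 200 && response.status < 300) {
    // Any other success: treat the response itself as the result.
    return { kind: "done" };
  }
  return { kind: "failed", status: response.status };
}

// Example URLs are assumptions, not taken from any Thing Description.
const step = nextStep("https://example.com/fans/1/actions/turn", {
  status: 202,
  headers: { location: "/fans/1/actions/turn/1234" },
});
console.log(step.kind, step.url);
```

The response to the GET on the polling URL would then carry its own schema link, so nothing about the polling resource needs to be declared in advance.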

How to define how to cancel a long-running action request

I would define a "self" link with a DELETE operation with some sort of "cancel" intent on the polling resource's representation. This maps the "cancel" application-level operation onto the DELETE protocol-level operation a.k.a. method. For a more RPC-ish system there could be a separate, POST-only URI (e.g. /jobs/1234/cancel), which would also have the "cancel" intent. This shows a benefit of operations doing the protocol mapping: a system can absorb at least some differences in CRUD vs RPC-ish APIs.

I recently described Hyper-Schema operations as "defining somewhat rpc-ish APIs in terms of REST".
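A rough sketch of that mapping, assuming the "ops", "intent" and "method" member names from the draft-08-with-operations examples in this thread (they are not part of any published draft). The two link arrays, the "related" relation on the RPC-style link and the `requestFor` helper are hypothetical.

```javascript
// CRUD-style representation: cancel is a DELETE on the polling resource itself.
const crudLinks = [
  { rel: "self", href: "/jobs/1234", ops: [{ intent: "cancel", method: "DELETE" }] },
];

// RPC-ish representation: cancel is a POST to a dedicated URI.
const rpcLinks = [
  { rel: "related", href: "/jobs/1234/cancel", ops: [{ intent: "cancel", method: "POST" }] },
];

// Find the protocol-level request that implements an application-level intent.
function requestFor(links, intent) {
  for (const link of links) {
    for (const op of link.ops || []) {
      if (op.intent === intent) {
        return { method: op.method, href: link.href };
      }
    }
  }
  return null; // this representation does not advertise the intent
}

console.log(requestFor(crudLinks, "cancel")); // { method: 'DELETE', href: '/jobs/1234' }
console.log(requestFor(rpcLinks, "cancel")); // { method: 'POST', href: '/jobs/1234/cancel' }
```

The client code only asks for the "cancel" intent; which method and URI that turns into is left to the representation.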

How to describe a WebSocket binding with one socket per thing (with multiple message types) or even multiple things per socket

I need to understand these use cases better. I've not paid enough attention to WebSockets because we were thinking of Hyper-Schema in draft-06 and draft-07 as a relatively purist REST system. But with operations, it's possible to allow a lot more flexibility, allowing Hyper-Schema to address both purist REST and other paradigms. So now I need to think about more use cases :-)

handrews commented 6 years ago

Note that requestSchema and requestMediaType at the ODO level remove the need for submissionSchema and submissionMediaType at the LDO level. My feeling now is that POST is always an operation, so you would always have an ODO for POST.

This means that this draft-07 LDO:

{
    "rel": "collection",
    "href": "things",
    "submissionSchema": {"$ref": "my-item-schema"}
}

becomes the following draft-08-with-operations LDO:

{
    "rel": "collection",
    "href": "things",
    "ops": [
        {
            "intent": "create",
            "method": "POST",
            "requestSchema": {"$ref": "my-item-schema"}
        }
    ]
}

Similarly, hrefSchema is part of an ODO: it could be for GET or for other methods. So consider the following draft-07 LDO, on an instance that has a "filters" property (from which the URI Template variable "filters*" is filled out):

{
    "rel": "collection",
    "href": "things{?page,filters*}",
    "templateRequired": ["filters"],
    "hrefSchema": {
        "type": "object",
        "properties": {
            "page": {"type": "integer", "minimum": 1},
            "filters": false
        }
    },
    "targetHints": {
        "allow": ["GET", "PATCH"],
        "accept-patch": ["application/merge-patch+json"]
    }
}

This LDO should be interpreted as requiring the filters from the current instance to be used (the filters URI Template variable is required by templateRequired, but the client is not allowed to override it, as shown by setting it to false in hrefSchema), but allowing the client to jump to an arbitrary page in that filtered collection ("page" is described by hrefSchema as a positive integer).

The LDO also hints that PATCH is supported, and with what patch media type, but cannot express what patches might be interesting or why.
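The interpretation of templateRequired and hrefSchema described above can be sketched as follows. This is hand-rolled for this one template and this one schema; a real client would use an RFC 6570 library and a JSON Schema validator. The `buildHref` helper, and the assumption that the instance's "filters" property is an object of name/value pairs, are illustrative only.

```javascript
// Resolve "things{?page,filters*}" from instance data plus client input.
function buildHref(instance, clientInput) {
  // hrefSchema sets "filters" to false, so client input must not supply it.
  if ("filters" in clientInput) {
    throw new Error("filters cannot be overridden by the client");
  }
  // hrefSchema describes "page" as an optional integer >= 1.
  const page = clientInput.page;
  if (page !== undefined && !(Number.isInteger(page) && page >= 1)) {
    throw new Error("page must be a positive integer");
  }
  // templateRequired lists "filters", so the instance has to provide it.
  if (instance.filters === undefined) {
    throw new Error("instance has no filters; the link is not usable");
  }
  const pairs = [];
  if (page !== undefined) pairs.push(["page", String(page)]);
  // "filters*" explodes an object into one name=value pair per member.
  for (const [name, value] of Object.entries(instance.filters)) {
    pairs.push([name, String(value)]);
  }
  // encodeURIComponent only approximates RFC 6570 percent-encoding.
  const query = pairs
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  return query ? "things?" + query : "things";
}

console.log(buildHref({ filters: { color: "red" } }, { page: 2 }));
// things?page=2&color=red
```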

In draft-08-with-operations, this could become:

{
    "rel": "collection",
    "href": "things{?page,filters*}",
    "templateRequired": ["filters"],
    "ops": [
        {
            "intent": "jump-to-page",
            "method": "GET",
            "hrefSchema": {
                "type": "object",
                "properties": {
                    "page": {"type": "integer", "minimum": 1},
                    "filters": false
                }
            }
        },
        {
            "intent": "clear",
            "method": "PATCH",
            "hrefSchema": false,
            "requestMediaType": "application/merge-patch+json",
            "requestSchema": {
                "const": {"elements": []}
            }
        }
    ]
}

Now the operation-specific part of filling out the URI lives in the operations themselves. In all operations, you are working with the filtered collection ("filters*" must still be filled out from the instance). The GET operation allows the client to choose a page, and the PATCH operation specifically empties the "elements" array (implementing a "clear collection" intent).

For the PATCH we set hrefSchema entirely to false, as using PATCH to clear a specific page of the collection seemed weird. But don't read too much into that, I'm making up examples on the fly here and I tend to be bad at that :-/

benfrancis commented 6 years ago

@handrews wrote:

why is there an http:methodName instead of just method or methodName?

I don't know. It appears to be defined in the Protocol Binding Templates specification. I think it's an RDF namespace thing so that HTTP vocabulary can be re-used from the HTTP Vocabulary in RDF 1.0 specification, though I'm not sure where the CoAP and MQTT vocabulary comes from.

I would personally prefer the alternatives you mentioned to avoid the complications of the RDF namespaces, but I guess it's more complicated than just defining a method. With declarative protocol bindings there are other things to worry about like headers and other options, especially if including non-web protocols like MQTT.

I haven't followed the subProtocol stuff so I'm going to ignore that for the moment (but I do need to come back to that and understand it).

As I understand it, it's a little bit loosely defined at the moment. Currently the possible values are LongPoll and EventSource. These aren't "protocols" so much as mechanisms for adding pub/sub capabilities to HTTP. I am a bit concerned about potential confusion between the subProtocol vocabulary and WebSocket sub-protocols which are negotiated during the WebSocket handshake. But maybe they can be declaratively hinted at using subProtocol too.

My question is: what would a run-time user-agent do with response schemas on the operations? Would it examine the schemas and make different decisions about what operations to attempt based on expected responses? I can't see a use case there.

I actually have the same question about requests. My biggest concern with the amount of flexibility introduced by declarative protocol bindings is how the complexity can spiral out of control. As I understand it a major use case of OpenAPI is to automatically generate code which can act as a custom client for a single REST API. I wonder what a general purpose WoT client (run-time user agent) could realistically do with such open ended Thing Descriptions. Would it have to generate code to deal with every new web thing it comes across?

I would assume you'd get (in HTTP) a 202 and follow the Location link. Those responses would link their own schemas, so that is how you would know how to use the response from doing a GET on the Location URI for the polling resource.

This is a good example of the potential complexity. What you describe is basically what our concrete protocol binding for requesting actions using HTTP does. An action is requested with a POST request which receives a 201 Created response with a URL for the new resource which identifies an individual action request. But can you really make such an assumption with declarative protocol bindings? They provide the flexibility that an action request could use any method, not just POST, and may not actually create a new resource when an action is requested.

I would define a "self" link with a DELETE operation with some sort of "cancel" intent on the polling resource's representation. This maps the "cancel" application-level operation onto the DELETE protocol-level operation a.k.a. method.

Again, this is essentially what our concrete protocol binding does: it uses a DELETE to cancel an action request. But in a declarative protocol binding, how would you know from the Thing Description that a DELETE is possible on a resource whose URL doesn't even exist in the Thing Description (because it may not have been created yet)? Would this be part of the response schema that you described?

I need to understand these use cases better. I've not paid enough attention to WebSockets because we were thinking of Hyper-Schema in draft-06 and draft-07 as a relatively purist REST system. But with operations, it's possible to allow a lot more flexibility, allowing Hyper-Schema to address both purist REST and other paradigms. So now I need to think about more use cases :-)

I suspect this may be where this model may break down, because of the non-webby nature of WebSockets! Like with MQTT, a WebSocket sub-protocol wouldn't necessarily break the web thing down into separate resources for properties, actions and events. See our WebThing WebSocket API for an example.

It is possible to design an API where you open a WebSocket on individual property and event URLs to "observe" them, and upgrade an HTTP GET request into a WebSocket in the same way that a CoAP GET request is upgraded to an observe. That's how we originally started. But that isn't a very efficient way to use WebSockets.

benfrancis commented 6 years ago

@benfrancis wrote:

I suspect this may be where this model may break down, because of the non-webby nature of WebSockets! Like with MQTT, a WebSocket sub-protocol wouldn't necessarily break the web thing down into separate resources for properties, actions and events. See our WebThing WebSocket API for an example.

FWIW I had a go at trying to create a declarative protocol binding for the same device, using the links + operations model but using the webthing WebSocket sub-protocol rather than HTTP. This was the best I could manage:

{
  "name": "Web Fan",
  "description": "A web connected fan",
  "properties": {
    "power": {
      "type": "boolean",
      "observable": true,
      "links": [{
        "rel": "property",
        "href": "wss://example.com/fans/1",
        "operations": [
          {
            "intent": "observeproperty",
            "subProtocol": "webthing",
            "messageType": "propertyStatus"
          },
          {
            "intent": "writeproperty",
            "subProtocol": "webthing",
            "messageType": "setProperty"
          }
        ]
      }]
    }
  },
  "actions": {
    "turn": {
      "observable": true,
      "input": {
        "type": "number",
        "unit": "degree"
      },
      "links": [{
        "href": "wss://example.com/fans/1",
        "operations": [{
          "intent": "invokeaction",
          "subProtocol": "webthing",
          "messageType": "requestAction"
        }]
      }]
    }
  },
  "events": {
    "overheated": {
      "observable": true,
      "links": [{
        "rel": "event",
        "href": "wss://example.com/fans/1",
        "operations": [{
          "intent": "subscribeevent",
          "subProtocol": "webthing",
          "messageType": "addEventSubscription"
        }]
      }]
    }
  }
}

Note that in this example the properties, actions and events all share the same URL. This is because in the webthing WebSocket sub-protocol we defined, all the messages for a thing are exchanged over a single WebSocket connection so that clients don't need to keep multiple TCP connections open all the time to talk to a single device. The device just has a single wss:// URL, there are no separate resources for properties, actions and events.

What this binding lacks is any description of the format of the messages themselves. It also can't distinguish between messages which are sent from client to server and messages sent from server to client (WebSockets do not follow the request/response pattern assumed in hypermedia).
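Sharing one socket means the client has to demultiplex incoming messages by type. A minimal sketch of that, assuming each message is a JSON object with "messageType" and "data" members; that envelope is an assumption here, since the binding above does not describe it, and `createDispatcher` is a hypothetical helper.

```javascript
// Minimal demultiplexer for one shared socket per thing.
function createDispatcher() {
  const handlers = new Map();
  return {
    // Register the handler for one message type.
    on(messageType, handler) {
      handlers.set(messageType, handler);
    },
    // Call this with the raw text of each incoming message.
    // Returns true if a handler was registered for its message type.
    handle(text) {
      const message = JSON.parse(text);
      const handler = handlers.get(message.messageType);
      if (!handler) return false;
      handler(message.data);
      return true;
    },
  };
}

const dispatcher = createDispatcher();
dispatcher.on("propertyStatus", (data) => console.log("power is", data.power));
dispatcher.handle('{"messageType": "propertyStatus", "data": {"power": true}}');
// prints: power is true
```

In a browser this would be hooked up with something like `socket.onmessage = (e) => dispatcher.handle(e.data)`. Note that the dispatcher only covers server-to-client messages, which is exactly the direction the binding above cannot express.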

To me a better approach for this use case would actually be to provide a single WebSocket URL for the device in the top level links member as an "alternate" link relation. E.g:

{
  "name": "Web Fan",
  "description": "A web connected fan",
  "properties": {
    "power": {
      "type": "boolean",
      "observable": true
    }
  },
  "actions": {
    "turn": {
      "observable": true,
      "input": {
        "type": "number",
        "unit": "degree"
      }
    }
  },
  "events": {
    "overheated": {
      "observable": true
    }
  },
  "links": [{
    "rel": "alternate",
    "href": "wss://example.com/fans/1",
    "subProtocol": "webthing"
  }]
}

Then separately define a concrete specification of that sub-protocol. This is what we currently do in our implementation.

You could argue this same approach could be used for MQTT, although MQTT does not have a URI scheme* and therefore cannot be referenced using a web link in the same way a WebSocket endpoint can.

I've not seen any examples from PlugFests which address these issues. When WebSockets are used, a media type of application/json is often specified, but there is no definition of any message formats or sub-protocols which a client would need to actually use the WebSocket. The examples also tend to have a separate WebSocket for every property and event, which doesn't really scale. When MQTT is used, a URI using a non-standard MQTT URI scheme is provided, and again there isn't really sufficient information for a client to automatically figure out how to communicate with the web thing.

Having followed this through, I'm still struggling to understand how declarative protocol bindings would work in practice in a linkable and client-agnostic Web of Things.

* OASIS considered it, but then closed the issue without action over a year ago

handrews commented 6 years ago

Having followed this through, I'm still struggling to understand how declarative protocol bindings would work in practice in a linkable and client-agnostic Web of Things.

@benfrancis to be clear, this is a general doubt that you have, and not one that's specific to either the current proposals or a JSON Hyper-Schema-based alternative?

I spent some time over the long weekend finally catching up on the WoT proposals and alternate proposals, and the WebSockets and CoAP RFCs and I think I have some further thoughts on this. Some things in your example can be addressed more directly with Hyper-Schema, e.g. instead of a special subprotocol field for websockets, just using

          "headerSchema": {
            "type": "object",
            "required": ["sec-websocket-protocol"],
            "properties": {
              "sec-websocket-protocol": {"const": "webthing"}
            }   
          } 

along with rules for constructing requests that basically say something like "if there is a required property with a const value, then clients SHOULD automatically supply that value for that property", with the properties in this case being HTTP headers.
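That rule is simple enough to sketch. The `impliedValues` helper below is hypothetical and only handles the "required plus const" case described above, not general JSON Schema.

```javascript
// Collect the values a client can fill in without any input: properties that
// are both required and pinned to a single value with "const".
function impliedValues(schema) {
  const implied = {};
  for (const name of schema.required || []) {
    const property = (schema.properties || {})[name];
    // Property schemas may be booleans (true/false), so check the type first.
    if (property && typeof property === "object" && "const" in property) {
      implied[name] = property.const;
    }
  }
  return implied;
}

const headerSchema = {
  type: "object",
  required: ["sec-websocket-protocol"],
  properties: {
    "sec-websocket-protocol": { const: "webthing" },
  },
};

console.log(impliedValues(headerSchema));
// { 'sec-websocket-protocol': 'webthing' }
```

A browser client cannot set that header directly; it would pass the value as the protocols argument of the WebSocket constructor instead.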

The examples also tend to have a separate WebSocket for every property and event, which doesn't really scale.

My assumption would be that a client would decide how many websockets to open for a given URI. And isn't that what we are doing? You also say:

This is because in the webthing WebSocket sub-protocol we defined, all the messages for a thing are exchanged over a single WebSocket connection so that clients don't need to keep multiple TCP connections open all the time to talk to a single device. The device just has a single wss:// URL, there are no separate resources for properties, actions and events.

so I'm a little confused. It sounds like connections are re-used?

It is possible to design an API where you open a WebSocket on individual property and event URLs to "observe" them, and upgrade an HTTP GET request into a WebSocket in the same way that a CoAP GET request is upgraded to an observe. That's how we originally started. But that isn't a very efficient way to use WebSockets.

Agreed. I would prefer to leverage the defined "webthing" sub-protocol, which I apparently need to learn more about.

Ultimately, this gets down to a key question: What is the Uniform Interface that we wish to implement? In most discussions around REST, the Uniform Interface in question is assumed to be HTTP's GET/PUT/DELETE/POST and maybe also PATCH. But PUBLISH and SUBSCRIBE seem to be first-class citizens of the modern web. What is the right way to think about that with respect to REST's uniform interface constraint? Apologies if this has already been debated elsewhere at length, it's a new thought for me with respect to Hyper-Schema, and something that is being motivated by more than just this project.


I actually have the same question about requests. My biggest concern with the amount of flexibility introduced by declarative protocol bindings is how the complexity can spiral out of control. As I understand it a major use case of OpenAPI is to automatically generate code which can act as a custom client for a single REST API. I wonder what a general purpose WoT client (run-time user agent) could realistically do with such open ended Thing Descriptions. Would it have to generate code to deal with every new web thing it comes across?

I'm not a big fan of code generation for REST APIs, as it is nearly always inherently fragile (especially in strongly typed languages). I do recognize that it is sometimes necessary for a variety of reasons including performance, but I prefer a more dynamic runtime approach for flexibility and robustness in the face of API evolution.

My personal intention with JSON Hyper-Schema is to avoid code generation and work with runtime data structures instead, although I also intend for it to be usable for code generation by those who wish to do so, as working against that use seems like pointless effort.

draggett commented 6 years ago

Where is the specification for events over the LongPoll subprotocol for HTTP?
EventSource is well defined and supported by many Web browsers using a very simple API, e.g:

let eventSrc = new EventSource(eventURI);
// The handler property is lowercase: onmessage.
eventSrc.onmessage = function (e) {
  console.log("event: " + e.data);
};

See: https://developer.mozilla.org/en-US/docs/Web/API/EventSource

Note that you still need to call JSON.parse(), but that is pretty trivial.
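For example, a handler that tolerates malformed payloads might look like this. The `parseEventData` helper and the `{"power": true}` payload are illustrative assumptions, not something defined by any of the specs.

```javascript
// Hypothetical helper: turn a message event into a value, tolerating bad JSON.
function parseEventData(e) {
  try {
    return { ok: true, value: JSON.parse(e.data) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// In a browser this would be wired up roughly as:
//   eventSrc.onmessage = (e) => console.log(parseEventData(e));
// Here a plain object stands in for the message event.
console.log(parseEventData({ data: '{"power": true}' }));
```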

Referencing a subprotocol in our specs without referencing its normative specification seems wrong.

I've worked out by experiment how to listen for events from ThingWeb, and note that it involves a fresh HTTP request for each event. Server-Sent Events, by contrast, use a single HTTP request and send each event as a chunk using HTTP's chunked encoding. That is more efficient, and similar to the chunking mechanism used in WebSockets.