Allow more operations in a top level form

w3c / wot-thing-description

Web of Things (WoT) Thing Description

http://w3c.github.io/wot-thing-description/

Allow more operations in a top level form #1070

Open benfrancis opened 3 years ago

benfrancis commented 3 years ago

In https://github.com/WebThingsIO/gateway/issues/2806 we are trying to generate W3C compliant Thing Descriptions for devices which provide a single top level WebSocket endpoint in their Thing Description which can be used for a broad set of operations in the Web Thing WebSocket API (being standardised via the Web Thing Protocol Community Group). That WebSocket sub-protocol supports various message types including:

setProperty
requestAction
addEventSubscription
propertyStatus
actionStatus
event

In the standardised version I'd ideally like to try and match as many message type names as possible to operation types in the W3C specification, e.g.:

writeproperty
observeproperty
unobserveproperty
invokeaction
subscribeevent
unsubscribeevent
writeallproperties
writemultipleproperties
observeallproperties
unobserveallproperties

For WebSockets it doesn't make sense to have a form per interaction affordance because that means keeping a potentially very large number of TCP sockets open for each device, which is not practical. This is why we currently provide it as a link at the top level of the Thing Description:

"links": [
  {
    "rel": "alternate",
    "href": "wss://mywebthingserver.com/things/lamp"
  }
]

In order to make this W3C compliant, we'd ideally like to make this a form like:

"forms": [
  {
    "op": ["writeproperty", "invokeaction", "subscribeevent"],
    "href": "wss://mywebthingserver.com/things/lamp",
    "subprotocol": "webthing"
  }
]

But the current TD specification says:

When the forms Array of a Thing instance contains Form instances, the string values assigned to the name op, either directly or within an Array, MUST be one of the following operation types: readallproperties, writeallproperties, readmultipleproperties, or writemultipleproperties.

This prevents the above form from being compliant with the specification.
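As a minimal sketch of why the proposed form fails, the quoted rule can be checked mechanically. The function name `disallowed_top_level_ops` and the set `ALLOWED_TOP_LEVEL_OPS` are hypothetical names introduced here for illustration; only the four allowed operation types come from the quoted text.

```python
# Operation types the quoted TD text allows in Thing-level forms.
ALLOWED_TOP_LEVEL_OPS = {
    "readallproperties",
    "writeallproperties",
    "readmultipleproperties",
    "writemultipleproperties",
}


def disallowed_top_level_ops(td):
    """Return the op values in Thing-level forms that the quoted rule rejects."""
    rejected = []
    for form in td.get("forms", []):
        ops = form.get("op", [])
        if isinstance(ops, str):  # "op" may be a single string or an array
            ops = [ops]
        rejected.extend(op for op in ops if op not in ALLOWED_TOP_LEVEL_OPS)
    return rejected


# The form proposed above, as a Python dict.
proposed = {
    "forms": [
        {
            "op": ["writeproperty", "invokeaction", "subscribeevent"],
            "href": "wss://mywebthingserver.com/things/lamp",
            "subprotocol": "webthing",
        }
    ]
}

print(disallowed_top_level_ops(proposed))
```

Under that rule, every operation in the proposed form is rejected, which is the restriction this issue asks to lift.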

I'd therefore like to propose that this restriction on the operations a top level form can provide be lifted, to allow for a single WebSocket endpoint which provides all operation types.

egekorkan commented 3 years ago

I would also like to see this kind of connection detail at the top level, since that would make MQTT endpoints easier to describe as well. Similar to WS, having an MQTT href in each form should not mean establishing a connection with the broker for each operation, but rather keeping a persistent one. There is an old issue about this (https://github.com/w3c/wot-binding-templates/issues/14 which became https://github.com/w3c/wot-thing-description/issues/878).

That form can also describe the required security for the initial connection. However, I think this should not be mixed with the op keyword, since with that form one does not perform any WoT operations, it is a connection requirement. How about something like this:

"@context":"...",
"forms": [
  {
    // "op":"initialization", // I am not super happy with this either since it is still 
    // not an operation comparable to other operations 
    "href": "wss://mywebthingserver.com/things/lamp",
    "subprotocol": "webthing"
  }
],
"properties": {
  "status": {
    "type": "string",
    "forms": [
      {
        "href": "?", // not sure what this should be
        "op": "readproperty",
        "subprotocol": "webthing"
      }
    ]
  }
},

As seen in a comment above, in WS based Things that use a single endpoint (like above), what does the href in each interaction's form mean?

relu91 commented 3 years ago

For WebSockets it doesn't make sense to have a form per interaction affordance because that means keeping a potentially very large number of TCP sockets open for each device, which is not practical. This is why we currently provide it as a link at the top level of the Thing Description:

I agree we should really avoid scenarios with a lot of open connections; it is just a waste of resources. However, it seems that stating the WebSocket endpoint at the root level would be semantically equivalent to replicating it in each form. As I understood it, the consumer would open the connection only when it needs it, right? Or does it keep one open the whole time? @benfrancis can you expand this point please?

In general, I am okay with reducing redundancy in the TD, but if I am right, having a WebSocket endpoint replicated in each affordance does not really imply that a client would open one connection per interaction. It can be (and should be) smart enough to pool the connections, so that if one is already open it would reuse it. So it appears that the issue is more about having compact TDs than really improving resource management.
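A minimal sketch of the pooling behaviour described here, assuming a consumer that keys connections by the form's href. `ConnectionPool` and `fake_open` are hypothetical names; `fake_open` stands in for a real WebSocket client and only records the URLs it was asked to open.

```python
class ConnectionPool:
    """Hands out one connection object per distinct endpoint URL."""

    def __init__(self, open_connection):
        # open_connection is a hypothetical callable: url -> connection object
        self._open_connection = open_connection
        self._connections = {}

    def get(self, href):
        # Open a connection only the first time this href is seen.
        if href not in self._connections:
            self._connections[href] = self._open_connection(href)
        return self._connections[href]


opened = []


def fake_open(url):
    # Stand-in for a real WebSocket client; it only records the call.
    opened.append(url)
    return {"url": url}


pool = ConnectionPool(fake_open)

# Three affordance forms that all repeat the same WebSocket endpoint.
forms = [
    {"op": "writeproperty", "href": "wss://mywebthingserver.com/things/lamp"},
    {"op": "invokeaction", "href": "wss://mywebthingserver.com/things/lamp"},
    {"op": "subscribeevent", "href": "wss://mywebthingserver.com/things/lamp"},
]
for form in forms:
    pool.get(form["href"])

print(len(opened))
```

Nothing in the TD itself tells a consumer to do this, so the sketch shows one possible consumer policy rather than specified behaviour.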

About @egekorkan's use case, I think it's kinda different, but it also supports the point that we need to allow other operation types in root forms. However, with MQTT, it's more like an initialization handshake rather than connection pooling. Right?

As seen in a comment above, in WS based Things that use a single endpoint (like above), what does the href in each interaction's form mean?

If I understood correctly, WebThings would not have a ws href in each affordance, but only HTTP links that describe alternative ways to interact with the WebThing.

EDIT:

@benfrancis can you expand this point, please?

I read the description of the current Web Thing Websocket API and it seems that the real goal is to have a real-time alternative protocol to be used as a communication means with the WebThing:

The Web Thing WebSocket API complements the REST API to provide a realtime mechanism to make multiple requests and be notified of events as soon as they happen, by keeping a WebSocket [WEBSOCKETS-PROTOCOL] open on the Web Thing. The "webthing" WebSocket subprotocol defined here has a simple set of message types and a JSON response format consistent with the Web Thing REST API.

IMHO this would change completely the interaction model with a WebThing and it will have a huge impact on the current Scripting API.

relu91 commented 3 years ago

btw it seems that currently, the only operation we cannot map is actionStatus which resonates with this long-standing issue https://github.com/w3c/wot-thing-description/issues/899.

egekorkan commented 3 years ago

In general, I am okay with reducing redundancy in the TD, but if I am right, having a WebSocket endpoint replicated in each affordance does not really imply that a client would open one connection per interaction. It can be (and should be) smart enough to pool the connections, so that if one is already open it would reuse it. So it appears that the issue is more about having compact TDs than really improving resource management.

So regarding this point, it can be both. A consumer would pick an interaction, choose an operation it wants to execute, find the form that has the corresponding operation and execute that operation following that form's requirements (protocol, security, content type etc.). So unless the protocol itself requires keeping a connection open at all times, reconnection is possible.
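The selection step described here can be sketched as follows. `find_form` is a hypothetical helper, and the sketch deliberately ignores the default op values that apply when a form omits `op`. The hrefs in the example are made up for illustration.

```python
def find_form(affordance, op):
    """Return the first form of an affordance that lists the requested operation."""
    for form in affordance.get("forms", []):
        ops = form.get("op", [])
        if isinstance(ops, str):  # "op" may be a single string or an array
            ops = [ops]
        if op in ops:
            return form
    return None


# Illustrative property affordance with one HTTP form and one WebSocket form.
status = {
    "type": "string",
    "forms": [
        {"op": "readproperty", "href": "https://example.invalid/things/lamp/properties/status"},
        {
            "op": ["observeproperty", "unobserveproperty"],
            "href": "wss://example.invalid/things/lamp",
            "subprotocol": "webthing",
        },
    ],
}

chosen = find_form(status, "observeproperty")
print(chosen["href"])
```

Once a form is chosen, its protocol, security and content type decide how the operation is carried out. Whether a connection is reused afterwards is outside this step.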

I would say that we can describe some kind of dependency on form level, i.e. "this must be done before executing this form".

In the end, my goal was neither compact TDs nor resource management but describing how something should be properly done. If we obtain more compact TDs and this proper way of doing something also uses fewer resources, then we win on all fronts :)

relu91 commented 3 years ago

IMHO this would change completely the interaction model with a WebThing and it will have a huge impact on the current Scripting API.

Just to expand a little more on this point. Currently, a ConsumedThing is stateless. What has state is an affordance interaction, but we can do parallel interactions so that it is possible to optimize and open a single network connection (in principle).

If we are going to support the Web Thing WebSocket API, a ConsumedThing becomes stateful, in the sense that we have to keep track of possible open connections with the remote server. Notice that a ConsumedThing is an architectural concept, so this is valid for every implementation (not only the Scripting API). At first it seems really bad, but now that I am thinking about it, we might get away with just saying that a ConsumedThing can be started and stopped. Therefore, introducing a lifecycle for ConsumedThings might solve the issue.
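A minimal sketch of what such a lifecycle could look like, assuming explicit start and stop steps around a single shared connection. `ConsumedThingSession` and `FakeConnection` are hypothetical names and do not correspond to anything in the Scripting API.

```python
class FakeConnection:
    """Stand-in for a real WebSocket connection; it only tracks open/closed."""

    def __init__(self, url):
        self.url = url
        self.closed = False

    def close(self):
        self.closed = True


class ConsumedThingSession:
    """Hypothetical consumed-thing wrapper with an explicit lifecycle."""

    def __init__(self, href, open_connection=FakeConnection):
        self._href = href
        self._open_connection = open_connection
        self.connection = None

    def start(self):
        # Open the shared connection once; repeated calls are no-ops.
        if self.connection is None:
            self.connection = self._open_connection(self._href)

    def stop(self):
        # Close the shared connection if one is open.
        if self.connection is not None:
            self.connection.close()
            self.connection = None

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False


with ConsumedThingSession("wss://mywebthingserver.com/things/lamp") as thing:
    conn = thing.connection
    print(conn.closed)
print(conn.closed)
```

The context manager form makes the "started and stopped" idea visible to the application, which is the statefulness being discussed.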

I know that the issue itself is not really about the Web Thing WebSocket API; rather, it is simply asking to allow other op values in forms at the root level. But we need to specify the semantics of those operations, otherwise interpreting those forms becomes implementation-specific -> fragmentation.

relu91 commented 3 years ago

So regarding this point, it can be both. A consumer would pick an interaction, choose an operation it wants to execute, find the form that has the corresponding operation and execute that operation following that form's requirements (protocol, security, content type etc.). So unless the protocol itself requires keeping a connection open at all times, reconnection is possible.

Yes, a consumer can actually choose to open a connection for each affordance. This comment was aimed at @benfrancis's fear that having a ws form in each affordance would always mean having 1000 open connections. This is not the case, as explained there.

I would say that we can describe some kind of dependency on form level, i.e. "this must be done before executing this form".

Thanks for clarifying the question a little. Now I see that in reality, the two use cases (MQTT and WebThings WS) have more similarities than I thought. So you would like to describe "How to connect" at root level, then in each affordance say "which message should I send to perform an operation". Correct?

It would solve the MQTT problem well; I am not sure it would also solve WebThings WS, because the protocol explicitly says that the connection must be open the whole time... In other words, a runtime shouldn't automatically close the socket after it has performed a bunch of operations requested by an application.

hspaay commented 3 years ago

Just my 2c: it is better to separate the operations from the possible transports. I am working on a Hub implementation based on MQTT and WebSockets and found that for intermediaries the operation remains valid but the transport changes. It is also possible that the intermediary supports multiple transports for the operation. Consumers pass the operation via the transport of the hub/intermediary. How would one describe this situation in the TD? Can a Hub rewrite the TD and replace the transport?

egekorkan commented 3 years ago

Thanks for clarifying the question a little. Now I see that in reality, the two use cases (MQTT and WebThings WS) have more similarities than I thought. So you would like to describe "How to connect" at root level, then in each affordance say "which message should I send to perform an operation". Correct?

Yes exactly!

It would solve the MQTT problem well; I am not sure it would also solve WebThings WS, because the protocol explicitly says that the connection must be open the whole time...

It is also the case for MQTT, the connection to the broker is always kept. Thus, I think there is a lot of similarity.

How would one describe this situation in the TD? Can a Hub rewrite the TD and replace the transport?

I would say yes. If we isolate the initial connection at the Thing level, then only this object needs to be changed by the hub. Otherwise, it has to look into each form and make the necessary changes. The TDs that TUM provides in plugfests are behind an HTTPS proxy that also changes the TDs to replace HTTP with HTTPS, as well as the security credentials. The fact that this transformation will happen or has happened does not need to be in the TD. If it can help you @hspaay, we have the following implementation that has this transport replacement functionality (and more): https://github.com/tum-esi/shadow-thing

benfrancis commented 3 years ago

@relu91 wrote:

having a WebSocket endpoint replicated in each affordance does not really imply that a client would open one connection per interaction. It can be (and should be) smart enough to pool the connections, so that if one is already open it would reuse it.

The problem is that the specification doesn't actually say anything about whether connections should be shared if the same URL is provided in multiple forms, so we can't assume that consumers will be smart enough to do this.

Defining a single ws:// endpoint in a top level form makes it explicit that only one connection is needed per device.

If I understood correctly, WebThings would not have a ws href in each affordance, but only HTTP links that describe alternative ways to interact with the WebThing.

That's correct. In the Web Thing API there are individual http:// endpoints for each interaction affordance (because each operation requires a separate connection) but a single ws:// endpoint for the whole device defined at the top level (because only one connection is needed for all operations).

IMHO this would change completely the interaction model with a WebThing and it will have a huge impact on the current Scripting API.

I don't think it does, interaction affordances are still defined individually and have individual http:// endpoints in their own forms, there's just an additional top level form for the ws:// endpoint which can be shared between all interaction affordances due to the nature of the WebSocket protocol.

I'm afraid I'm not particularly worried about trying to maintain compatibility with the Scripting API as it's a non-normative specification (which doesn't even need to exist in my opinion).

If we are going to support the Web Thing WebSocket API, a ConsumedThing becomes stateful, in the sense that we have to keep track of possible open connections with the remote server.

I don't understand. Why does it make a difference whether the form is defined at the top level or duplicated in every interaction affordance? All state still belongs to individual interaction affordances, they just share the same TCP connection to communicate that state.

So you would like to describe "How to connect" at root level, then in each affordance say "which message should I send to perform an operation".

As far as I know there is no mechanism for describing individual WebSocket messages in a Thing Description. We therefore rely entirely on an out of band specification for a sub-protocol called "webthing" which is negotiated with a protocol handshake with the server when a WebSocket connection is opened (but which could also be referenced from the subprotocol member of a form).

relu91 commented 3 years ago

I had a couple of "offline" discussions about this topic with @sebastiankb, @egekorkan, and @danielpeintner. In this comment, I'll try to summarize the different points that came up (forgive me, it will be a long post).

read/write/observe at the root level

We find this solution easy to achieve in the short term but possibly confusing. In particular, a client cannot easily tell that it should reuse the connection merely from the fact that the form is at the root level. For example, another interpretation of the same form could be that it should be used as is in each affordance. However, we could rely on the subprotocol field to signal this behavior. Another confusing point is how a client should interpret the same form with the HTTP protocol instead of WebSocket. Should it use the keepalive option? Or does it fall back to the previous interpretation (i.e., copy and paste that form into every interaction)? Finally, another oddity is that operations like writeproperty and readproperty refer to a single affordance (they kinda have arity 1) but would be used at the root level, where other arity-N operations are used.

Use base field

We already have an ongoing discussion about extending the base field to express multiple protocols (see https://github.com/w3c/wot-thing-description/issues/803). It could also be used as a place to introduce this WebSocket endpoint. However, it would again stretch its semantics, because the base field is intended more as a convenient place to put common paths. It would not convey to the client that it should treat the WebSocket base as a single point of connection, nor would it cover Ege's use case.

{
  "bases": {
    "http": {
      "href": "http://mywebthingserver.com/things/lamp",
      "subprotocol": "webthing"
    },
    "ws": {
      "href": "wss://mywebthingserver.com/things/lamp",
      "subprotocol": "webthing"
    }
  }
}

Yet another pointer

Another "natural" solution that we evaluated is the introduction of a well-known pattern in the TD and other JSON-based interface descriptions, pointers. We had designed different "flavors" of this pattern.

ProtocolId and baseProtocol

Introduce a single field in the form model to globally identify a form at the root level: protocolId. Then, in each affordance, refer back to it using the baseProtocol keyword. Here's an example:

{
  "@context": "http://www.w3.org/ns/td",
  "id": "urn:dev:ops:32473-WoTLamp-1234",
  "title": "MyLampThing",
  "forms": [
    {
      "href": "http://mywebthingserver.com/things/lamp",
      "protocolId": "http"
    },
    {
      "href": "wss://mywebthingserver.com/things/lamp",
      "subprotocol": "webthing",
      "protocolId": "ws"
    }
  ],
  "properties": {
    "status": {
      "type": "string",
      "forms": [
        {
          "href": "/properties/status",
          "baseProtocol": "http"
        },
        {
          "href": "/",
          "baseProtocol": "ws"
        }
      ]
    }
  }
}

This would cover the two use cases quite well, but we still need to be careful to explain the mechanism in a protocol-agnostic way. Furthermore, it tries to reduce redundancy, but at the same time it adds these odd "empty" forms for each affordance that uses a root level form:

 {
    "href": "/",
    "baseProtocol": "ws"
 }

We should take countermeasures to avoid this level of unwanted artifacts.
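One way a consumer might resolve these references is sketched below. Both `protocolId` and `baseProtocol` are the proposed keywords from the example, not existing TD vocabulary, and `resolve_form` and the `base_href` key are hypothetical. The sketch keeps the root href and the affordance href separate rather than guessing how they should be combined.

```python
def resolve_form(td, form):
    """Merge an affordance form with the root form its baseProtocol points to."""
    target = form.get("baseProtocol")
    if target is None:
        # Nothing to resolve: return a copy of the form unchanged.
        return dict(form)
    for root in td.get("forms", []):
        if root.get("protocolId") == target:
            # Inherit everything from the root form except its identifier.
            resolved = {k: v for k, v in root.items() if k != "protocolId"}
            resolved["base_href"] = resolved.pop("href")
            resolved["href"] = form["href"]
            return resolved
    raise KeyError(f"no root form with protocolId {target!r}")


td = {
    "forms": [
        {"href": "http://mywebthingserver.com/things/lamp", "protocolId": "http"},
        {
            "href": "wss://mywebthingserver.com/things/lamp",
            "subprotocol": "webthing",
            "protocolId": "ws",
        },
    ]
}

print(resolve_form(td, {"href": "/", "baseProtocol": "ws"}))
```

The "empty" form contributes nothing beyond the reference itself, which illustrates the artifact described above.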

Define only base and allow no operation at the root level

A similar solution to the previous one is the following:

{
  "forms": [
    {
      "href": "wss://mywebthingserver.com/things/lamp",
      "subprotocol": "webthing"
    }
  ],
  "actions": {},
  "properties": {
    "hello": {
      "forms": [
        {
          "op": "readproperty",
          "base": "/forms[0]",
          "href": ""
        }
      ]
    }
  }
}

It has the same caveats as the first proposal, but it adds just one additional field for forms: base. This field is a JSON pointer that can only point to forms inside the root level forms field. It is still not clear what a client should do: should it perform the pointed-to operations before doing something else? Or is it just a reference to a previously opened connection?
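A minimal sketch of dereferencing such a pointer, assuming it is restricted to root level forms as described. The sketch parses the "/forms[0]" notation used in the example. Note that a pointer following RFC 6901 would be written "/forms/0" instead. `resolve_base` is a hypothetical helper.

```python
import re

# Accepts only the notation used in the example: /forms[<index>]
_POINTER = re.compile(r"^/forms\[(\d+)\]$")


def resolve_base(td, pointer):
    """Return the root level form that a form-level base pointer refers to."""
    match = _POINTER.match(pointer)
    if match is None:
        raise ValueError(f"unsupported base pointer: {pointer!r}")
    return td["forms"][int(match.group(1))]


td = {
    "forms": [
        {"href": "wss://mywebthingserver.com/things/lamp", "subprotocol": "webthing"}
    ],
    "properties": {
        "hello": {
            "forms": [{"op": "readproperty", "base": "/forms[0]", "href": ""}]
        }
    },
}

form = td["properties"]["hello"]["forms"][0]
print(resolve_base(td, form["base"])["href"])
```

The lookup only finds the referenced form. It does not answer the open question of what the client should then do with it.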

Introduce connections

This approach makes connections a first-class citizen of the Web of Things world. It adds a specific field to the TD that instructs the consumer how to create and manage a connection between itself and the WebThing. Here's an example:

{
  "connections": {
    "webthing": {
      "href": "https://www.w3.org/2019/wot/lamp",
      "subprotocol": "webthing",
      "keepalive": true // possibly remove this field, can we infer it from "webthing" protocol?
    },
    "broker": {
      "href": "mqtt://www.w3.org/2019/wot/broker"
    }
  },
  "title": "test",
  "securityDefinitions": {
    "no_sec": {
      "scheme": "nosec"
    }
  },
  "security": "no_sec",
  "connection": "webthing",
  "actions": {},
  "properties": {
    "hello_mqtt": {
      "forms": [
        {
          "connection": "broker",
          "op": "readproperty",
          "href": "#hello"
        }
      ]
    },
    "hello": {
      "forms": [
        {
          "href": "" // still this bad boy :(
        }
      ]
    }
  },
  "events": {}
}

This solution directly informs consumers that this WebThing needs a connection up and running. We felt that it is clearer and that it has the least amount of out-of-band information. Furthermore, since we are adding new elements, they could have clear semantics. For example, the connection field means that the client should create a connection before executing the operation described in the form. The cons are that we still need empty forms, because the current schema requires them, and that we are adding an additional field, making TDs more complex.
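Under this proposal, a consumer would need to work out which connection a form uses. A minimal sketch, assuming a form-level `connection` overrides the Thing-level default, is shown below. That precedence is an assumption read off the example, and `connection_for` is a hypothetical helper. The `connections` and `connection` fields are the proposed ones, not existing TD vocabulary.

```python
def connection_for(td, form):
    """Return the connection description a form should use, or None."""
    # Assumption: a form-level "connection" wins over the Thing-level default.
    name = form.get("connection", td.get("connection"))
    if name is None:
        return None
    return td["connections"][name]


td = {
    "connections": {
        "webthing": {
            "href": "https://www.w3.org/2019/wot/lamp",
            "subprotocol": "webthing",
        },
        "broker": {"href": "mqtt://www.w3.org/2019/wot/broker"},
    },
    "connection": "webthing",
    "properties": {
        "hello_mqtt": {
            "forms": [
                {"connection": "broker", "op": "readproperty", "href": "#hello"}
            ]
        },
        "hello": {"forms": [{"href": ""}]},
    },
}

for name, prop in td["properties"].items():
    print(name, connection_for(td, prop["forms"][0])["href"])
```

The `hello` property falls back to the default connection even though its form is otherwise empty, which is the remaining oddity noted above.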

relu91 commented 3 years ago

Thank you for your comment @benfrancis and sorry for answering late. Aside from the general comments that I put in the previous post, I would address a couple of your points.

The problem is that the specification doesn't actually say anything about whether connections should be shared if the same URL is provided in multiple forms, so we can't assume that consumers will be smart enough to do this.

Defining a single ws:// endpoint in a top level form makes it explicit that only one connection is needed per device.

That's true; it is hard for the spec to be generic enough to describe different protocols and still enforce this rule. I am wondering if we can move this level of constraint back to the protocol specification. Profiles could be another place, but I think that this rule is too "low" level to be expressed in a Profile (not sure, because the definition of a profile is a moving target).

IMHO this would change completely the interaction model with a WebThing and it will have a huge impact on the current Scripting API.

I don't think it does, interaction affordances are still defined individually and have individual http:// endpoints in their own forms, there's just an additional top level form for the ws:// endpoint which can be shared between all interaction affordances due to the nature of the WebSocket protocol.

Yes, re-reading the comment, I was overdramatic about this change and I think it does not change too much. The problem that I had is that the current interaction model does not have the concepts of how to open and close a session with a remote WebThing. So this opening and closing would be out-of-band information that a user should figure out. Now I think that it is sufficient to infer it from the protocol itself, although I still think that in the end it might be easier to have these concepts in the interaction model itself.

I'm afraid I'm not particularly worried about trying to maintain compatibility with the Scripting API as it's a non-normative specification (which doesn't even need to exist in my opinion).

Surely, the Scripting API should not be a blocker for new features. It can even be changed faster than the normative specs. I was just thinking about the possible implications of this change.

So you would like to describe "How to connect" at root level, then in each affordance say "which message should I send to perform an operation".

As far as I know there is no mechanism for describing individual WebSocket messages in a Thing Description. We therefore rely entirely on an out of band specification for a sub-protocol called "webthing" which is negotiated with a protocol handshake with the server when a WebSocket connection is opened (but which could also be referenced from the subprotocol member of a form).

Yep, we don't really have something so expressive, but we could have specific message types for different operations. I was referring to Ege's use case for MQTT, where the connection packets are different from the subscription packets.

egekorkan commented 3 years ago

Thank you @relu91 for laying out all the different options we thought of. I have one more comment regarding @benfrancis's example above:

"forms": [
  {
    "op": ["writeproperty", "invokeaction", "subscribeevent"],
    "href": "wss://mywebthingserver.com/things/lamp",
    "subprotocol": "webthing"
  }
]

In this case, why are the op keywords needed? If I know that the subprotocol is webthing, I already know that this socket will be used for the usual operations.

benfrancis commented 3 years ago

@relu91 Thanks for the detailed write-up.

That's true; it is hard for the spec to be generic enough to describe different protocols and still enforce this rule. I am wondering if we can move this level of constraint back to the protocol specification. Profiles could be another place, but I think that this rule is too "low" level to be expressed in a Profile (not sure, because the definition of a profile is a moving target).

A WoT consumer isn't going to be able to use a WebSocket endpoint using the webthing sub-protocol unless it implements the out-of-band specification for that sub-protocol, so I think it would be fine to define these kinds of details in the specification for a given sub-protocol. The assumption being that consumers which don't support that subprotocol would just ignore it. It could also be specified in a profile which mandates support for a given subprotocol.

The problem that I had is that the current interaction model does not have the concepts of how to open and close a session with a remote WebThing.

It's true that the Thing Description doesn't provide a way to tell a consumer that a WebSocket connection should be kept open, but it also doesn't have a way to specify that an HTTP connection shouldn't be kept open! I think sensible defaults for things like this really need to be inferred from the protocol, since there are so many variations on how protocols work.

@egekorkan wrote:

In this case, why are the op keywords needed? If I know that the subprotocol is webthing, I already know that this socket will be used for the usual operations.

That's a good question. We don't list operations in the current implementation, which uses links rather than forms.

I suggested adding ops for a couple of reasons:

To help distinguish forms from each other (our current solution with links uses a rel to help with this)
To allow a device to list a subset of operations it supports from the sub-protocol

If we assume that all devices support the full set of operations defined in the sub-protocol (or that it uses some in-protocol mechanism to determine the subset of operations supported), then I think it would be fine to infer the former from the href URI scheme and subprotocol.

I think I prefer that to adding the concept of "connections" to the TD, which could get quite confusing (e.g. Do you need a "connection" for SSE, webhooks or long-polling in HTTP or Observe in CoAP?).

I think there may be other cases where ops are needed to distinguish between forms, e.g. if we add proposed new operations like readallpastevents or subscribeallevents, but those can just be added to the list of allowed ops in the specification for top level forms.

benfrancis commented 3 years ago

To follow up on this, I concluded this should just be a form with no op value specified.

See https://github.com/WebThingsIO/gateway/issues/2806

My conclusion from https://github.com/w3c/wot-thing-description/issues/1070 is that this can be a form, and that we can simply omit the op member from that form altogether and rely on the protocol scheme and a subprotocol member to tell Consumers how to use it.
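A minimal sketch of how a consumer could recognise such a form, assuming it matches on the URI scheme and the subprotocol member and skips any form that declares an op. `find_shared_endpoint` and its default arguments are hypothetical.

```python
from urllib.parse import urlsplit


def find_shared_endpoint(td, schemes=("ws", "wss"), subprotocol="webthing"):
    """Return the href of the first op-less top level form matching scheme and subprotocol."""
    for form in td.get("forms", []):
        if "op" in form:
            # Forms that declare operations are handled by normal op matching.
            continue
        scheme = urlsplit(form["href"]).scheme
        if scheme in schemes and form.get("subprotocol") == subprotocol:
            return form["href"]
    return None


td = {
    "forms": [
        {
            "href": "wss://mywebthingserver.com/things/lamp",
            "subprotocol": "webthing",
        }
    ]
}

print(find_shared_endpoint(td))
```

A consumer that does not implement the sub-protocol would get no match for its own sub-protocol name and could simply ignore the form.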

benfrancis commented 2 years ago

My conclusion from #1070 is that this can be a form, and that we can simply omit the op member from that form altogether and rely on the protocol scheme and a subprotocol member to tell Consumers how to use it.

I'm reopening this issue because since #1262 landed this is no longer possible. See further discussion in https://github.com/w3c/wot-thing-description/issues/1192#issuecomment-971432107

benfrancis commented 2 years ago

Just leaving a note to say that, as I understand it, currently the only way to describe a single WebSocket endpoint shared by all operations on a Thing is to provide the same value for href in all Forms across all interaction affordances of its Thing Description (and in any top level Forms), and then define in a WebSocket sub-protocol specification that connections should be re-used across interaction affordances.

See an example below:

{
  "@context": [
    "https://www.w3.org/2019/wot/td/v1.1"
  ],
  "id": "https://mywebthingserver.com/things/lamp",
  "base": "wss://mywebthingserver.com/things/lamp",
  "title": "My Lamp",
  "description": "A web connected lamp",
  "securityDefinitions": {
    "bearer": {
      "scheme": "bearer",
      "authorization": "https://mywebthingserver.com/authorize",
      "in": "query",
      "name": "jwt"
    }
  },
  "security": "bearer",
  "properties": {
    "on": {
      "type": "boolean",
      "title": "On/Off",
      "description": "Whether the lamp is turned on",
      "forms": [
        {
          "href": "",
          "op": ["readproperty", "writeproperty", "observeproperty", "unobserveproperty"],
          "subprotocol": "webthing"
        }
      ]
    },
    "level" : {
      "type": "integer",
      "title": "Brightness",
      "description": "The level of light from 0-100",
      "unit": "percent",
      "minimum" : 0,
      "maximum" : 100,
      "forms": [
        {
          "href": "",
          "op": ["readproperty", "writeproperty", "observeproperty", "unobserveproperty"],
          "subprotocol": "webthing"
        }
      ]
    }
  },
  "actions": {
    "fade": {
      "title": "Fade",
      "description": "Fade the lamp to a given level",
      "input": {
        "type": "object",
        "properties": {
          "level": {
            "type": "integer",
            "minimum": 0,
            "maximum": 100,
            "unit": "percent"
          },
          "duration": {
            "type": "integer",
            "minimum": 0,
            "unit": "milliseconds"
          }
        }
      },
      "forms": [
        {
          "href": "",
          "op": ["invokeaction", "queryaction", "cancelaction"],
          "subprotocol": "webthing"
        }
      ]
    }
  },
  "events": {
    "overheated": {
      "title": "Overheated",
      "data": {
        "type": "number",
        "unit": "degree celsius"
      },
      "description": "The lamp has exceeded its safe operating temperature",
      "forms": [
        {
          "href": "",
          "op": ["subscribeevent", "unsubscribeevent"],
          "subprotocol": "webthing"
        }
      ]
    }
  },
  "forms": [
    {
      "op": [
        "readallproperties",
        "readmultipleproperties",
        "writeallproperties",
        "writemultipleproperties",
        "queryallactions",
        "subscribeallevents",
        "unsubscribeallevents"
      ],
      "href": "",
      "subprotocol": "webthing"
    }
  ]
}
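As a rough check on a description like the one above, a consumer could resolve every form's href against base and count the distinct endpoints. The sketch below uses an abbreviated copy of the TD, and `endpoints` is a hypothetical helper. It relies on `urllib.parse.urljoin` returning the base unchanged when the href is empty.

```python
from urllib.parse import urljoin


def endpoints(td):
    """Collect the distinct resolved hrefs of all forms in a Thing Description."""
    base = td.get("base", "")
    hrefs = set()
    # Top level forms.
    for form in td.get("forms", []):
        hrefs.add(urljoin(base, form["href"]))
    # Forms of every interaction affordance.
    for kind in ("properties", "actions", "events"):
        for affordance in td.get(kind, {}).values():
            for form in affordance.get("forms", []):
                hrefs.add(urljoin(base, form["href"]))
    return hrefs


# Abbreviated version of the Thing Description above.
lamp = {
    "base": "wss://mywebthingserver.com/things/lamp",
    "properties": {
        "on": {"forms": [{"href": "", "subprotocol": "webthing"}]},
        "level": {"forms": [{"href": "", "subprotocol": "webthing"}]},
    },
    "actions": {"fade": {"forms": [{"href": "", "subprotocol": "webthing"}]}},
    "events": {"overheated": {"forms": [{"href": "", "subprotocol": "webthing"}]}},
    "forms": [{"href": "", "subprotocol": "webthing"}],
}

print(endpoints(lamp))
```

A single resolved endpoint only shows that sharing is possible. The rule that the connection must actually be re-used still has to come from the sub-protocol specification.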