w3c / wot-thing-description

Web of Things (WoT) Thing Description
http://w3c.github.io/wot-thing-description/

Managing Dynamically Created Resources in TDs #899

Open vcharpenay opened 4 years ago

vcharpenay commented 4 years ago

The TD model in its first version does not allow Things to expose dynamically created resources, such as resources giving the status of long-lasting actions or event subscription resources.

A proposal is available under /proposals/hypermedia-control. (The proposal is rather long so I put it in its own file instead of exposing it in the issue.)

vcharpenay commented 4 years ago

This proposal addresses the issue discussed in #302.

relu91 commented 4 years ago

I like this approach of dynamic TDs, mostly because it does not mess with the return type of an action. Some notes/questions:

vcharpenay commented 4 years ago

Thanks for the comments!

How would I model PUSH based interaction instead of a PULL based? (i.e. what if the state of the action is pushed to me instead of I read the state every now and then)

My assumption is that it is a matter of finding the right operation types but I guess it would be easier to think about it with an example: do you have a concrete case where that happens, e.g. some MQTT interaction with existing devices?

We are leaving out the problem of ownership, what happens if some other consumer tries to cancel an action started by me.

If the Thing is natively supporting the TD model, it has control over what it sends to Consumers. It doesn't have to expose all affordances to all Consumers. In the case there is a TDir (or some proxy) exposing a TD for some legacy device, this is a bit more arduous to implement, indeed. Some help in that respect would be welcome!

Can this behavior be generalized in other use cases?

Yes, definitely. At least, that's what this proposal is aiming at. The concern I personally have about it is how we can limit the set of operation types to a minimum. There shouldn't be hundreds of them.

egekorkan commented 4 years ago

How would I model PUSH based interaction instead of a PULL based? (i.e. what if the state of the action is pushed to me instead of I read the state every now and then)

I think this would be handled on the protocol level? So the queryaction op could be in a form with MQTT and with the subscribe "method"

relu91 commented 4 years ago

I am not sure that solving that at the protocol level is enough. From an op called queryaction I expect to have a polling behavior. Ok, I could implement the poll with a subscribe on MQTT but it is still polling. What if the application would just subscribe to the completion of the action?

My point is that we might even need two more ops, subscribestate and unsubscribestate, which goes a little bit against what was stated here 😃:

The concern I personally have about it is how we can limit the set of operation types to a minimum. There shouldn't be hundreds of them

One possible solution: the updated TD can also have new events about the action status. However, I think it goes against the semantics of TD affordances (i.e. they reflect physical world entities).

vcharpenay commented 4 years ago

One possible solution: the updated TD can also have new events about the action status

There is nothing preventing the Thing from adding a new event affordance, as long as the event is generated by some physical state change. (Again, an example would help discussing the matter.)

benfrancis commented 4 years ago

I'm really pleased to see this topic being discussed, and thank you @vcharpenay for clearly articulating a proposal.

In the example fade action, its output includes a "done" state, such that a GET on /fade/1 once the fade has completed would return the string 'done', implying that action requests remain in the list of affordances after they are complete.

If over the course of a day the fade action is invoked 1,000 times, does that mean there will be 3,000 new Form objects added to the Thing Description?

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "actions": {
    "fade": {
      "input": {
        "type": "number",
        "description": "duration (in ms)"
      },
      "output": {
        "type": "string",
        "description": "fade status (pending, running, done)"
      },
      "update": {
        "type": "number",
        "description": "new duration (in ms)"
      },
      "cancellation": {},
      "forms": [
        {
          "href": "/fade",
          "op": "invokeaction"
        },
        {
          "href": "/fade/1",
          "op": "readaction"
        },
        {
          "href": "/fade/1",
          "op": "updateaction"
        },
        {
          "href": "/fade/1",
          "op": "cancelaction"
        },
        {
          "href": "/fade/2",
          "op": "readaction"
        },
        {
          "href": "/fade/2",
          "op": "updateaction"
        },
        {
          "href": "/fade/2",
          "op": "cancelaction"
        },
        {
          "href": "/fade/3",
          "op": "readaction"
        },
        {
          "href": "/fade/3",
          "op": "updateaction"
        },
        {
          "href": "/fade/3",
          "op": "cancelaction"
        }
       ...
      ]
    }
  }
}

This seems like a very inefficient way of representing a simple queue.
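To make the growth concrete, here is a minimal sketch that builds the forms array a dynamic TD would carry after a given number of fade invocations. It assumes, as in the example above, that every invocation adds one readaction, one updateaction and one cancelaction form and that nothing is ever removed; the function name `formsAfterInvocations` is made up for illustration.

```javascript
// Sketch only: models the forms array of the dynamic TD shown above.
// Assumption: each invocation adds three forms and none are ever removed.
function formsAfterInvocations(n) {
  const forms = [{ href: "/fade", op: "invokeaction" }];
  for (let id = 1; id <= n; id++) {
    for (const op of ["readaction", "updateaction", "cancelaction"]) {
      forms.push({ href: `/fade/${id}`, op });
    }
  }
  return forms;
}

console.log(formsAfterInvocations(3).length);    // 10
console.log(formsAfterInvocations(1000).length); // 3001
```

The count is 1 + 3n: the single invokeaction form plus three forms per action request.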

A more efficient representation might be to use URI templates to define a path like...

/fade/{actionRequestID}

...in the same way that the OpenAPI specification represents endpoints of an API with a list of paths for example.

This would also need to be accompanied by a separate affordance to list action requests, e.g. with an op of listactionrequests (note the distinction between an "action" and an "action request", because "listactions" might imply listing the available actions, not instances of invoked actions.)

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "actions": {
    "fade": {
      "input": {
        "type": "number",
        "description": "duration (in ms)"
      },
      "output": {
        "type": "string",
        "description": "fade status (pending, running, done)"
      },
      "update": {
        "type": "number",
        "description": "new duration (in ms)"
      },
      "cancellation": {},
      "forms": [
        {
          "href": "/fade",
          "op": "invokeaction"
        },
        {
          "href": "/fade/",
          "op": "listactionrequests"
        },
        {
          "href": "/fade/{actionRequestID}",
          "op": "readactionrequest"
        },
        {
          "href": "/fade/{actionRequestID}",
          "op": "updateactionrequest"
        },
        {
          "href": "/fade/{actionRequestID}",
          "op": "cancelactionrequest"
        }
      ]
    }
  }
}

This doesn't make it entirely clear what the payload of the /fade resource would be however. It could just return an array of objects conforming to the output schema for the action, but then the client has no way of knowing which action status corresponds to which action request.
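For the templated forms above, a Consumer would have to pick the form for an operation and fill in the action request ID itself. The following is a rough sketch of that step, not a full URI Template implementation: it only handles simple `{name}` variables, assumes `op` is a single string, and the helper names `expandHref` and `resolveForm` are hypothetical.

```javascript
// Expands simple "{name}" variables in an href.
// Assumption: only the simplest URI template form is needed here.
function expandHref(template, variables) {
  return template.replace(/\{(\w+)\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(variables, name)) {
      throw new Error(`Missing URI template variable: ${name}`);
    }
    return encodeURIComponent(String(variables[name]));
  });
}

// Finds the form for an operation and returns a copy with a resolved href.
function resolveForm(action, op, variables = {}) {
  const form = action.forms.find((f) => f.op === op);
  if (!form) throw new Error(`No form for op: ${op}`);
  return { ...form, href: expandHref(form.href, variables) };
}

// Forms copied from the templated example above.
const fade = {
  forms: [
    { href: "/fade", op: "invokeaction" },
    { href: "/fade/", op: "listactionrequests" },
    { href: "/fade/{actionRequestID}", op: "readactionrequest" },
    { href: "/fade/{actionRequestID}", op: "updateactionrequest" },
    { href: "/fade/{actionRequestID}", op: "cancelactionrequest" },
  ],
};

console.log(resolveForm(fade, "readactionrequest", { actionRequestID: 42 }).href); // /fade/42
```

The sketch leaves open where the Consumer learns the value of `actionRequestID`, which is exactly the gap discussed in this comment.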

All of these problems can be solved by extending the metadata in the Thing Description to describe in detail:

  1. The endpoints of the API
  2. The operations which can be performed on those endpoints
  3. Expected verbs, headers, payloads, success and error responses for those operations

It's hard to avoid the conclusion that if we continue down this path we are eventually just going to re-invent the whole OpenAPI specification. And OpenAPI is only expressive enough to describe RESTful APIs! It can't describe an MQTT or WebSocket sub-protocol for example.

I'm sorry to sound like a broken record, but this is why I maintain that trying to define a declarative JSON syntax for describing any existing API or protocol is simply not practical. These kinds of complex interactions can only practically be described with out-of-band information in the form of a human-readable specification which defines a concrete protocol binding or sub-protocol that developers implement in a WoT client. Such sub-protocol specifications could then be referenced from a Thing Description via the "profile" mechanism described in the WG charter, or just a special @context annotation.

vcharpenay commented 4 years ago

This seems like a very inefficient way of representing a simple queue.

I don't see the difference to serving a queue of all action requests by GETting the action resource, as in your spec (Example 17). On the contrary, it looks to me that my proposal brings the TD model closer to your spec. Just consider the action resource as a "piece" of a TD.

A more efficient representation might be to use URI templates to define a path

As stated in my proposal, hypermedia control does not require that a Consumer gets all possible interactions at once. If it must first invoke the action to get a representation of the action request, it makes little sense to expose an affordance to the action request (even using URI templates) at the same time. This second affordance could be exposed later.

This doesn't make it entirely clear what the payload of the /fade resource would be however.

It should look like a TD form, that's the point of my proposal. In your spec, it already does because it includes an href key. What remains open is how to handle other cases, like the Oracle Cloud API that uses url instead. One extreme case is to always consume again the original TD to see what has changed. But there are many ways to optimize this and I expect the group will have a discussion on that.

benfrancis commented 4 years ago

I don't see the difference to serving a queue of all action requests by GETting the action resource, as in your spec (Example 17). On the contrary, it looks to me that my proposal brings the TD model closer to your spec. Just consider the action resource as a "piece" of a TD.

I acknowledge that there are similarities with the Web Thing API and appreciate that this has been taken into account.

The differences in your proposal are:

  1. The Thing Description is no longer a static description of a device's capabilities, it now mixes metadata about a device with data about actions invoked on the device by the user, potentially resulting in an extremely large monolithic Thing Description resource which the client needs to keep synchronised with a server
  2. It requires adding duplicate Form objects for each action request which express the same information for each request
  3. It feels half way between a declarative protocol binding (expressed via hypermedia controls) and a concrete protocol binding (by enforcing a payload format for the action endpoint which looks like a TD form). It tries to stick with the declarative nature of the Thing Description, but still isn't expressive enough to describe any existing API (e.g. the Oracle example you gave).

One extreme case is to always consume again the original TD to see what has changed. But there are many ways to optimize this and I expect the group will have a discussion on that.

The alternative option you suggested was "A more specific protocol should be specified on how to exchange pieces of a TD, e.g. along the lines of HTTP Range Requests."

If this solution means it's necessary to define a protocol for keeping the Thing Description synchronised between a WoT client and WoT server as resources are created and deleted, why not just define a (sub-)protocol for how to invoke, get, update and cancel actions over HTTP?

relu91 commented 4 years ago

Sorry for splitting the conversation (which is very interesting). About the example, here is a thought experiment that I had in mind:

Scenario

A WoT consumer wants to move the arm from A to B and display "success" if the action is completed or, otherwise, move it back to point A. Therefore, if I understand your proposal correctly, the consumer uses invokeaction and then calls readaction in a loop. Every time, readaction will always return the current status. In my mind, this is true even using MQTT, because the semantics of readaction is: "read the current status of the action", not "subscribe me until the status changes".

Therefore, I'd need a way to express the fact that the robotic arm is also capable of sending "stuck" events for that particular action. Notice that multiple actions can run simultaneously (i.e. move while rotate), therefore the stuck event is more an action event than a proper thing event.
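The difference between the two consumer styles in this scenario can be sketched with an in-memory stand-in for the arm. Everything here is hypothetical (no network, no real binding): `createMoveAction`, `pollUntilTerminal` and `notifyOnTerminal` are illustrative names, and the status values reuse "pending", "running" and "done" from the fade example plus the "stuck" event mentioned above.

```javascript
// Hypothetical in-memory stand-in for a robotic arm action; no network involved.
function createMoveAction(statuses) {
  let index = 0;
  const listeners = [];
  return {
    // Pull: returns the current status only.
    read() { return statuses[index]; },
    // Push: registers a callback that is called on every status change.
    subscribe(listener) { listeners.push(listener); },
    // Advances the simulated action by one step and notifies subscribers.
    step() {
      if (index < statuses.length - 1) {
        index += 1;
        listeners.forEach((listener) => listener(statuses[index]));
      }
    },
  };
}

const TERMINAL = new Set(["done", "stuck"]);

// Pull-based consumer: calls read() in a loop until a terminal status shows up.
function pollUntilTerminal(action, maxReads = 100) {
  let reads = 0;
  while (reads < maxReads) {
    reads += 1;
    const status = action.read();
    if (TERMINAL.has(status)) return { status, reads };
    action.step(); // stands in for "time passes between two reads"
  }
  throw new Error("Action did not finish in time");
}

// Push-based consumer: the callback fires only when a terminal status is reached.
function notifyOnTerminal(action, callback) {
  action.subscribe((status) => {
    if (TERMINAL.has(status)) callback(status);
  });
}

const polled = pollUntilTerminal(createMoveAction(["pending", "running", "stuck"]));
console.log(polled); // { status: 'stuck', reads: 3 }
```

In the pull variant the consumer pays one read per status check; in the push variant it receives a single notification, which is the behavior that a readaction op alone does not express.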

If this solution means it's necessary to define a protocol for keeping the Thing Description synchronised between a WoT client and WoT server as resources are created and deleted, why not just define a (sub-)protocol for how to invoke, get, update and cancel actions over HTTP?

While I am here my two cents: the difference that I see here is that keeping TD synchronized is a more general mechanism that can be exploited in other scenarios. The first basic ideas that come to my mind:

vcharpenay commented 4 years ago

it now mixes metadata about a device with data about actions invoked on the device by the user

To me (and in fact, as per the theory behind hypermedia control), a link to an action resource and a link to an action request resource are both metadata. Control metadata, more precisely. That's the only thing I expect in a TD. Note that I consider "data about actions invoked on the device" to be roughly its status and this should appear nowhere in the TD itself.

It tries to stick with the declarative nature of the Thing Description, but still isn't expressive enough to describe any existing API (e.g. the Oracle example you gave).

Well, if Oracle had to comply with a specific protocol as you suggest, it would have to change url to href as well. I insist "there are many ways" to solve that problem and another one could be to assign a JSON-LD context to messages that include control metadata, so that one can map them to the TD model. JSON-LD was designed for that purpose.

Here is an excerpt of what the Oracle Cloud may return.

{
  "id":"72a4239f1644-ccf",
  "url": "https://iotserver/iot/api/version/resource/path",
  "method": "GET"
  ...
}

The following context would map that payload to a proper TD form (note: hctl:hasTarget is what href maps to in the standard TD context):

{
  "@context": {
    "hctl": "https://www.w3.org/2019/wot/hypermedia#",
    "htv": "http://www.w3.org/2011/http#",
    "url": "hctl:hasTarget",
    "method": "htv:methodName"
  }
}

You could then apply the standard JSON-LD transformation procedures to obtain a form as specified in the TD model:

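// jsonld.js returns Promises (run inside an async function); expandContext applies the mapping context above.
const expanded = await jsonld.expand(oracleForm, { expandContext: oracleContext });
const standardForm = await jsonld.compact(expanded, standardContext);
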
{
  "href": "https://iotserver/iot/api/version/resource/path",
  "htv:methodName": "GET"
}

I don't mean to standardize exactly this but I hope it illustrates the point that there are alternative ways to a specific protocol for action invocation.
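For readers without a JSON-LD processor at hand, the effect of the mapping can be approximated with plain key renaming. This is only a dependency-free stand-in under strong assumptions: it is not JSON-LD processing, it ignores nesting, IRIs and value types, and the names `oracleTermMap` and `toTdForm` are invented for this sketch.

```javascript
// Plain key renaming that mimics the effect of the context shown above.
// This is NOT a JSON-LD processor; it only handles flat objects.
const oracleTermMap = new Map([
  ["url", "href"],
  ["method", "htv:methodName"],
]);

function toTdForm(message, termMap) {
  const form = {};
  for (const [key, value] of Object.entries(message)) {
    // Keys without a mapping are dropped, as unmapped terms are during expansion.
    if (termMap.has(key)) form[termMap.get(key)] = value;
  }
  return form;
}

// Fields taken from the Oracle excerpt above; the elided part is left out.
const oracleMessage = {
  id: "72a4239f1644-ccf",
  url: "https://iotserver/iot/api/version/resource/path",
  method: "GET",
};

console.log(toTdForm(oracleMessage, oracleTermMap));
```

The result contains only `href` and `htv:methodName`, matching the form shown above.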

benfrancis commented 4 years ago

@vcharpenay wrote:

To me (and in fact, as per the theory behind hypermedia control), a link to an action resource and a link to an action request resource are both metadata. Control metadata, more precisely. That's the only thing I expect in a TD. Note that I consider "data about actions invoked on the device" to be roughly its status and this should appear nowhere in the TD itself.

OK, fair enough. My main concern is the idea of changing the nature of the Thing Description from a largely static description of device capabilities (acting as the entry point for a web thing which may change only very occasionally) into a dynamic resource which the client needs to constantly keep in sync with the server in order to know about new resources.

Is there a particular reason to design it this way, rather than simply linking to a list of action requests as a separate resource?

As I think you're aware, the way that the Mozilla implementation models action queues is by having each ActionAffordance link to a separate Action resource which resolves to a list of action requests.

  "actions": {
    "fade": {
      "title": "Fade",
      "input": {
        "type": "object",
        "properties": {
          "level": {
            "type": "integer",
            "minimum": 0,
            "maximum": 100
          },
          "duration": {
            "type": "integer",
            "minimum": 0,
            "unit": "milliseconds"
          }
        }
      },
      "links": [{"href": "/things/lamp/actions/fade"}]
    }
  },

The same could be achieved with forms with a new set of ops as described above.

there are alternative ways to a specific protocol for action invocation.

Yes, this is why I am continuing to work on a standard (sub-)protocol for the Web of Things via the Web Thing Protocol Community Group, because currently this open ended complexity means it is effectively impossible to create a WoT client which can talk to any WoT device.

But in the meantime, if you want to be able to describe these kinds of APIs declaratively in the Thing Description I would suggest the need for more expressive syntax, perhaps along the lines of OpenAPI, and hopefully not something that requires complex RDF-based transformations with JSON-LD.

takuki commented 4 years ago

There are multiple components in @vcharpenay 's proposal as I understand.

The idea of introducing new operation types looks good to me.

I also found @benfrancis 's suggestion of use of URI templates helpful. By using URI template, we may not need to introduce dynamic TD.

Every time, readaction will always return the current status. In my mind, this is true even using MQTT, because the semantics of readaction is: "read the current status of the action", not "subscribe me until the status changes".

I think this is a good point. An application does not have to keep calling the readaction operation many times if the protocol is MQTT. Don't we need metadata that tells whether readaction is pull or push?

egekorkan commented 4 years ago

After reading the proposal and preparing for one with static TDs, some questions came to my mind.

egekorkan commented 4 years ago

Created #907 as an alternative

danielpeintner commented 4 years ago

I would like to highlight some more generic differences/assumptions between static vs dynamic TDs.

Dynamic TDs

Static TDs

I am pretty sure there are more relevant assumptions/concerns we should start collecting...

mmccool commented 4 years ago

I'd like to state for the record that I think dynamically modifying TDs will raise a bunch of troublesome issues with security (once we add signing), IDs (if they hash contents), directories, caching, and so forth. Also, I think that for developer documentation we really want a static (set of...) templates at least.

So I would strongly support a proposal that gives a static description, or at least a static template (or a set; for example, static Action Description Templates if we want to describe dynamic actions separately).

takuki commented 4 years ago

Created #907 as an alternative

The example comparison between fully-static and hypermedia-static that is provided in the proposal appears very interesting to me.

As I stated in issue #302, Thing-Consumer protocol with regards to Action can always look forward, but not backward.

I would like to point out that the Thing-Consumer protocol should, as much as possible, look forward, not backward. This simplifies Consumer implementation a lot, which is important when you think about consumer appliances such as a dimmable light in a room. A remote control for the light should be as simple as possible. I think the fully-static TD works fine in many similar simple cases.

benfrancis commented 4 years ago

@egekorkan wrote:

Created #907 as an alternative

This proposal seems like a reasonable approach to declaratively defining action operations in a Thing Description and in my view is preferable to a dynamic Thing Description.

Currently, the output would be expected as the response to the POST /fade request, i.e. the response of invokeaction.

Note: As far as I know the current specification does not say that the output of an invokeaction operation should represent the end result of the action. That wouldn't work for long-running actions requested via HTTP where the running time of the action is longer than the HTTP response timeout. As I understand it an immediate 201 Created response to the action invocation request, just to confirm the action was requested, would already be valid with the current specification, though a client wouldn't necessarily know what that response means.

if we have a Thing that allows only a single Consumer to interact, the id can be static as shown above, like /fade/ongoing

That assumes that only one action of a given type can be invoked at a time. It's possible that a web thing could have multiple instances of the same action type running in parallel, or have multiple requests lined up in a queue to be executed sequentially. For example, you might want to instruct a robot arm to invoke a series of movements one after the other, or print a series of receipts on a thermal printer.
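A queue of action requests executed one after the other could look roughly like the sketch below. It is a hypothetical in-memory model, not taken from any implementation in this thread: the class name `ActionQueue`, its methods and the sequential `tick()` policy are assumptions, and the statuses reuse "pending", "running" and "done" from the fade example.

```javascript
// Hypothetical in-memory queue of action requests, executed sequentially.
class ActionQueue {
  constructor() {
    this.nextId = 1;
    this.requests = new Map(); // id -> { input, status }, in insertion order
  }

  // Enqueues a request and returns its id (the "action request ID").
  invoke(input) {
    const id = String(this.nextId);
    this.nextId += 1;
    this.requests.set(id, { input, status: "pending" });
    return id;
  }

  // Finishes the running request (if any), then starts the oldest pending one.
  tick() {
    for (const request of this.requests.values()) {
      if (request.status === "running") request.status = "done";
    }
    for (const request of this.requests.values()) {
      if (request.status === "pending") {
        request.status = "running";
        break;
      }
    }
  }

  // Returns the status of a single action request.
  read(id) {
    const request = this.requests.get(id);
    if (!request) throw new Error(`Unknown action request: ${id}`);
    return request.status;
  }

  // Lists all action requests with their ids, so statuses can be told apart.
  list() {
    return [...this.requests].map(([id, r]) => ({ id, status: r.status }));
  }
}

const queue = new ActionQueue();
queue.invoke(500);  // duration in ms, as in the fade example
queue.invoke(1000);
queue.tick();
console.log(queue.list());
// [ { id: '1', status: 'running' }, { id: '2', status: 'pending' } ]
```

A Thing that runs several instances in parallel would need a different `tick()` policy, but it would still need one identifier per request, which a single static href such as /fade/ongoing cannot provide.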

Given that hypermedia is an advanced use case and that we should not break existing Consumer implementations, the input and output in Action Affordance level correspond to the invokeaction. I propose to add three new vocabulary terms in the Action Affordance level, named query, update and cancel that are of Object type.

For completeness, it might make sense to add an invoke object type as well, but continue to support the input and output of invoke at the top level of the Action object for backwards compatibility.
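The backwards-compatible lookup suggested here might work along the following lines. This is a sketch under assumptions: `invoke`, `query`, `update` and `cancel` are terms under discussion in this thread, not part of the TD specification, and the helper name `schemasFor` is hypothetical.

```javascript
// Hypothetical affordance shape: "query", "update" and "cancel" (and an optional
// "invoke") are proposed terms, not part of the TD specification.
const fade = {
  input: { type: "number" },  // legacy top-level input of invokeaction
  output: { type: "string" }, // legacy top-level output of invokeaction
  query: { output: { type: "string" } },
  update: { input: { type: "number" } },
  cancel: {},
};

// Returns { input, output } for an operation. For "invoke" it falls back to the
// top-level members, so TDs written before the new terms keep working.
function schemasFor(action, operation) {
  const explicit = action[operation];
  if (explicit !== undefined) {
    return { input: explicit.input, output: explicit.output };
  }
  if (operation === "invoke") {
    return { input: action.input, output: action.output };
  }
  throw new Error(`Action does not support operation: ${operation}`);
}

console.log(schemasFor(fade, "invoke").input.type); // number
console.log(schemasFor(fade, "query").output.type); // string
```

An explicit `invoke` object, when present, takes precedence over the top-level `input` and `output`; that precedence is a design choice of this sketch, not something the thread settles.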

The part of this proposal that I think will be the hardest to define in a specification is how a client keeps track of templated values between output and form objects. Can the value of any href, input or output member of any affordance in a Thing Description contain a URI template? What meaning should a client attribute to those values?

Also, consideration needs to be given to error conditions. How does the Thing Description describe the result of an action invocation, update or cancellation that fails?

zolkis commented 4 years ago

I also like #907 more. We can specify contentType in the Form for querying an Action (please include that in the examples), but can we specify a DataSchema?

egekorkan commented 4 years ago

Updated on 10.06.2020 11:48 am CET

After discussions with @mkovatsc following the WISHI call of 08.06.2020, below are his comments regarding the use of hypermedia in the context of W3C WoT. @mkovatsc if there is anything wrong or missing, feel free to edit this exact comment :blush:

Feedback on the proposals

A better way forward

We would need a way that is dynamic and based on the responses of the Thing (more specifically a specific media type) that is not necessarily fully described in a TD. The responses of the Thing would guide the Consumer and the TD should ensure that the Consumer can check beforehand that it will understand all the possible responses.

My comment on this: If there was a widespread hypermedia standard, we would not need TDs, the Consumer would be able to use an API from an initial endpoint and discover the API (also see HATEOAS).

The CoRAL draft (https://tools.ietf.org/html/draft-ietf-core-coral-03) from IETF (@ektrah) is a proposal that is more aligned with "real" hypermedia. There, a specific media type, i.e. application/coral+cbor, is used and a Consumer who can parse it will be able to understand how to use the Thing.

We can also explore how one can describe a state machine in a TD.
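As one possible starting point for that exploration, a state machine for a long-running action can be written as a transition table. This is purely a sketch: the states "pending", "running" and "done" come from the fade example earlier in the thread, while "cancelled", the event names and the helper names are assumptions.

```javascript
// Hypothetical transition table for a long-running action.
// "cancelled" and the event names are assumptions made for this sketch.
const transitions = {
  pending: { start: "running", cancel: "cancelled" },
  running: { finish: "done", cancel: "cancelled" },
  done: {},
  cancelled: {},
};

const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns the state reached by applying an event, or throws if it is not allowed.
function nextState(state, event) {
  if (!own(transitions, state) || !own(transitions[state], event)) {
    throw new Error(`Event "${event}" is not allowed in state "${state}"`);
  }
  return transitions[state][event];
}

// Lists the events a Consumer may trigger in a given state.
function allowedEvents(state) {
  return own(transitions, state) ? Object.keys(transitions[state]) : [];
}

console.log(nextState("pending", "start")); // running
console.log(allowedEvents("running"));      // [ 'finish', 'cancel' ]
```

A Consumer that knows such a table in advance could check which operations make sense for the status it last read, without the TD itself changing.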

More comments in general

There is no widely accepted hypermedia standard. That means that we can prescribe how it should be done with TDs. We can somehow support the existing implementations by Oracle and Mozilla but we do not have to guide the greenfield on the fact that hypermedia should be done like this.

My comments on this: This would mean almost a separate task force that focuses on such a deliverable.

zolkis commented 4 years ago

We would need a way that is dynamic and based on the responses of the Thing that is not necessarily fully described in a TD. The responses of the Thing would guide the Consumer.

And preferably the Consumer can parse it in a similar way it does a TD. Which brings to the idea of returning a control object that is parseable as a TD, i.e. homomorphic with a TD. That would be quite easy to specify based on the TD and just needs a different name than a Thing, for instance Process or something else.

vcharpenay commented 4 years ago

And preferably the Consumer can parse it in a similar way it does a TD.

To me, it would be preferable to try to align with OpenAPI or CoRAL for generic hypermedia control... Or to reuse the hypermedia controls module of TDs. Things can return links and forms only. CoRAL describes form input as form fields, which is something we can add to Form objects in the TD model.

(The main difference to your suggestion, @zolkis is that ActionAffordances still refer to physical actions and not to arbitrary REST operations on data.)

takuki commented 4 years ago

In 2020-06-12 telecon, it was suggested this thread might have reached a point where we need to discuss in F2F meeting for a decision. @mjkoster mentioned he also has a baseline implementation with hypermedia control.

mkovatsc commented 4 years ago

And preferably the Consumer can parse it in a similar way it does a TD. Which brings to the idea of returning a control object that is parseable as a TD, i.e. homomorphic with a TD. That would be quite easy to specify based on the TD and just needs a different name than a Thing, for instance Process or something else.

This was exactly what I had in mind back then, to define "Action Description" based on the Thing Description spec -- basically the TD format with something like @type: Action instead of Thing.
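What such an "Action Description" might contain can be sketched as follows. The shape is a guess made for illustration, not a proposal from the thread: the function name `buildActionDescription`, the choice of members and the URL layout (`/fade/1`, reused from the first example) are all assumptions, as are the op names borrowed from that example.

```javascript
// Hypothetical: builds an "Action Description" for one action request,
// i.e. a TD-like document with "@type": "Action" instead of "Thing".
function buildActionDescription(name, affordance, requestId) {
  const href = `/${name}/${requestId}`; // URL layout assumed from the fade example
  return {
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "@type": "Action",
    title: name,
    output: affordance.output,
    forms: ["readaction", "updateaction", "cancelaction"].map((op) => ({ href, op })),
  };
}

const fadeAffordance = {
  output: { type: "string", description: "fade status (pending, running, done)" },
};

console.log(JSON.stringify(buildActionDescription("fade", fadeAffordance, 1), null, 2));
```

Because the document is returned per action request, the Thing Description itself can stay static while the controls for each request still travel as TD-style forms.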

However, CoRAL support should be developed in parallel (it would need some critical mass to establish a new, true hypermedia format). OpenAPI does not seem fit for hypermedia, unless they recently made a leap forward.

takuki commented 3 years ago

Discussed in the TD teleconference on 2020-07-15 (see minutes).

takuki commented 3 years ago

Discussed in a TD session during virtual F2F meeting on 2020-10-21 (see minutes). It was suggested by @mlagally and others to further discuss this issue in WoT Profile calls.

egekorkan commented 3 years ago

Couldn't find a related issue in WoT Profiles so I am posting it here. After talking with the participants (@TaoXu00 and @dearzhaorui) of the BRAIN-IoT project (http://www.brain-iot.eu/) that Siemens is also part of, there are further use cases for this in the robotics field. Below is an extract of the TD that they use for describing the already existing endpoints of a robot made by Robotnik (https://robotnik.eu/):

{
   "title":"robotnik",
   "description":"Robotnik REST Implementation for Brain-Iot",
   "actions":{
      "PlaceAdd":{
         "description":"Commands a robot to start place procedure",
         "input":{...},
         "output":{
            "type":"object",
            "properties":{
               "state":{
                  "type":"object",
                  "properties":{
                     "current_state":{
                        "type":"string",
                        "enum":["queued","running","paused","finished","unknown"]
                     }
                  }
               }
            }
         },
         "forms":[...]
      },
      "PlaceCancel":{
         "description":"Cancels the current place mission",
         "input":{
            "type":"object",
            "properties":{
               "header":{
                  "type":"object",
                  "properties":{
                     "id":{
                        "type":"string",
                        "description":"The ID of the place mission you want to cancel; -1 cancels last mission"
                     }
                  }
               }
            }
         },
         "output":{
            "type":"object",
            "properties":{
               "state":{
                  // same as above
               }
            }
         },
         "forms":[...]
      },
      "PlaceQuery":{
         "description":"Gets the state of a place mission",
         "input":{
            "type":"object",
            "properties":{
               "header":{
                  "type":"object",
                  "properties":{
                     "id":{
                        "type":"string",
                        "description":"The id of the place mission you want to get the query state; -1 gets the query state of the last mission"
                     }
                  }
               }
            }
         },
         "output":{
            "type":"object",
            "properties":{
               "state":{
                  // same as above
               }
            }
         },
         "forms":[... ]
      }
   }
}

So basically managing the place action is done by 3 different actions and no apparent link between them can be established with a standard TD. I think that the minimum work for this feature of TD is to create some sort of link relations (like rel keyword) between different interaction affordances and leave it open how this can be done/implemented.
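One way to picture such link relations is a small table that points each management action at the action it manages. The representation below is entirely hypothetical, since the comment deliberately leaves the mechanism open: the `rel` values "cancel" and "query", the `target` member and the function name `managementActionsFor` are made up for this sketch; only the action names come from the Robotnik TD above.

```javascript
// Hypothetical relation table: which action manages which other action.
// The "rel" values and the "target" member are assumptions for illustration.
const relations = {
  PlaceCancel: { rel: "cancel", target: "PlaceAdd" },
  PlaceQuery: { rel: "query", target: "PlaceAdd" },
};

// Groups the actions that manage a given action by their relation type.
function managementActionsFor(actionName, relationTable) {
  const result = {};
  for (const [name, { rel, target }] of Object.entries(relationTable)) {
    if (target === actionName) result[rel] = name;
  }
  return result;
}

console.log(managementActionsFor("PlaceAdd", relations));
// { cancel: 'PlaceCancel', query: 'PlaceQuery' }
```

With this kind of annotation a Consumer could discover that PlaceCancel and PlaceQuery belong to PlaceAdd without the three actions being merged into one affordance.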