vcharpenay opened 4 years ago
This proposal addresses the issue discussed in #302.
I like this approach of dynamic TDs, mostly because it does not mess with the return type of an action. Some notes/questions:
Thanks for the comments!
How would I model PUSH based interaction instead of a PULL based? (i.e. what if the state of the action is pushed to me instead of I read the state every now and then)
My assumption is that it is a matter of finding the right operation types but I guess it would be easier to think about it with an example: do you have a concrete case where that happens, e.g. some MQTT interaction with existing devices?
We are leaving out the problem of ownership: what happens if some other Consumer tries to cancel an action started by me?
If the Thing is natively supporting the TD model, it has control over what it sends to Consumers. It doesn't have to expose all affordances to all Consumers. In the case there is a TDir (or some proxy) exposing a TD for some legacy device, this is a bit more arduous to implement, indeed. Some help in that respect would be welcome!
Can this behavior be generalized in other use cases?
Yes, definitely. At least, that's what this proposal is aiming at. The concern I personally have about it is how we can limit the set of operation types to a minimum. There shouldn't be hundreds of them.
How would I model PUSH based interaction instead of a PULL based? (i.e. what if the state of the action is pushed to me instead of I read the state every now and then)
I think this would be handled on the protocol level? So the `queryaction` op could be in a form with MQTT and with the subscribe "method".
I am not sure that solving that at the protocol level is enough. From an op called `queryaction` I expect polling behavior. OK, I could implement the poll with a subscribe on MQTT, but it is still polling. What if the application just wants to subscribe to the completion of the action?
My point is that we might even need two other ops, `subscribestate` and `unsubscribestate`, which goes a little bit against what was stated here 😃:
The concern I personally have about it is how we can limit the set of operation types to a minimum. There shouldn't be hundreds of them
One possible solution: the updated TD can also have new events about the action status. However, I think it goes against the semantics of TD affordances (i.e. they reflect physical world entities).
One possible solution: the updated TD can also have new events about the action status
There is nothing preventing the Thing from adding a new event affordance, as long as the event is generated by some physical state change. (Again, an example would help discussing the matter.)
I'm really pleased to see this topic being discussed, and thank you @vcharpenay for clearly articulating a proposal.
In the example `fade` action, its output includes a "done" state, such that a `GET` on `/fade/1` once the fade has completed would return the string `'done'`, implying that action requests remain in the list of affordances after they are complete.
If over the course of a day the fade action is invoked 1,000 times, does that mean there will be 3,000 new `Form` objects added to the Thing Description?
{
"@context": "https://www.w3.org/2019/wot/td/v1",
"id": "urn:ex:thing",
"actions": {
"fade": {
"input": {
"type": "number",
"description": "duration (in ms)"
},
"output": {
"type": "string",
"description": "fade status (pending, running, done)"
},
"update": {
"type": "number",
"description": "new duration (in ms)"
},
"cancellation": {},
"forms": [
{
"href": "/fade",
"op": "invokeaction"
},
{
"href": "/fade/1",
"op": "readaction"
},
{
"href": "/fade/1",
"op": "updateaction"
},
{
"href": "/fade/1",
"op": "cancelaction"
},
{
"href": "/fade/2",
"op": "readaction"
},
{
"href": "/fade/2",
"op": "updateaction"
},
{
"href": "/fade/2",
"op": "cancelaction"
},
{
"href": "/fade/3",
"op": "readaction"
},
{
"href": "/fade/3",
"op": "updateaction"
},
{
"href": "/fade/3",
"op": "cancelaction"
}
...
]
}
}
}
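To make the growth concrete, here is a minimal sketch in JavaScript. It assumes, as in the example above, that each invocation appends `readaction`, `updateaction` and `cancelaction` forms for a new `/fade/{n}` resource and that completed requests are never removed. `addActionRequestForms` is a hypothetical helper, not part of any WoT library.

```javascript
// Hypothetical helper: append the per-request forms that the dynamic-TD
// example adds each time the "fade" action is invoked.
function addActionRequestForms(td, actionName, requestId) {
  const forms = td.actions[actionName].forms;
  const href = `/${actionName}/${requestId}`;
  for (const op of ["readaction", "updateaction", "cancelaction"]) {
    forms.push({ href, op });
  }
}

// A TD that initially exposes only the invokeaction form for "fade".
const td = {
  "@context": "https://www.w3.org/2019/wot/td/v1",
  id: "urn:ex:thing",
  actions: { fade: { forms: [{ href: "/fade", op: "invokeaction" }] } },
};

// Simulate a day with 1,000 invocations; nothing prunes completed requests.
for (let n = 1; n <= 1000; n++) {
  addActionRequestForms(td, "fade", n);
}
console.log(td.actions.fade.forms.length); // 3001 (1 invokeaction + 3 x 1,000)
```

Unless completed requests are pruned, the number of forms grows linearly with the number of invocations.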
This seems like a very inefficient way of representing a simple queue.
A more efficient representation might be to use URI templates to define a path like...
/fade/{actionRequestID}
...in the same way that the OpenAPI specification represents endpoints of an API with a list of paths for example.
This would also need to be accompanied by a separate affordance to list action requests, e.g. with an `op` of `listactionrequests` (note the distinction between an "action" and an "action request", because "listactions" might imply listing the available actions, not instances of invoked actions.)
{
"@context": "https://www.w3.org/2019/wot/td/v1",
"id": "urn:ex:thing",
"actions": {
"fade": {
"input": {
"type": "number",
"description": "duration (in ms)"
},
"output": {
"type": "string",
"description": "fade status (pending, running, done)"
},
"update": {
"type": "number",
"description": "new duration (in ms)"
},
"cancellation": {},
"forms": [
{
"href": "/fade",
"op": "invokeaction"
},
{
"href": "/fade/",
"op": "listactionrequests"
},
{
"href": "/fade/{actionRequestID}",
"op": "readactionrequest"
},
{
"href": "/fade/{actionRequestID}",
"op": "updateactionrequest"
},
{
"href": "/fade/{actionRequestID}",
"op": "cancelactionrequest"
}
]
}
}
}
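The following is a minimal sketch of how a Consumer might use the templated forms above. It assumes RFC 6570 level-1 (`{var}`) expansion only. It also assumes the op names used in this example (`readactionrequest`, `cancelactionrequest`), which are proposals rather than standard TD operation types. `expandTemplate` and `findForm` are hypothetical helpers.

```javascript
// Percent-encode a value for RFC 6570 simple string expansion ({var}):
// only unreserved characters (A-Z a-z 0-9 - . _ ~) are left as-is.
function encodeSimple(value) {
  return encodeURIComponent(String(value)).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Expand {name} placeholders (level-1 templates only).
function expandTemplate(template, vars) {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (_, name) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`missing template variable: ${name}`);
    }
    return encodeSimple(vars[name]);
  });
}

// Pick the form for a given op; a TD form's "op" may be a string or an array.
function findForm(affordance, op) {
  const form = affordance.forms.find((f) =>
    Array.isArray(f.op) ? f.op.includes(op) : f.op === op
  );
  if (!form) throw new Error(`no form for op: ${op}`);
  return form;
}

const fade = {
  forms: [
    { href: "/fade", op: "invokeaction" },
    { href: "/fade/", op: "listactionrequests" },
    { href: "/fade/{actionRequestID}", op: "readactionrequest" },
    { href: "/fade/{actionRequestID}", op: "cancelactionrequest" },
  ],
};

const readHref = expandTemplate(findForm(fade, "readactionrequest").href, {
  actionRequestID: "42",
});
console.log(readHref); // /fade/42
```

The example hard-codes the ID `42`. This sketch does not define how a Consumer would learn it, for example from the `invokeaction` response.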
This doesn't make it entirely clear what the payload of the `/fade` resource would be, however. It could just return an array of objects conforming to the `output` schema for the action, but then the client has no way of knowing which action status corresponds to which action request.
All of these problems can be solved by extending the metadata in the Thing Description to describe in detail:
It's hard to avoid the conclusion that if we continue down this path we are eventually just going to re-invent the whole OpenAPI specification. And OpenAPI is only expressive enough to describe RESTful APIs! It can't describe an MQTT or WebSocket sub-protocol for example.
I'm sorry to sound like a broken record, but this is why I maintain that trying to define a declarative JSON syntax for describing any existing API or protocol is simply not practical. These kinds of complex interactions can only practically be described with out-of-band information in the form of a human-readable specification which defines a concrete protocol binding or sub-protocol that developers implement in a WoT client. Such sub-protocol specifications could then be referenced from a Thing Description via the "profile" mechanism described in the WG charter, or just a special @context
annotation.
This seems like a very inefficient way of representing a simple queue.
I don't see the difference to serving a queue of all action requests by GETting the action resource, as in your spec (Example 17). On the contrary, it looks to me that my proposal brings the TD model closer to your spec. Just consider the action resource as a "piece" of a TD.
A more efficient representation might be to use URI templates to define a path
As stated in my proposal, hypermedia control does not require that a Consumer gets all possible interactions at once. If it must first invoke the action to get a representation of the action request, it makes little sense to expose an affordance to the action request (even using URI templates) at the same time. This second affordance could be exposed later.
This doesn't make it entirely clear what the payload of the /fade resource would be however.
It should look like a TD form, that's the point of my proposal. In your spec, it already does because it includes an `href` key. What remains open is how to handle other cases, like the Oracle Cloud API, which uses `url` instead. One extreme case is to always consume again the original TD to see what has changed. But there are many ways to optimize this and I expect the group will have a discussion on that.
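The following is a minimal sketch of that idea. It assumes a hypothetical `GET /fade` payload in which every entry carries an `href` next to its status. Each entry can then be read as a form, and statuses can be matched to requests by `href`. The payload shape, `entriesToForms` and `statusByHref` are illustrative assumptions. Entries that use another key, such as Oracle's `url`, are simply skipped here, which is the open case mentioned above.

```javascript
// Hypothetical payload of GET /fade: one entry per action request, each with
// an "href" (the key a TD form uses for its target) and a status value.
const listPayload = [
  { href: "/fade/1", status: "done" },
  { href: "/fade/2", status: "running" },
];

// Read each entry as a TD-style form for the proposed readaction op, so the
// Consumer can reuse the code it already has for TD forms.
function entriesToForms(entries) {
  return entries
    .filter((e) => typeof e.href === "string")
    .map((e) => ({ href: e.href, op: "readaction" }));
}

// Index statuses by href so each status can be tied to its action request.
function statusByHref(entries) {
  return new Map(entries.map((e) => [e.href, e.status]));
}

console.log(entriesToForms(listPayload));
console.log(statusByHref(listPayload).get("/fade/2")); // running
```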
I don't see the difference to serving a queue of all action requests by GETting the action resource, as in your spec (Example 17). On the contrary, it looks to me that my proposal brings the TD model closer to your spec. Just consider the action resource as a "piece" of a TD.
I acknowledge that there are similarities with the Web Thing API and appreciate that this has been taken into account.
The differences in your proposal are:
`Form` objects for each action request which express the same information for each request
One extreme case is to always consume again the original TD to see what has changed. But there are many ways to optimize this and I expect the group will have a discussion on that.
The alternative option you suggested was "A more specific protocol should be specified on how to exchange pieces of a TD, e.g. along the lines of HTTP Range Requests."
If this solution means it's necessary to define a protocol for keeping the Thing Description synchronised between a WoT client and WoT server as resources are created and deleted, why not just define a (sub-)protocol for how to invoke, get, update and cancel actions over HTTP?
Sorry for splitting the conversation (which is very interesting). About the example, here is a thought experiment that I had in mind:
A WoT Consumer wants to move the arm from A to B and display "success" if the action is completed or, otherwise, move it back to point A. Therefore, if I am understanding your proposal, the Consumer uses `invokeaction` and then calls `readaction` in a loop. Every time, `readaction` will return the current status. In my mind, this is true even using MQTT, because the semantics of `readaction` are "read the current status of the action", not "subscribe me until the status changes".
Therefore, I'd need a way to express the fact that the robotic arm is also capable of sending "stuck" events for that particular action. Notice that multiple actions can run simultaneously (e.g. move while rotating), so the stuck event is more an action event than a proper Thing event.
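The following is a minimal sketch of the polling loop this implies for the Consumer. It assumes a hypothetical async `readStatus()` function that performs one `readaction` and resolves to a status string. The terminal states (`"done"`, `"stuck"`), the interval and the attempt limit are illustrative assumptions.

```javascript
// Default delay between polls; tests can inject a no-op sleep instead.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call readStatus() (one readaction each) until a terminal status appears.
// Nothing is pushed to the Consumer: every iteration is a fresh read.
async function pollActionStatus(readStatus, terminalStates, options = {}) {
  const { intervalMs = 500, maxAttempts = 100, sleep = defaultSleep } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await readStatus();
    if (terminalStates.includes(status)) return { status, attempts: attempt };
    await sleep(intervalMs);
  }
  throw new Error(`status not terminal after ${maxAttempts} reads`);
}

// Usage with a fake readStatus that reports the arm getting stuck.
const statuses = ["pending", "running", "running", "stuck"];
let i = 0;
const fakeRead = async () => statuses[Math.min(i++, statuses.length - 1)];
pollActionStatus(fakeRead, ["done", "stuck"], { sleep: async () => {} }).then(
  (result) => console.log(result) // { status: 'stuck', attempts: 4 }
);
```

Even over a fast transport, each iteration is a separate read. A stuck arm is therefore noticed only at the next poll, whereas an event or subscription would let the Thing report it as soon as it happens.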
If this solution means it's necessary to define a protocol for keeping the Thing Description synchronised between a WoT client and WoT server as resources are created and deleted, why not just define a (sub-)protocol for how to invoke, get, update and cancel actions over HTTP?
While I am here my two cents: the difference that I see here is that keeping TD synchronized is a more general mechanism that can be exploited in other scenarios. The first basic ideas that come to my mind:
it now mixes metadata about a device with data about actions invoked on the device by the user
To me (and in fact, as per the theory behind hypermedia control), a link to an action resource and a link to an action request resource are both metadata. Control metadata, more precisely. That's the only thing I expect in a TD. Note that I consider "data about actions invoked on the device" to be roughly its status and this should appear nowhere in the TD itself.
It tries to stick with the declarative nature of the Thing Description, but still isn't expressive enough to describe any existing API (e.g. the Oracle example you gave).
Well, if Oracle had to comply with a specific protocol as you suggest, it would have to change `url` to `href` as well. I insist "there are many ways" to solve that problem, and another one could be to assign a JSON-LD context to messages that include control metadata, so that one can map them to the TD model. JSON-LD was designed for that purpose.
Here is an excerpt of what the Oracle Cloud may return.
{
"id":"72a4239f1644-ccf",
"url": "https://iotserver/iot/api/version/resource/path",
"method": "GET"
...
}
The following context would map that payload to a proper TD form (note: `hctl:hasTarget` is what `href` maps to in the standard TD context):
{
"@context": {
"hctl": "https://www.w3.org/2019/wot/hypermedia#",
"htv": "http://www.w3.org/2011/http#",
"url": "hctl:hasTarget",
"method": "htv:methodName"
}
}
You could then apply the standard JSON-LD transformation procedures to obtain a form as specified in the TD model:
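const jsonld = require("jsonld"); // jsonld.js: expand() and compact() return Promises
jsonld.expand(oracleForm, { expandContext: oracleContext })
  .then((expanded) => jsonld.compact(expanded, standardContext))
  .then((standardForm) => console.log(standardForm));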
{
"href": "https://iotserver/iot/api/version/resource/path",
"htv:methodName": "GET"
}
I don't mean to standardize exactly this but I hope it illustrates the point that there are alternative ways to a specific protocol for action invocation.
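For readers without a JSON-LD processor at hand, the following sketch approximates what the expand/compact round trip does for this particular payload. It renames keys according to a hand-written term map and drops the rest. It is a naive key-renaming stand-in, not JSON-LD processing: it does no IRI resolution and handles neither nested contexts nor `@type`. `oracleToTd` and `mapToTdForm` are hypothetical names.

```javascript
// Hypothetical term map derived by hand from the context above:
// Oracle key -> key used in a TD form after compaction with the TD context.
const oracleToTd = { url: "href", method: "htv:methodName" };

// Naive approximation of expand + compact: rename mapped keys and drop the
// others (JSON-LD expansion also drops terms the context does not define,
// as long as no @vocab is set).
function mapToTdForm(payload, termMap) {
  const form = {};
  for (const [key, value] of Object.entries(payload)) {
    if (Object.prototype.hasOwnProperty.call(termMap, key)) {
      form[termMap[key]] = value;
    }
  }
  return form;
}

const oracleForm = {
  id: "72a4239f1644-ccf",
  url: "https://iotserver/iot/api/version/resource/path",
  method: "GET",
};
// Logs an object with only "href" and "htv:methodName", like the form above.
console.log(mapToTdForm(oracleForm, oracleToTd));
```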
@vcharpenay wrote:
To me (and in fact, as per the theory behind hypermedia control), a link to an action resource and a link to an action request resource are both metadata. Control metadata, more precisely. That's the only thing I expect in a TD. Note that I consider "data about actions invoked on the device" to be roughly its status and this should appear nowhere in the TD itself.
OK, fair enough. My main concern is the idea of changing the nature of the Thing Description from a largely static description of device capabilities (acting as the entry point for a web thing which may change only very occasionally) into a dynamic resource which the client needs to constantly keep in sync with the server in order to know about new resources.
Is there a particular reason to design it this way, rather than simply linking to a list of action requests as a separate resource?
As I think you're aware, the way that the Mozilla implementation models action queues is by having each `ActionAffordance` link to a separate `Action` resource which resolves to a list of action requests.
"actions": {
"fade": {
"title": "Fade",
"input": {
"type": "object",
"properties": {
"level": {
"type": "integer",
"minimum": 0,
"maximum": 100
},
"duration": {
"type": "integer",
"minimum": 0,
"unit": "milliseconds"
}
}
},
"links": [{"href": "/things/lamp/actions/fade"}]
}
},
The same could be achieved with forms with a new set of `op`s as described above.
there are alternative ways to a specific protocol for action invocation.
Yes, this is why I am continuing to work on a standard (sub-)protocol for the Web of Things via the Web Thing Protocol Community Group, because currently this open ended complexity means it is effectively impossible to create a WoT client which can talk to any WoT device.
But in the meantime, if you want to be able to describe these kinds of APIs declaratively in the Thing Description I would suggest the need for more expressive syntax, perhaps along the lines of OpenAPI, and hopefully not something that requires complex RDF-based transformations with JSON-LD.
There are multiple components in @vcharpenay 's proposal as I understand.
The idea of introducing new operation types looks good to me.
I also found @benfrancis 's suggestion of use of URI templates helpful. By using URI template, we may not need to introduce dynamic TD.
Every time, readaction will always return the current status. In my mind, this is true even using MQTT, because the semantics of readaction are "read the current status of the action", not "subscribe me until the status changes".
I think this is a good point. An application does not have to keep calling the readaction operation many times if the protocol is MQTT. Don't we need metadata that tells whether readaction is pull or push?
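One way to carry that metadata is the operation type itself rather than a separate flag. The sketch below uses the hypothetical `subscribestate` op suggested earlier in this thread for push and `readaction` for pull. `statusStrategy`, `opsOf` and the `broker.example` URLs are illustrative. The sketch also handles `op` being either a string or an array, as TD 1.0 allows.

```javascript
// Normalise a form's "op" (a string or an array of strings) to an array.
const opsOf = (form) => (Array.isArray(form.op) ? form.op : [form.op]);

// Assumption: "subscribestate" is the hypothetical op suggested earlier in this
// thread; "readaction" is the op from the hypermedia-control proposal.
function statusStrategy(forms) {
  const push = forms.find((f) => opsOf(f).includes("subscribestate"));
  if (push) return { mode: "push", form: push };
  const pull = forms.find((f) => opsOf(f).includes("readaction"));
  if (pull) return { mode: "pull", form: pull };
  return { mode: "none", form: null };
}

// Same protocol (MQTT) in both forms; only the op tells the Consumer what to expect.
const mqttForms = [
  { href: "mqtt://broker.example/fade/1", op: "readaction" },
  { href: "mqtt://broker.example/fade/1/state", op: "subscribestate" },
];
console.log(statusStrategy(mqttForms).mode); // push
console.log(statusStrategy(mqttForms.slice(0, 1)).mode); // pull
```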
After reading the proposal and preparing one with static TDs, some questions came to my mind.
Does `output` now change its meaning and become the payload of a `readaction` response instead of the `invokeaction` response? This might break some current implementations, which would expect the `output` to correspond to the payload of the initial POST request's response.
With only the `invokeaction` operation, `input` and `output` would correspond to this operation. Should each operation now somehow have an `input` and `output`?
Created #907 as an alternative
I would like to highlight some more generic differences/assumptions between static vs dynamic TDs.
Dynamic TDs
Static TDs
`runningActions` or so
I am pretty sure there are more relevant assumptions/concerns we should start collecting...
I'd like to state for the record that I think dynamically modifying TDs will raise a bunch of troublesome issues with security (once we add signing), IDs (if they hash contents), directories, caching, and so forth. Also, I think that for developer documentation we really want a static (set of...) templates at least.
So I would strongly support a proposal that gives a static description, or at least a static template (or a set; for example, static Action Description Templates if we want to describe dynamic actions separately).
Created #907 as an alternative
The example comparison between fully-static and hypermedia-static that is provided in the proposal appears very interesting to me.
As I stated in issue #302, the Thing-Consumer protocol with regard to Actions can always look forward, but not backward.
I would like to point out that the Thing-Consumer protocol should, as much as possible, look forward, but not backward. This simplifies Consumer implementation a lot, which is important when you think about consumer appliances such as a dimmable light in a room. A remote control for the light should be as simple as possible. I think the fully-static TD works fine in many similar simple cases.
@egekorkan wrote:
Created #907 as an alternative
This proposal seems like a reasonable approach to declaratively defining action operations in a Thing Description and in my view is preferable to a dynamic Thing Description.
Currently, the output would be expected as the response to the POST /fade request, i.e. the response of invokeaction.
Note: As far as I know, the current specification does not say that the output of an `invokeaction` operation should represent the end result of the action. That wouldn't work for long-running actions requested via HTTP, where the running time of the action is longer than the HTTP response timeout. As I understand it, an immediate `201 Created` response to the action invocation request, just to confirm the action was requested, would already be valid with the current specification, though a client wouldn't necessarily know what that response means.
if we have a Thing that allows only a single Consumer to interact, the id can be static as shown above, like /fade/ongoing
That assumes that only one action of a given type can be invoked at a time. It's possible that a web thing could have multiple instances of the same action type running in parallel, or have multiple requests lined up in a queue to be executed sequentially. For example, you might want to instruct a robot arm to invoke a series of movements one after the other, or print a series of receipts on a thermal printer.
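The following is a minimal in-memory sketch of such a queue. It assumes sequential execution, numeric request IDs and the status values `pending`, `running` and `done` from the `fade` example. `ActionRequestQueue` is a hypothetical class, not part of any WoT or Web Thing API implementation. Each request gets its own resource path, which is why a single static `/fade/ongoing` would not be enough.

```javascript
// In-memory model of queued requests for one action type (e.g. print jobs).
// IDs, paths and status values are illustrative assumptions.
class ActionRequestQueue {
  constructor(actionName) {
    this.actionName = actionName;
    this.nextId = 1;
    this.requests = new Map(); // id -> { input, status }
    this.pending = []; // ids waiting to run, oldest first
  }

  // Record a new request and return its resource path, e.g. "/print/1".
  enqueue(input) {
    const id = this.nextId++;
    this.requests.set(id, { input, status: "pending" });
    this.pending.push(id);
    return `/${this.actionName}/${id}`;
  }

  // Run the oldest pending request with a synchronous executor.
  // Returns false when there is nothing left to run.
  runNext(execute) {
    const id = this.pending.shift();
    if (id === undefined) return false;
    const request = this.requests.get(id);
    request.status = "running";
    execute(request.input);
    request.status = "done";
    return true;
  }

  // Status of request `id`, or undefined if unknown.
  status(id) {
    const request = this.requests.get(id);
    return request ? request.status : undefined;
  }
}

const printer = new ActionRequestQueue("print");
printer.enqueue({ text: "receipt 1" });
printer.enqueue({ text: "receipt 2" });
printer.runNext((job) => console.log("printing", job.text)); // printing receipt 1
console.log(printer.status(1), printer.status(2)); // done pending
```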
Given that hypermedia is an advanced use case and that we should not break existing Consumer implementations, the input and output in Action Affordance level correspond to the invokeaction. I propose to add three new vocabulary terms in the Action Affordance level, named query, update and cancel that are of Object type.
For completeness, it might make sense to add an `invoke` object type as well, but continue to support the `input` and `output` of invoke at the top level of the `Action` object for backwards compatibility.
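The following is a minimal sketch of how a Consumer could pick schemas under this proposal. It assumes the sub-object names suggested in this thread (`invoke`, `query`, `update`, `cancel`) and the op names `invokeaction`, `queryaction`, `updateaction` and `cancelaction`. `OP_TO_TERM` and `schemasFor` are hypothetical. An `invoke` object, if present, takes precedence. Otherwise the top-level `input`/`output` still describe `invokeaction`.

```javascript
// Assumed mapping from op to the sub-object names discussed for #907
// ("invoke" being the optional one suggested here).
const OP_TO_TERM = {
  invokeaction: "invoke",
  queryaction: "query",
  updateaction: "update",
  cancelaction: "cancel",
};

// Return the { input, output } schemas a Consumer should use for an op.
function schemasFor(action, op) {
  if (!Object.prototype.hasOwnProperty.call(OP_TO_TERM, op)) {
    throw new Error(`unknown op: ${op}`);
  }
  const sub = action[OP_TO_TERM[op]];
  if (sub) return { input: sub.input, output: sub.output };
  // Backwards compatibility: top-level input/output still describe invokeaction.
  if (op === "invokeaction") return { input: action.input, output: action.output };
  return { input: undefined, output: undefined };
}

const fadeAction = {
  input: { type: "number", description: "duration (in ms)" },
  query: { output: { type: "string", enum: ["pending", "running", "done"] } },
  cancel: {},
};
console.log(schemasFor(fadeAction, "invokeaction").input.type); // number
console.log(schemasFor(fadeAction, "queryaction").output.enum); // [ 'pending', 'running', 'done' ]
```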
The part of this proposal that I think will be the hardest to define in a specification is how a client keeps track of templated values between `output` and `form` objects. Can the value of any `href`, `input` or `output` member of any affordance in a Thing Description contain a URI template? What meaning should a client attribute to those values?
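One possible answer can be sketched under an assumption that no spec currently makes: a template variable in a form's `href` must be supplied by a property of the same name in the object returned by `invokeaction`. With that rule, a Consumer can check a TD before invoking anything. `templateVariables`, `unresolvedVariables` and the convention itself are hypothetical.

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Names of the {var} placeholders in a level-1 URI template.
function templateVariables(template) {
  return [...template.matchAll(/\{([A-Za-z0-9_]+)\}/g)].map((m) => m[1]);
}

// Assumed convention (not defined anywhere yet): every variable used in a form's
// href must be a property of the object returned by invokeaction ("output").
function unresolvedVariables(action) {
  const out = action.output;
  const props = out && out.type === "object" && out.properties ? out.properties : {};
  const missing = new Set();
  for (const form of action.forms) {
    for (const name of templateVariables(form.href)) {
      if (!hasOwn(props, name)) missing.add(name);
    }
  }
  return [...missing];
}

const forms = [
  { href: "/fade", op: "invokeaction" },
  { href: "/fade/{actionRequestID}", op: "readactionrequest" },
];
// Output as in the templated example above: a plain status string.
console.log(unresolvedVariables({ output: { type: "string" }, forms })); // [ 'actionRequestID' ]
// An output object that also returns the request ID.
const withId = {
  type: "object",
  properties: { actionRequestID: { type: "string" }, status: { type: "string" } },
};
console.log(unresolvedVariables({ output: withId, forms })); // []
```

Under this convention, the string `output` in the templated example could not supply the request ID, which matches the payload problem raised earlier.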
Also, consideration needs to be given to error conditions. How does the Thing Description describe the result of an action invocation, update or cancellation that fails?
I also like #907 more.
We can specify `contentType` in the `Form` for querying an Action (please include that in the examples), but can we specify a `DataSchema`?
Updated on 10.06.2020 11:48 am CET
After discussions with @mkovatsc following the WISHI call of 08.06.2020, below are his comments regarding the use of hypermedia in the context of W3C WoT. @mkovatsc if there is anything wrong or missing, feel free to edit this exact comment :blush:
My comment on this: If we start having quickly changing TDs, we can almost think of putting the last values of properties and in the TDs, which is not what we want.
We would need a way that is dynamic and based on the responses of the Thing (more specifically a specific media type) that is not necessarily fully described in a TD. The responses of the Thing would guide the Consumer and the TD should ensure that the Consumer can check beforehand that it will understand all the possible responses.
My comment on this: If there was a widespread hypermedia standard, we would not need TDs, the Consumer would be able to use an API from an initial endpoint and discover the API (also see HATEOAS).
CoRAL draft (https://tools.ietf.org/html/draft-ietf-core-coral-03) from IETF (@ektrah) is a proposal that is more aligned with "real" hypermedia. There, a specific media type, i.e. `application/coral+cbor`, is used, and a Consumer who can parse this will be able to understand how to use the Thing.
We can also explore how one can describe a state machine in a TD.
There is no widely accepted hypermedia standard. That means that we can prescribe how it should be done with TDs. We can somehow support the existing implementations by Oracle and Mozilla but we do not have to guide the greenfield on the fact that hypermedia should be done like this.
My comments on this: This would mean almost a separate task force that focuses on such a deliverable.
We would need a way that is dynamic and based on the responses of the Thing that is not necessarily fully described in a TD. The responses of the Thing would guide the Consumer.
And preferably the Consumer can parse it in a similar way it does a TD. Which brings to the idea of returning a control object that is parseable as a TD, i.e. homomorphic with a TD. That would be quite easy to specify based on the TD and just needs a different name than a Thing, for instance Process or something else.
And preferably the Consumer can parse it in a similar way it does a TD.
To me, it would be preferable to try to align with OpenAPI or CoRAL for generic hypermedia control... Or to reuse the hypermedia controls module of TDs. Things can return links and forms only. CoRAL describes form input as form fields, which is something we can add to `Form` objects in the TD model.
(The main difference to your suggestion, @zolkis, is that `ActionAffordance`s still refer to physical actions and not to arbitrary REST operations on data.)
In 2020-06-12 telecon, it was suggested this thread might have reached a point where we need to discuss in F2F meeting for a decision. @mjkoster mentioned he also has a baseline implementation with hypermedia control.
And preferably the Consumer can parse it in a similar way it does a TD. Which brings to the idea of returning a control object that is parseable as a TD, i.e. homomorphic with a TD. That would be quite easy to specify based on the TD and just needs a different name than a Thing, for instance Process or something else.
This was exactly what I had in mind back then, to define "Action Description" based on the Thing Description spec -- basically the TD format with something like `@type: Action` instead of `Thing`.
However, CoRAL support should be developed in parallel (it would need some critical mass to establish a new, true hypermedia format). OpenAPI does not seem fit for hypermedia, unless they recently made a leap forward.
Discussed in TD teleconference on 2020-07-15 (see minutes).
Discussed in a TD session during virtual F2F meeting on 2020-10-21 (see minutes). It was suggested by @mlagally and others to further discuss this issue in WoT Profile calls.
Couldn't find a related issue in WoT Profiles so I am posting it here. After talking with the participants (@TaoXu00 and @dearzhaorui) of the BRAIN-IoT project (http://www.brain-iot.eu/) that Siemens is also part of, there are further use cases for this in the robotics field. Below is an extract of the TD that they use for describing the already existing endpoints of a robot made by Robotnik (https://robotnik.eu/):
{
"title":"robotnik",
"description":"Robotnik REST Implementation for Brain-Iot",
"actions":{
"PlaceAdd":{
"description":"Commands a robot to start place procedure",
"input":{...},
"output":{
"type":"object",
"properties":{
"state":{
"type":"object",
"properties":{
"current_state":{
"type":"string",
"enum":["queued","running","paused","finished","unknown"]
}
}
}
}
},
"forms":[...]
},
"PlaceCancel":{
"description":"Cancels the current place mission",
"input":{
"type":"object",
"properties":{
"header":{
"type":"object",
"properties":{
"id":{
"type":"string",
"description":"The ID of the place mission you want to cancel; -1 cancels last mission"
}
}
}
}
},
"output":{
"type":"object",
"properties":{
"state":{
// same as above
}
}
},
"forms":[...]
},
"PlaceQuery":{
"description":"Gets the state of a place mission",
"input":{
"type":"object",
"properties":{
"header":{
"type":"object",
"properties":{
"id":{
"type":"string",
"description":"The id of the place mission you want to get the query state; -1 gets the query state of the last mission"
}
}
}
}
},
"output":{
"type":"object",
"properties":{
"state":{
// same as above
}
}
},
"forms":[... ]
}
}
}
So basically managing the `place` action is done by 3 different actions, and no apparent link between them can be established with a standard TD. I think that the minimum work for this feature of TD is to create some sort of link relations (like the `rel` keyword) between different interaction affordances and leave it open how this can be done/implemented.
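The following is a minimal sketch of what such link relations could look like for the Robotnik actions. It assumes affordance-level `links` with `rel` values `cancels` and `queries` and `#/actions/...` fragment targets. TD 1.0 defines none of these (placement, relation names or target syntax). `relatedAffordances` is a hypothetical helper.

```javascript
// Hypothetical: affordances carry "links" whose "rel" names their relation to
// another affordance, referenced by a "#/actions/<name>" fragment.
const robotnikActions = {
  PlaceAdd: { forms: [] },
  PlaceCancel: { links: [{ rel: "cancels", href: "#/actions/PlaceAdd" }], forms: [] },
  PlaceQuery: { links: [{ rel: "queries", href: "#/actions/PlaceAdd" }], forms: [] },
};

// Names of the affordances that declare relation `rel` to the `target` action.
function relatedAffordances(actions, target, rel) {
  const ref = `#/actions/${target}`;
  return Object.entries(actions)
    .filter(([, a]) => (a.links || []).some((l) => l.rel === rel && l.href === ref))
    .map(([name]) => name);
}

console.log(relatedAffordances(robotnikActions, "PlaceAdd", "cancels")); // [ 'PlaceCancel' ]
console.log(relatedAffordances(robotnikActions, "PlaceAdd", "queries")); // [ 'PlaceQuery' ]
```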
The TD model in its first version does not allow Things to expose dynamically created resources, such as resources giving the status of long-lasting actions or event subscription resources.
A proposal is available under `/proposals/hypermedia-control`. (The proposal is rather long so I put it in its own file instead of exposing it in the issue.)