w3c / wot-thing-description

Web of Things (WoT) Thing Description
http://w3c.github.io/wot-thing-description/

How do you cancel or query the state of an action request? #302

Closed benfrancis closed 2 years ago

benfrancis commented 5 years ago

In Mozilla's Web Thing API, an action can be requested using an HTTP POST request on an Action resource to create an ActionRequest resource. The Action resource is essentially an action queue, consisting of multiple ActionRequest resources.

The response to the POST provides a unique URL for the ActionRequest resource, which can then have its status queried with a GET or be cancelled with a DELETE. A list of all current requests can be retrieved by a GET on the Action resource.

How would this API be described in a Thing Description following the current draft specification? Or is there another intended way to achieve these use cases?

benfrancis commented 5 years ago

To provide some additional context for people who don't want to read the Web Thing API specification...

To request an action, the Web Thing API uses a POST request, e.g.

POST https://mythingserver.com/things/lamp/actions/fade
Accept: application/json

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    }
  }
}

Response:

201 Created

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "status": "pending"
  }
}

You can get a list of current action requests.

Request:

GET /things/lamp/actions/fade
Accept: application/json

Response:

200 OK
[
  {
    "fade": {
      "input": {
        "level": 50,
        "duration": 2000
      },
      "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
      "timeRequested": "2017-01-25T15:01:35+00:00",
      "status": "pending"
    }
  },
  {
    "fade": {
      "input": {
        "level": 100,
        "duration": 2000
      },
      "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
      "timeRequested": "2017-01-24T11:02:45+00:00",
      "timeCompleted": "2017-01-24T11:02:46+00:00",
      "status": "completed"
    }
  }
]

You can get the status of an action request.

Request:

GET /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655
Accept: application/json

Response:

200 OK
{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "timeRequested": "2017-01-25T15:01:35+00:00",
    "status": "pending"
  }
}

You can cancel an action request.

Request: DELETE /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655

Response: 204 No Content

You can also get a list of action requests of all types with a GET request to an Actions resource (whose URL is provided by the top level links member).

Request:

GET /things/lamp/actions
Accept: application/json

Response:

200 OK
[
  {
    "fade": {
      "input": {
        "level": 50,
        "duration": 2000
      },
      "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
      "timeRequested": "2017-01-25T15:01:35+00:00",
      "status": "pending"
    }
  },
  {
    "reboot": {
      "href": "/things/lamp/actions/reboot/124e4568-f89b-22d3-a356-427656",
      "timeRequested": "2017-01-24T13:13:33+00:00",
      "timeCompleted": "2017-01-24T13:15:01+00:00",
      "status": "completed"
    }
  }
]

And for completeness you can also request an action on the top level Actions resource if you want to.

Request:

POST https://mythingserver.com/things/lamp/actions/
Accept: application/json

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    }
  }
}

Response:

201 Created

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "status": "pending"
  }
}
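
The request flow described above can be modelled in a few lines. This is only a sketch of the semantics, not the Web Thing API implementation: the `ActionQueue` class and its method names are hypothetical, the 404 for unknown request IDs is an assumption (the examples above do not show error responses), and so is the choice to drop a request from the queue when it is cancelled.

```python
import uuid


class ActionQueue:
    """Hypothetical in-memory model of one Action resource (e.g. .../actions/fade)."""

    def __init__(self, name, base_path):
        self.name = name
        self.base_path = base_path
        self.requests = {}  # request id -> ActionRequest body

    def post(self, action_input):
        """POST on the Action resource: create an ActionRequest (201 Created)."""
        request_id = str(uuid.uuid4())
        body = {
            "input": action_input,
            "href": f"{self.base_path}/{self.name}/{request_id}",
            "status": "pending",
        }
        self.requests[request_id] = body
        return 201, {self.name: body}

    def get_all(self):
        """GET on the Action resource: list every current request (200 OK)."""
        return 200, [{self.name: body} for body in self.requests.values()]

    def get(self, request_id):
        """GET on an ActionRequest: query its status (200, or 404 by assumption)."""
        body = self.requests.get(request_id)
        if body is None:
            return 404, None
        return 200, {self.name: body}

    def delete(self, request_id):
        """DELETE on an ActionRequest: cancel it (204, or 404 by assumption)."""
        if self.requests.pop(request_id, None) is None:
            return 404, None
        return 204, None


queue = ActionQueue("fade", "/things/lamp/actions")
status, created = queue.post({"level": 50, "duration": 2000})
# The request id is the last path segment of the returned href.
request_id = created["fade"]["href"].rsplit("/", 1)[-1]
```

The point of the model is that the URL needed for the GET and DELETE only exists after the POST has been answered, which is what makes it hard to declare statically.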

How can all of this be expressed in a Thing Description, including the payload formats and possible error responses?

draggett commented 5 years ago

By contrast, in Arena, HTTPS POST takes the action input as the body of the request, and returns the action output as the body of the response. This follows the basic semantics for POST. If a developer wants a cancellable process that is initiated by an action, that can be layered on top of the core patterns of actions and events. You could return a process ID for the action that initiates the process, and provide another action to cancel an active process using the process ID. You can likewise define a progress event that passes the process ID together with status information. My take is thus that the core semantics for the Web of Things should be really simple and more complex models can be layered on top. This principle also applies to APIs for event logs and for querying a history of property updates in the case of telemetry streams.
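
A minimal sketch of the layered pattern described here, assuming a Thing with one action that starts a process, one action that cancels it by process ID, and a progress event. The class, the `start_fade` name and the status strings are hypothetical; only `cancelProcess` and `processID` echo names suggested in this thread.

```python
import itertools


class ProcessThing:
    """Hypothetical Thing with start/cancel actions and a progress event."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.active = {}     # processID -> parameters of the running process
        self.listeners = []  # callbacks subscribed to the progress event

    def _emit(self, process_id, status):
        # Progress event: carries the process ID together with status information.
        for listener in self.listeners:
            listener({"processID": process_id, "status": status})

    def start_fade(self, level, duration):
        """Action: start the process; the action output is just the process ID."""
        process_id = next(self._ids)
        self.active[process_id] = {"level": level, "duration": duration}
        self._emit(process_id, "started")
        return {"processID": process_id}

    def cancel_process(self, process_id):
        """Action (cancelProcess): cancel an active process by its ID."""
        cancelled = self.active.pop(process_id, None) is not None
        if cancelled:
            self._emit(process_id, "cancelled")
        return {"cancelled": cancelled}


thing = ProcessThing()
events = []
thing.listeners.append(events.append)
output = thing.start_fade(level=50, duration=2000)
result = thing.cancel_process(output["processID"])
```

Note that nothing in this structure tells a client that the `processID` returned by one action is the input expected by the other; that link lives in the names alone.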

benfrancis commented 5 years ago

If the argument for forms vs. links is for backwards compatibility with existing IoT APIs using declarative protocol bindings, then it should be possible to express the API described above using a Thing Description alone. Describing Mozilla's Web Thing API should be a particularly easy example as it was developed in parallel with the Thing Description specification and its data model is already aligned.

The solution you describe requires changing the API itself, and doesn't explain how the client would know from the Thing Description that the ID returned by the action request can be used to cancel that request by using a different action. Can you provide an example Thing Description which provides the client with all of that information in a declarative protocol binding?

benfrancis commented 5 years ago

By contrast, in Arena, HTTPS POST takes the action input as the body of the request, and returns the action output

What happens if the action is a long running process which doesn't complete by the time the HTTP request times out? (This was the reason we created the action queue API, and is a difference between requesting an action and simply setting a property.)

draggett commented 5 years ago

Developers are responsible for documenting the purpose of properties, actions and events. Picking appropriate names would help, e.g. an action called cancelProcess with an input named processID.

draggett commented 5 years ago

The timeout for HTTP requests is often client dependent. We could standardise how to express an indication of the maximum expected duration of a long lived process as part of the metadata for an action. Alternatively, developers could use the processID design pattern described above.

benfrancis commented 5 years ago

Developers are responsible for documenting the purpose of properties, actions and events. Picking appropriate names would help, e.g. an action called cancelProcess with an input named processID.

This assumes the involvement of a human to interpret those names and write custom code for that specific web thing, which doesn't allow for ad-hoc interoperability.

The timeout for HTTP requests is often client dependent. We could standardise how to express an indication of the maximum expected duration of a long lived process as part of the metadata for an action. Alternatively, developers could use the processID design pattern described above.

The duration of an action may be user defined e.g. an action to fade a light from 0 to 100% brightness over the course of 1 hour.

Again, your proposed solution requires changing the API. But how would you get the current status of an action in this model? e.g. to find out if the action succeeded or failed.

draggett commented 5 years ago

I agree that when it comes to describing existing services using a standardised machine interpretable format, this gets increasingly complicated, and I question whether this complexity is justified. In the longer time frame we can and should encourage convergence in protocols and their usage across the Internet and this would render declarative protocol bindings a historical legacy that is no longer needed.

An alternative is to provide a platform ID that web of things clients can use to identify what protocols and usage patterns apply to a given thing. This avoids the need for complex representations in every TD.

benfrancis commented 5 years ago

The point I'm making here is that it's not just difficult to be backwards compatible with existing APIs, it may actually be impossible without significantly more complexity than the current specification allows for.

Is anyone willing to make a stab at describing Mozilla's Web Thing REST API in a Thing Description alone? The issue described here is just one of several problems with trying to do that.

(I think we've already agreed that the Web Thing WebSocket API can not be described in a Thing Description and would require a separate WebSocket subprotocol specification.)

draggett commented 5 years ago

The duration of an action may be user defined e.g. an action to fade a light from 0 to 100% brightness over the course of 1 hour.

Yes, but the developer should have some understanding of what the maximum is likely to be before the process can be considered to have failed. That expectation could be given as a metadata property.

Again, your proposed solution requires changing the API.

Yes, but see my previous post that questions the long term commercial need for a complex declarative protocol binding standard.

But how would you get the current status of an action in this model? e.g. to find out if the action succeeded or failed.

You would listen to the status events. In addition, I would design the server and the application to be robust against a loss of network connectivity, the reboot of the client or server, etc.

mkovatsc commented 5 years ago

We have been discussing this for a long time -- since IG-only work -- and also came to a good conclusion about this. The ideal way to do this is using hypermedia, where a running Action is represented as a Web resource that is dynamically created upon invocation. This running Action can itself have Properties and Actions again. Maybe you remember the discussions we had around a potential application/wot+json media type for this.

We do have all required extension points in place for this: The output of an Action can be such an application/wot+json representation or potentially other hypermedia formats such as CoRAL.

To date, support for this is very limited in existing systems; hypermedia concepts are almost nil. Thus, we decided to focus on description of deployed systems for now and tackle this issue in the next charter period or in the IG first. The closest we have are custom "ticket responses" that each platform does in a different style. This must be solved by semantically describing the response content and leaving it to the application. @draggett described this approach in his comment further up.

draggett commented 5 years ago

We have been discussing this for a long time -- since IG-only work -- and also came to a good conclusion about this. The ideal way to do this is using hypermedia, where a running Action is represented as a Web resource that is dynamically created upon invocation. This running Action can itself have Properties and Actions again. Maybe you remember the discussions we had around a potential application/wot+json media type for this.

My very old proposal was to support things as first class types. However, I agree that this is something we can leave to future extensions given the subtleties involved.

mkovatsc commented 5 years ago

support things as first class types

Simply set the content type to application/td+json, done...

draggett commented 5 years ago

That works for limited cases, but isn't a general solution. Object oriented programming languages support objects as first class types, so having things as first class types is something that will be expected for the web of things. At the protocol level we can pass things using the URI for their thing description, or as you suggest, by passing the JSON-LD for the thing description; both are a form of reference to a thing. Interestingly, passing the JSON-LD explicitly corresponds to giving the thing a blank node for its RDF identifier.

If the TD for a thing includes declarations of initial values, the platform should carry out the initialisation. If this involves a thing, the platform needs to retrieve the thing's TD, if not supplied in place, and initialise that thing. This can get a little complicated when you need to deal with forward references, and when the dependencies between things form cycles. I showed how to handle that over two years ago, proving that it is a tractable problem, just as it is for object oriented programming languages.

vcharpenay commented 4 years ago

Here is a proposal: in a future version of the TD model, we could at least standardize new operation types to cancel, query (and update?) an invoked action: something like cancelaction, queryaction, updateaction. Each operation type would not necessarily be used in a TD directly but it could be used as part of a Link header or some hypermedia-aware response payload to drive WoT consumers.

vcharpenay commented 4 years ago

Having said that, it is already possible with the current TD spec to specify operations on dynamically created resources. For that, you can define a generic action manage that declares forms on these resources (what you call ActionRequests in Mozilla's WebThings, @benfrancis). Here is a try:

{
    "@context": [
        "https://www.w3.org/2019/wot/td/v1",
        {
            "ActionRequest": "http://example.org/ActionRequest",
            "cancelaction": "http://example.org/cancelActionOperationType",
            "queryaction": "http://example.org/queryActionOperationType"
        }
    ],
    "id": "urn:example:mylamp",
    "actions": {
        "fade": {
            "input": {
                "type": "object",
                "properties": {
                    "level": { "type": "number" },
                    "duration": { "type": "duration" }
                }
            },
            "output": {
                "type": "object",
                "properties": {
                    "href": {
                        "@type": "ActionRequest",
                        "type": "string"
                    },
                    "status": {
                        "enum": [ "pending", "completed" ]
                    }
                }
            },
            "forms": [
                {
                    "href": "https://mythingserver.com/things/lamp/actions/fade",
                    "op": "invokeaction"
                }
            ]
        },
        "manage": {
            "uriVariables": {
                "actionRequest": {
                    "@type": "ActionRequest",
                    "type": "string"
                }
            },
            "forms": [
                {
                    "href": "{actionRequest}",
                    "htv:methodName": "DELETE",
                    "op": "cancelaction"
                },
                {
                    "href": "{actionRequest}",
                    "htv:methodName": "GET",
                    "op": "queryaction"
                }
            ]
        }
    }
}

In this example, you can see that I use cancelaction and queryaction, but since they do not exist yet in the TD model, I declared them in the JSON-LD context. The same goes for the class ActionRequest, which indicates that the output of the action invocation is the same as the actionRequest URI variable.
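
To make the consumer side of this concrete, here is a rough sketch of how a client might pair the annotated output with the URI variable. The helper names `bind_uri_variables` and `select_request` are hypothetical, the TD is trimmed to the parts used, and the response is assumed to be flat as the output schema describes it. Matching purely on an identical `@type` string is an assumption about how a consumer would interpret the annotation, and resolving the relative href against a base URL is left out.

```python
def bind_uri_variables(td, invoked, managed, response):
    """Fill the URI variables of `managed` from the output of `invoked`,
    pairing output properties and variables that share the same @type."""
    output_props = td["actions"][invoked]["output"]["properties"]
    variables = td["actions"][managed]["uriVariables"]
    bindings = {}
    for var_name, var_schema in variables.items():
        for prop_name, prop_schema in output_props.items():
            same_type = (
                "@type" in var_schema
                and prop_schema.get("@type") == var_schema["@type"]
            )
            if same_type and prop_name in response:
                bindings[var_name] = response[prop_name]
    return bindings


def select_request(td, managed, op, bindings):
    """Pick the form with the given op and expand its {variable} template."""
    for form in td["actions"][managed]["forms"]:
        if form["op"] == op:
            href = form["href"]
            for name, value in bindings.items():
                href = href.replace("{" + name + "}", value)
            return form["htv:methodName"], href
    raise LookupError(f"no form with op {op!r}")


td = {
    "actions": {
        "fade": {
            "output": {
                "type": "object",
                "properties": {
                    "href": {"@type": "ActionRequest", "type": "string"},
                    "status": {"enum": ["pending", "completed"]},
                },
            }
        },
        "manage": {
            "uriVariables": {
                "actionRequest": {"@type": "ActionRequest", "type": "string"}
            },
            "forms": [
                {"href": "{actionRequest}", "htv:methodName": "DELETE", "op": "cancelaction"},
                {"href": "{actionRequest}", "htv:methodName": "GET", "op": "queryaction"},
            ],
        },
    }
}
response = {
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "status": "pending",
}
bindings = bind_uri_variables(td, "fade", "manage", response)
method, url = select_request(td, "manage", "cancelaction", bindings)
```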

vcharpenay commented 4 years ago

However, I would also expect (or wish) that future WoT Things have a more hypermedia-driven interface to consumers. In that case, the cancelaction and queryaction operations could be added to each array item in the response of GET /things/lamp/actions/fade.

As @mkovatsc and @draggett said, the same structure as in the TD model could be used for the JSON response. I would suggest one conceptual variant, though: to me, reserving the class Thing for physical objects is important. So, it means that if a new "TD" is returned after invoking an action, it should be interpreted as an extension of the original TD, as if new actions on the same Thing were made available. It makes a big difference when dealing with the semantics of TD documents but the JSON structure would not be significantly impacted.

egekorkan commented 4 years ago

I support the idea in the previous comment by @vcharpenay. To bring some more examples to the discussion, we have a PanTilt module (think of it as the non-camera part of a CCTV camera) where a stopMovement action can stop any ongoing movements. The source code and TD can be found here. In addition to actions that take a long time, there can be actions that are started via a request but whose physical action never stops on its own. From the previous example, these would be the moveContinuously and panContinuously actions, where the invokeaction request starts the movement and the movement doesn't stop until it hits a limit or a stopMovement action is invoked. A more familiar example would be a conveyor belt that is started with one action and stopped with another. A hypermedia-based approach was the first one that came to mind, but I was not sure how one would describe it, since executing a form with the op invokeaction would need to return some information that is used by a form with another op value. I think the comment above goes in the right direction by taking this into account. However, I think it would be better not to introduce another action, and instead to pack the manage action into the fade action.

mlagally commented 4 years ago

As discussed in the TD call on 7.2. we are looking at different examples.

Oracle's IoT Cloud service has a hypermedia-based action model that supports synchronous and asynchronous operations.

The response payload contains a key "complete", when the operation is already finished, otherwise the url endpoint contains a link to asynchronously query the status.

{
  "complete": false,
  "id": "72a4239f1644-ccf",
  "endpointId": "6248475d6e28-3013",
  "url": "https://iotserver/iot/api/version/resource/path",
  "method": "Request method",
  "status": "Request status. One of [RECEIVED, DISPATCHED, COMPLETED, EXPIRED, FAILED, UNKNOWN].",
  "requestTime": "2016-07-22T10:44:57.746Z",
  "responseTime": "Time when the response is received by server",
  "responseEventTime": "2016-07-22T10:44:57.746Z",
  "responseStatusCode": "Request status code from the response message (One of [HTTP 200: OK, HTTP 201: Created, HTTP 202: Accepted, HTTP 203: Non Authoritative Information, HTTP 204: No Content, HTTP 400: Bad Request, HTTP 401: Unauthorized, HTTP 402: Payment Required, HTTP 403: Forbidden, HTTP 404: Not Found, HTTP 405: Method Not Allowed, HTTP 406: Not Acceptable, HTTP 408: Request Timeout, HTTP 409: Conflict, HTTP 500: Internal Server Error, HTTP 502: Bad Gateway, HTTP 503: Service Unavailable].)",
  "response": "Original response message payload JSON document"
}

Here's the full API documentation for Invoke action: https://docs.oracle.com/en/cloud/paas/iot-cloud/iotrq/op-iot-api-v2-apps-app-id-deviceapps-devapp-id-devicemodels-devicemodel-id-actions-action-name-post.html

Starting point for the API documentation: https://docs.oracle.com/en/cloud/paas/iot-cloud/iotrq/toc.htm
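
A client for this kind of response could follow the url until the complete key turns true. The sketch below is not taken from the Oracle documentation: `wait_for_completion` and the caller-supplied `fetch` function are hypothetical, and it assumes the resource behind url returns a payload of the same shape as the first response.

```python
import time


def wait_for_completion(first_response, fetch, interval=1.0, max_polls=10, sleep=time.sleep):
    """Follow the `url` of an asynchronous response until `complete` is true.

    `fetch` is a caller-supplied function (hypothetical) that GETs a URL and
    returns the decoded JSON body as a dict.
    """
    response = first_response
    polls = 0
    while not response.get("complete"):
        if polls == max_polls:
            raise TimeoutError("request did not complete within max_polls")
        sleep(interval)
        response = fetch(response["url"])
        polls += 1
    return response


# Stand-in for real HTTP traffic: two canned replies from the status URL.
status_url = "https://iotserver/iot/api/version/resource/path"
replies = iter([
    {"complete": False, "status": "DISPATCHED", "url": status_url},
    {"complete": True, "status": "COMPLETED", "url": status_url},
])
first = {"complete": False, "status": "RECEIVED", "url": status_url}
final = wait_for_completion(
    first,
    fetch=lambda url: next(replies),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

A synchronous response, where complete is already true, returns immediately without any fetch.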

takuki commented 4 years ago

I would like to point out that the Thing-Consumer protocol should always look forward, not backward.

This means it had better not depend on a transaction model where you can "cancel" a request while it is in progress.

A Consumer should be able to make an independent "cancel" request to a Thing, and the Thing makes a best effort to fulfill the request. The fulfillment may be to just stop the action, or the Thing may wait for the action to finish (if it cannot be stopped immediately) and revert to the original state if possible.

Here, note that a Thing may be able to process the cancel request even after the original request has completed. This is why I said the Thing-Consumer protocol should always look forward. Things can decide how best to process the "cancel" request because it is just one of the subsequent action requests.

mlagally commented 4 years ago

I like the proposal. There's one aspect to consider: Is the cancel operation synchronous or asynchronous? If it is asynchronous, would it be possible to abort a long-lasting cancel operation that does not complete?

zolkis commented 4 years ago

For an async operation the cancellation should also be async. Usually cancelling cannot be guaranteed, so it is always best effort. Therefore Things need to be designed

sebastiankb commented 4 years ago

Does it make sense to introduce a hypermedia-specific navigation term that gives an indication of where the resource is defined in the payload message that can be used to query the status or cancel an action? E.g., in the case of Oracle it would be

{
  "forms": [
    {
      "href": "...",
      "op": "invokeaction",
      "hypermedia": "url" //--> points to the JSON term of the response payload message
    }
  ]
}

for Mozilla it would look like

{
  "forms": [
    {
      "href": "...",
      "op": "invokeaction",
      "hypermedia": "href" //--> points to the JSON term of the response payload message
    }
  ]
}

Btw: I have checked the MDSP API, and there seems to be no use of the hypermedia approach yet.

benfrancis commented 4 years ago

@sebastiankb I don't understand how this would work. Could you provide a more complete example for "url" and "href".

sebastiankb commented 4 years ago

@benfrancis The idea is to provide a hint in the TD (in the forms container of actions) that tells the client which name-urlValue pair in the response message can be used to asynchronously query the action's status.

In my example above, the hint is given by the term hypermedia (maybe not the perfect name for it). The value 'url' (Oracle) or 'href' (Mozilla IoT) indicates the JSON name used in the corresponding response message. E.g., in the case of Oracle the client would identify the entry

"url":"https://iotserver/iot/api/version/resource/path"

which can then be used to query the status.

I hope this is clearer now.
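
A sketch of how a client might apply such a hint. The function `find_hypermedia_url` is hypothetical, and the depth-first search through nested objects is an assumption: the proposal only names the JSON member, and in the Mozilla response that member sits one level down, under the action name, so a lookup at the top level alone would not find it.

```python
def find_hypermedia_url(form, response):
    """Return the string value of the JSON member named by the form's
    `hypermedia` hint, searching nested objects depth-first; None if absent."""
    key = form["hypermedia"]

    def search(node):
        if isinstance(node, dict):
            if isinstance(node.get(key), str):
                return node[key]
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            return None
        for child in children:
            found = search(child)
            if found is not None:
                return found
        return None

    return search(response)


oracle_form = {"href": "...", "op": "invokeaction", "hypermedia": "url"}
oracle_response = {
    "complete": False,
    "url": "https://iotserver/iot/api/version/resource/path",
}

mozilla_form = {"href": "...", "op": "invokeaction", "hypermedia": "href"}
mozilla_response = {
    "fade": {
        "input": {"level": 50, "duration": 2000},
        "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
        "status": "pending",
    }
}

oracle_url = find_hypermedia_url(oracle_form, oracle_response)
mozilla_url = find_hypermedia_url(mozilla_form, mozilla_response)
```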

benfrancis commented 4 years ago

@sebastiankb Oh I see, yes that is clearer now thank you.

That tells the client how to find out the URL of an action request resource, but how would the client know what format to expect for that resource? I've provided an example flow below.

The client requests an action.

POST https://mythingserver.com/things/lamp/actions/fade
Accept: application/json

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    }
  }
}

The server responds with the URL of the created action request resource.

201 Created

{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "status": "pending"
  }
}

So far so good. The client knows where to find the newly created action request resource.

The client requests the status of the action request.

GET /things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655
Accept: application/json

The server responds with its current status.

200 OK
{
  "fade": {
    "input": {
      "level": 50,
      "duration": 2000
    },
    "href": "/things/lamp/actions/fade/123e4567-e89b-12d3-a456-426655",
    "timeRequested": "2017-01-25T15:01:35+00:00",
    "status": "pending"
  }
}

How does the client know what format to expect from this response to determine the status of the action request, or how to modify or cancel the action request?

egekorkan commented 4 years ago

I guess there are different aspects to look at here:

  1. Describing the payload format of the response of the Thing based on different requests, i.e. invoking an action, querying the status, cancelling and modifying.

At the moment, we are able to describe only the payload of the response from invoking an action by using the output term. We can thus imagine adding new terms on the action affordance level like querystatus, cancellation, modification which all have input and output fields.

  2. Finding where the hypermedia information is. @sebastiankb suggests that it would be in the form of the interaction. @vcharpenay suggests that it could also be indicated in one of the output properties. For example:
    {
      "output": {
        "type": "object",
        "properties": {
          "key1": { "type": "number" },
          "url": {
            "type": "string",
            "@type": "hypermedia" //or some other semantic annotation
          }
        }
      }
    }

    In @sebastiankb 's previous examples, "url" or "href" SHOULD be described in the DataSchema of the output of the action affordance anyways, so we can add a semantic annotation and not need to change the forms. Doing this with the forms would have the following disadvantages:

    • Repeating it for different forms
    • A Thing or Consumer script written in node-wot (or other possible Scripting API implementations?) does not have control over the forms.

takuki commented 4 years ago

During the March 6th TD telecon, @sebastiankb was asked to invite @benfrancis to the online F2F TD session on March 17th. See minutes. See also the draft schedule for the online F2F.

sebastiankb commented 4 years ago

done

@benfrancis I sent you an e-mail to the address I found on your personal website.

sebastiankb commented 4 years ago

As agreed in today's TD call, we would like to test an approach in the next PlugFest. The approach will contain:

  1. A hint giving the JSON name under which the URL of an action request resource can be found
  2. The format to expect from this response to determine the status of the action request

@vcharpenay will provide a proposal for it based on past discussions.

egekorkan commented 4 years ago

Long and windy road ahead!

After talking with @sebastiankb, @danielpeintner and @wiresio , I summarize our findings. If anything is missing, feel free to edit this comment or add comments below.

Requirements

  1. Represent and use dynamic information: The href/url shown in the examples above, where the Consumer can query, cancel or modify the action, is not static like the href we have in the current TD. It is important to note that it needs to be reused in another request.
  2. Describing everything in the TD without implicit knowledge: The TD has always been conceived to eliminate out-of-band information. This means that we cannot assume that a Thing follows a certain hypermedia protocol. The Consumer should be able to interact with the Thing using only its TD. Thus, we cannot assume that the Thing will always return hypermedia-related information in a specific way.
    • Describing payloads for different operations: We should be able to describe the payload needed:
      • For a usual action invocation (current input)
      • For the result of the action that has nothing to do with hypermedia (current output)
      • For the response that describes hypermedia information such as the URL to query (dynamic), the status of the action, time created, etc. There can be two subtypes, one for describing the first response after the action invocation and one for describing the response to a query.
      • (not discussed with others) For querying the action status and deleting it. The above examples do not have such a payload and only use an HTTP GET or DELETE request to a URI.
      • (not discussed with others) For changing/modifying the action. The above examples assume the same payload as the action invocation (current input).
    • Mapping the different requests to forms: Currently, every request a Consumer can build is seen in the forms fields with an associated operation (op). Different management/hypermedia-related operations should have an op value and form.

We are not sure if there can be other requirements that need to be considered.

Example and Proposals

For the TD example below, think of a robot arm that is in a position of 50 degrees. An action can be invoked to rotate it for a given amount of time and speed. The Consumer should be able to invoke this action, query its status (still rotating or finished), change the speed of rotation and cancel the rotation. Once the action finishes, the robot should report its final position (output of the action).

{
...
  "actions": {
    "rotate": {
      "input": {
        "type": "object",
        "properties": {
          "url": {
            "@type": "ActionQueryInput",
            "type": "string"
          },
          "duration": {
            "type": "number",
            "@type": "ActionInvokeInput"
          },
          "speed": {
            "type": "number",
            "@type": ["ActionInvokeInput","ActionModifyInput"]
          }
        }
      },
      "output": {
        "type": "object",
        "properties": {
          "url": {
            "@type": "ActionQueryOutput", //["ActionQueryOutput","ActionQueryURI"]
            "type": "string"
          },
          "status": {
            "@type": "ActionQueryOutput",
            "type": "string",
            "enum": [
              "completed",
              "rotating"
            ]
          },
          "currentPosition": {
            "@type": "ActionResult",
            "type": "number"
          }
        }
      },
      "forms": [
        {
          "href": "https://myrobot.example.com/rotate",
          "htv:methodName": "POST",
          "op": "invokeaction"
        },
        {
          "href": "{ActionQueryInput}", //"{ActionQueryURI}"
          "htv:methodName": "DELETE",
          "op": "cancelaction"
        },
        {
          "href": "{ActionQueryInput}", //"{ActionQueryURI}"
          "htv:methodName": "GET",
          "op": "queryaction"
        },
        {
          "href": "{ActionQueryInput}",
          "htv:methodName": "PUT",
          "op": "modifyaction"
        }
      ]
    }
  }
...
}

Some considerations:

vcharpenay commented 4 years ago

The proposal in @egekorkan's last message doesn't seem far from what I refer to in #899. Shall we continue the discussion in that other thread? I'd like to have more details on the actual messages being sent to/by the robot when GETting and PUTting the action query resource.

takuki commented 4 years ago

In the 5/22 TD telecon, it was noted that we should consider the pros and cons of both @egekorkan's and @vcharpenay's proposals and merge them together.

@egekorkan will first need to make his alternative proposal concrete in a separate document. After that, the TD TF will compare the two proposals side by side.

egekorkan commented 4 years ago

Also, I have the feeling that we are not looking at existing documents (not really standards) that talk about hypermedia. In the end, hypermedia is as old as REST and there is quite some material already:

benfrancis commented 3 years ago

Support for action queues is in the current charter and I'm conscious we haven't come up with a solution for this yet. This is also needed in order to make WebThings W3C compliant (see WebThingsIO/gateway#2806 and WebThingsIO/gateway#2807).

The closest I have seen to a solution to this problem is @egekorkan's Hypermedia Control 2 proposal:

  • introduce new operation types queryaction, updateaction, cancelaction
  • introduce new fields query, update and cancel to action affordances that map to payload information of queryaction, updateaction and cancelaction, respectively.
  • add input and output to each of the previously introduced terms

Below is an example Thing Description which illustrates how this could work (combined from two examples in the proposal):

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "actions": {
    "fade": {
      "input": {
        "type": "number",
        "description": "duration in ms"
      },
      "output":{
        "type":"object",
        "properties":{
          "href":{
            "const":"{id}",
            "description": "URI to query, update or cancel the invoked action"
          },
          "status":{
              "type":"string",
              "enum":["ongoing","finished","pending"],
              "description": "status of the invoked action"
            }
         }
      },
      "query":{
        "output":{
          "type":"object",
          "properties":{
            "brightness":{
              "type":"number",
              "description": "current brightness"
            },
            "status":{
              "type":"string",
              "enum":["ongoing","finished","pending"],
              "description": "status of the invoked action"
            }
          }
        }
      },
      "update":{
        "input": {
          "type": "number",
          "description": "ADDED duration in ms"
        }
      },
      "cancel":{
      },
      "forms": [
        {
          "href": "/fade",
          "op": "invokeaction",
          "htv:methodName": "POST",
          "contentType":"application/json"
        },
        {
          "href": "/fade/{id}",
          "op": "queryaction",
          "htv:methodName": "GET",
          "contentType":"application/json"
        },
        {
          "href": "/fade/{id}",
          "op": "updateaction",
          "htv:methodName": "PUT",
          "contentType":"application/json"
        },
        {
          "href": "/fade/{id}",
          "op": "cancelaction",
          "htv:methodName": "DELETE",
          "contentType":"application/json"
        }
      ]
    }
  }
}

One remaining issue I see with this proposal is how consumers will know to map the {id} from the output of invokeaction to the {id} in the href of forms of queryaction, updateaction and cancelaction.
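As a rough illustration of the mapping problem, here is a minimal Python sketch. It assumes the invokeaction output carries a concrete href (for example "/fade/123e4567") and that the other forms use a simple template with one {id} variable per path segment. The helper names extract_uri_variables and expand are hypothetical and not part of any WoT API.

```python
import re

def extract_uri_variables(template: str, concrete: str) -> dict:
    """Match a concrete path against a simple template like '/fade/{id}'.

    Only handles '{name}' placeholders that stand for one path segment;
    full RFC 6570 templates are out of scope for this sketch.
    """
    pattern = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", template):
        # Literal text before the placeholder, then a named capture group.
        pattern += re.escape(template[pos:m.start()])
        pattern += f"(?P<{m.group(1)}>[^/]+)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    match = re.fullmatch(pattern, concrete)
    if match is None:
        raise ValueError(f"{concrete!r} does not match {template!r}")
    return match.groupdict()

def expand(template: str, variables: dict) -> str:
    """Fill '{name}' placeholders in a form href."""
    return re.sub(r"\{(\w+)\}", lambda m: str(variables[m.group(1)]), template)

# Hypothetical invokeaction output carrying the concrete href.
invoke_output = {"href": "/fade/123e4567", "status": "pending"}
variables = extract_uri_variables("/fade/{id}", invoke_output["href"])
cancel_url = expand("/fade/{id}", variables)
```

This only works if the Consumer already knows which output member to match against which template, which is exactly the gap described above.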

One approach might be to add semantic annotations to output and uriVariables which assign semantic meaning to these values so consumers know they have a special meaning. E.g.

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "actions": {
    "fade": {
      "input": {
        "type": "number",
        "description": "duration in ms"
      },
      "output":{
        "type":"object",
        "properties":{
          "href":{
            "const":"{id}",
            "@type": "ActionRequestID",
            "type": "string",
            "description": "URI to query, update or cancel the invoked action"
          },
          "status":{
              "type":"string",
              "enum":["ongoing","finished","pending"],
              "description": "status of the invoked action"
            }
         }
      },
      "query":{
        "output":{
          "type":"object",
          "properties":{
            "brightness":{
              "type":"number",
              "description": "current brightness"
            },
            "status":{
              "type":"string",
              "enum":["ongoing","finished","pending"],
              "description": "status of the invoked action"
            }
          }
        }
      },
      "update":{
        "input": {
          "type": "number",
          "description": "ADDED duration in ms"
        }
      },
      "cancel":{
      },
      "forms": [
        {
          "href": "/fade",
          "op": "invokeaction",
          "htv:methodName": "POST",
          "contentType":"application/json"
        },
        {
          "href": "/fade/{id}",
          "op": "queryaction",
          "htv:methodName": "GET",
          "contentType":"application/json"
        },
        {
          "href": "/fade/{id}",
          "op": "updateaction",
          "htv:methodName": "PUT",
          "contentType":"application/json"
        },
        {
          "href": "/fade/{id}",
          "op": "cancelaction",
          "htv:methodName": "DELETE",
          "contentType":"application/json"
        }
      ],
      "uriVariables": {
        "id": {
          "@type": "ActionRequestID",
          "type": "string",
          "description": "URI to query, update or cancel the invoked action"
        }
      }
    }
  }
}

Note: I'm not 100% sure of the intended meaning of the const keyword from JSON Schema. @egekorkan Can you explain?

What do people think about this solution? I'd like to understand whether this is likely to make WoT Thing Description 1.1 so we know whether we need to drop the action queue feature from all 17 implementations in WebThings in order to be W3C compliant.

egekorkan commented 3 years ago

Note: I'm not 100% sure of the intended meaning of the const keyword from JSON Schema. @egekorkan Can you explain?

So const is just an enum with a single value. So if the id returned was always of value "myId123" then we could have "const":"myId123" but since it changes based on request, it has the {id} placeholder instead. If we rely on "@type": "ActionRequestID" we won't need such a construct and the Consumer should know that the string has a special meaning.
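To make the const/enum equivalence concrete, here is a small Python sketch. It checks only these two keywords and uses plain Python equality, which is a simplification of JSON Schema's equality rules. The function name matches_schema is hypothetical.

```python
def matches_schema(value, schema: dict) -> bool:
    """Check only the 'const' and 'enum' keywords of a JSON Schema.

    Uses Python '==' for comparison, which is a simplification of the
    equality rules in JSON Schema.
    """
    if "const" in schema and value != schema["const"]:
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    return True

const_schema = {"const": "myId123"}
enum_schema = {"enum": ["myId123"]}

# Both schemas accept or reject exactly the same candidate values.
same_result = all(
    matches_schema(candidate, const_schema) == matches_schema(candidate, enum_schema)
    for candidate in ("myId123", "otherId")
)

# Taken literally, a const of "{id}" only accepts the string "{id}" itself.
placeholder_accepts_real_id = matches_schema("123e4567", {"const": "{id}"})
```

The last line shows why the {id} placeholder needs an interpretation beyond plain JSON Schema: a validator that reads the const literally would reject every real identifier.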

sebastiankb commented 3 years ago

We should evaluate if we can also use the planned additionalSchemas for this approach. That means we do not need an additional query term. If I'm correct, we only have to introduce 3 new operation types (cancelaction, updateaction, queryaction), right?

egekorkan commented 3 years ago

I think that the additionalSchemas would be good. However, I think we should prescribe the keys in those schemas, i.e. additionalRes_I should not be allowed and it should be query.

benfrancis commented 3 years ago

See https://github.com/w3c/wot-profile/issues/81#issuecomment-880619349 for a proposal of how this could work in the Core Profile.

benfrancis commented 3 years ago

Below is an attempt to write a Thing Description which describes the Core Profile Protocol Binding for asynchronous actions proposed in https://github.com/w3c/wot-profile/pull/89, using the schemaDefinitions feature discussed in https://github.com/w3c/wot-thing-description/issues/1053.

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "actions": {
    "fade": {
      "input": {
        "type": "object",
        "properties": {
          "level": {
            "type": "integer",
            "minimum": 0,
            "maximum": 100
          },
          "duration": {
            "type": "integer",
            "minimum": 0,
            "unit": "milliseconds"
          }
        }
      },
      "output": {},
      "schemaDefinitions": {
        "actionStatus": {
          "output": {},
          "status": {
            "type": "string",
            "enum": [ "pending", "running", "completed", "failed" ]
          },
          "error": {
            "type": "object"
          }
        }
      },
      "forms": [
        {
          "href": "/fade",
          "op": "invokeaction",
          "htv:methodName": "POST",
          "contentType":"application/json",
          "response": {
            "htv:headers": [
              {
                "htv:fieldName": "Location",
                "htv:fieldValue": "/fade/{id}"
              }
            ]
          }
        },
        {
          "href": "/fade/{id}",
          "op": "queryaction",
          "htv:methodName": "GET",
          "response": {
            "contentType":"application/json",
            "schema": "actionStatus"
          }
        },
        {
          "href": "/fade/{id}",
          "op": "cancelaction",
          "htv:methodName": "DELETE",
          "contentType":"application/json"
        }
      ],
      "uriVariables": {
        "id": {
          "type": "string",
          "description": "identifier of action request"
        }
      }
    }
  }
}

Notes:

benfrancis commented 3 years ago

On the Thing Description call today we discussed proposed invokeanyaction/queryallactions operations.

In that issue I noted that there are three potential use cases for "querying" an action:

  1. Getting an individual ActionStatus resource regarding an individual action request (e.g. GET /actions/fade/1935-5939-ngu3)
  2. Getting a list of pending action requests for a given action (e.g. GET /actions/fade)
  3. Getting a list of pending action requests for all actions (e.g. GET /actions)

Do we need two operations for Action affordances which distinguish between the first two? E.g. queryaction vs. queryactionrequest?

sebastiankb commented 3 years ago

Do we need two operations for Action affordances which distinguish between the first two? E.g. queryaction vs. queryactionrequest?

I think, this is a similar analogy to readproperty and readallproperties. In this context it would make sense to have two. Maybe we should use the term queryallactions instead. Option 2 and 3 can be supported by a Thing implementation and be announced at the top level forms:

  "forms": [
    {
      "op": ["queryallactions"],
      "href": "./actions/{ACTION_NAME}"
    },
    {
      "op": ["queryallactions"],
      "href": "./actions"
    }
  ]

benfrancis commented 3 years ago

@sebastiankb wrote:

I think, this is a similar analogy to readproperty and readallproperties.

I agree in that no. 2 is like readproperty and no. 3 is like readallproperties, but if we were following that example then no. 2 should be in the Action affordance, not a top level form. There is no equivalent of no. 1 for properties because a Property only has one value, whereas an Action may have multiple instances.

  "forms": [
    {
      "op": ["queryallactions"],
      "href": "./actions/{ACTION_NAME}"
    },
    {
      "op": ["queryallactions"],
      "href": "./actions"
    }
  ]

I agree the same name makes sense for both operations, but it may be tricky to define how a Consumer distinguishes between the two if they share the same name.

I wish there was a word in the English language for an instance of an action, but I can't think of one. Some other ideas...

I.

  1. queryaction - in a Form in the Action affordance
  2. listactions - in a Form in the Action affordance
  3. listallactions - in a top level Form

II.

  1. queryactionstatus - in a Form in the Action affordance
  2. queryaction - in a Form in the Action affordance
  3. queryallactions - in a top level Form

III.

  1. queryaction - in a Form in the Action affordance
  2. readactionqueue - in a Form in the Action affordance
  3. readallactionqueues - in a top level Form

sebastiankb commented 3 years ago

I agree in that no. 2 is like readproperty and no. 3 is like readallproperties, but if we were following that example then no. 2 should be in the Action affordance, not a top level form.

Yes, that makes sense. One idea is to design the top-level form to inform the client that it can query actions with a filter by specifying the name of the action in the URL, which will return only the status of all active actions with the corresponding action name.

I agree the same name makes sense for both operations, but it may be tricky to define how a Consumer distinguishes between the two if they share the same name.

If we introduce the convention then the client can distinguish based on the URL, right?
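A rough Python sketch of what such a convention could look like on the Consumer side, assuming the two top-level forms shown earlier and assuming that the presence of the {ACTION_NAME} variable is what marks the filtered variant. The function name pick_query_form is hypothetical.

```python
top_level_forms = [
    {"op": ["queryallactions"], "href": "./actions/{ACTION_NAME}"},
    {"op": ["queryallactions"], "href": "./actions"},
]

def pick_query_form(forms, action_name=None):
    """Return an href for queryallactions, filtered by action name if given.

    Assumed convention: a form whose href contains '{ACTION_NAME}' is the
    per-action variant, a form without it covers all actions.
    """
    for form in forms:
        if "queryallactions" not in form["op"]:
            continue
        templated = "{ACTION_NAME}" in form["href"]
        if action_name is not None and templated:
            return form["href"].replace("{ACTION_NAME}", action_name)
        if action_name is None and not templated:
            return form["href"]
    raise LookupError("no matching queryallactions form")

all_actions_href = pick_query_form(top_level_forms)
fade_only_href = pick_query_form(top_level_forms, "fade")
```

The distinction rests entirely on the URL template, so a Consumer that does not know the convention would see two indistinguishable forms with the same op.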

I wish there was a word in the English language for an instance of an action, but I can't think of one. Some other ideas...

I would prefer no. II.

sebastiankb commented 3 years ago

Regarding @benfrancis' comment, I will put this on the agenda of today's TD call.

egekorkan commented 3 years ago

Just one argument regarding having readaction based verbs for the op: What if in the future we see that there is also a use case for observing an action where the Consumer gets the changes to the state of the action? It might be good to make it aligned with properties.

An opposing argument based on the same "worry" I have: We should make sure that op keywords are different enough that a newcomer does not confuse actions with properties.

Yet another comment: Reading a property and querying an action are semantically very close. One can say that invoking an action creates a property affordance that is simply temporary, so having readaction makes sense.

benfrancis commented 3 years ago

Note that the example Thing Description in https://github.com/w3c/wot-thing-description/issues/302#issuecomment-884867648 is now out of date. Following a review of the proposed action protocol binding for the Core Profile, both the synchronous and asynchronous responses follow the same data schema, which has been expanded to include an href member. I've tried to provide an updated example Thing Description below which covers both cases, but it's not easy.

{
  "@context": "https://www.w3.org/2019/wot/td/v1",
  "id": "urn:ex:thing",
  "actions": {
    "fade": {
      "input": {
        "type": "object",
        "properties": {
          "level": {
            "type": "integer",
            "minimum": 0,
            "maximum": 100
          },
          "duration": {
            "type": "integer",
            "minimum": 0,
            "unit": "milliseconds"
          }
        }
      },
      "output": {},
      "schemaDefinitions": {
        "actionStatus": {
          "status": {
            "type": "string",
            "enum": [ "pending", "running", "completed", "failed" ],
            "required": true
          },
          "output": {
            "required": false
          },
          "error": {
            "type": "object",
            "required": false
          },
          "href": {
            "type": "string",
            "const": "/fade/{id}",
            "required": false
          }
        }
      },
      "forms": [
        {
          "href": "/fade",
          "op": "invokeaction",
          "htv:methodName": "POST",
          "contentType":"application/json",
          "response": {
            "contentType": "application/json",
            "schema": "actionStatus"
          },
          "additionalResponses": {
            "success": "yes",
            "contentType": "application/json",
            "schema": "actionStatus",
            "htv:headers": [ 
              {
                "htv:fieldName": "Location",
                "htv:fieldValue": "/fade/{id}"
              }
            ]
          }
        },
        {
          "href": "/fade/{id}",
          "op": "queryaction",
          "contentType":"application/json",
          "htv:methodName": "GET",
          "response": {
            "contentType":"application/json",
            "schema": "actionStatus"
          }
        },
        {
          "href": "/fade/{id}",
          "op": "cancelaction",
          "htv:methodName": "DELETE"
        }
      ],
      "uriVariables": {
        "id": {
          "type": "string",
          "description": "identifier of action request"
        }
      }
    }
  }
}

The notes from above still apply:

Notes:

  • The response to the invokeaction operation does not follow the output data schema because an asynchronous response to an action invocation does not include the output of the action. Rather, the output schema is used as part of the actionStatus data schema in the follow-up queryaction operation. Separating the output data schema from the response data schema is one of the topics discussed in https://github.com/w3c/wot-thing-description/issues/1053
  • I've included an empty output schema as placeholder since in this particular example the action has no output. But where an action does have an output, I'm not sure of the most appropriate way to link the output schema from the actionStatus schema. Is a JSON pointer appropriate here?
  • The schema member currently only seems to be allowed in an AdditionalResponse (added in https://github.com/w3c/wot-thing-description/commit/100f0de1d8d608e7c3c3420b1c4e5f68a5afa628), not an ExpectedResponse. That would need changing.
  • Is it sufficiently obvious to consumers that the {id} in the Location header of the response to the invokeaction request corresponds to the {id} used in the href of other operations? I can't think of a way semantic annotations would help in this case.
  • Is it OK to use URL templates in the Location header?
  • This TD doesn't currently describe error conditions. Is there a way to specify the status code of an ExpectedResponse and an AdditionalResponse? OpenAPI does this by keying responses by status code, but I think the decision was not to do that for additionalResponses since it would be too protocol specific. I can't find vocabulary in the Protocol Binding Templates specification to describe an HTTP status code.

In addition to these notes:

Overall my impression is that it would be very difficult for a Consumer which didn't explicitly implement the Core Profile Protocol Binding to interpret this Thing Description, but this is the closest I can get to providing a declarative equivalent of the concrete protocol binding described in the specification. Note that my intention is that a Web Thing using the Core Profile would expose a much simpler Thing Description than this. This is just a canonical(ish) example of what it might look like once all the defaults defined in the Core Profile Protocol Binding have been applied, and how the full protocol binding would have to be described for a Consumer which doesn't implement the Core Profile.
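To illustrate what "applying the defaults" could mean in practice, here is a minimal Python sketch that expands a bare form into a fuller one. The method mapping is an assumption read off the expanded example above (POST, GET, DELETE), not taken from the Core Profile text, and apply_defaults is a hypothetical helper.

```python
# Assumed defaults, taken from the expanded example in this thread rather
# than from the Core Profile specification itself.
DEFAULT_METHODS = {
    "invokeaction": "POST",
    "queryaction": "GET",
    "cancelaction": "DELETE",
}

def apply_defaults(form: dict) -> dict:
    """Return a copy of a form with assumed protocol defaults filled in."""
    expanded = dict(form)
    # setdefault keeps any value the Thing Description states explicitly.
    expanded.setdefault("htv:methodName", DEFAULT_METHODS[form["op"]])
    expanded.setdefault("contentType", "application/json")
    return expanded

minimal_forms = [
    {"href": "/fade", "op": "invokeaction"},
    {"href": "/fade/{id}", "op": "queryaction"},
    {"href": "/fade/{id}", "op": "cancelaction"},
]
expanded_forms = [apply_defaults(form) for form in minimal_forms]
```

A Consumer that implements the profile would carry a table like DEFAULT_METHODS internally, while one that does not would need the expanded forms spelled out in the Thing Description.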

I think the important action item here is to decide whether to add the queryaction and cancelaction operation names to the Thing Description specification, and what their meta-interaction equivalents in top level forms might be called.

benfrancis commented 3 years ago

Note that my intention is that a Web Thing using the Core Profile would expose a much simpler Thing Description than this

E.g.

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:ex:thing",
    "actions": {
      "fade": {
        "input": {
            "type": "object",
            "properties": {
               "level": {
                  "type": "integer",
                  "minimum": 0,
                  "maximum": 100
                },
                "duration": {
                  "type": "integer",
                  "minimum": 0,
                  "unit": "milliseconds"
                }
            }
        },
        "output": {},
        "forms": [
            {
              "href": "/fade",
              "op": "invokeaction"
            },
            {
              "href": "/fade/{id}",
              "op": "queryaction"
            },
            {
              "href": "/fade/{id}",
              "op": "cancelaction"
            }
        ],
        "uriVariables": {
            "id": {
                "type": "string",
                "description": "identifier of action request"
            }
        }
      }
    }
}

benfrancis commented 3 years ago

@egekorkan wrote:

Reading a property and querying an action are semantically very close. One can say that invoking an action creates a property affordance that is simply temporary, thus having readaction make sense.

If an operation using an HTTP request like GET /actions/fade/19g3-631g-61gj was called readaction, then what would an operation like GET /actions/fade or GET /actions be called?

I think the key difference between properties and actions is that a property only has one value at any one time, whereas an action may have multiple running instances (in serial or in parallel). So whilst a property is likely to be bound to a single resource (hence the singular terms readproperty/writeproperty) an action may be bound to a collection of resources (i.e. an action queue).
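The difference can be sketched in Python as a small in-memory model. This is only an illustration under assumed names (ActionQueue, query_all and the /actions/... path layout are hypothetical), not how any particular implementation works.

```python
import uuid

class ActionQueue:
    """In-memory collection of action requests for one action (e.g. 'fade')."""

    def __init__(self, name: str):
        self.name = name
        self.requests = {}  # request id -> status record

    def invoke(self, input_data) -> dict:
        """Create a new action request and return its status record."""
        request_id = uuid.uuid4().hex
        record = {
            "input": input_data,
            "status": "pending",
            "href": f"/actions/{self.name}/{request_id}",
        }
        self.requests[request_id] = record
        return record

    def query(self, request_id: str) -> dict:
        """Status of one request, like GET on an individual resource."""
        return self.requests[request_id]

    def query_all(self) -> list:
        """All current requests, like GET on the collection."""
        return list(self.requests.values())

    def cancel(self, request_id: str) -> None:
        """Remove one request, like DELETE on an individual resource."""
        del self.requests[request_id]

queue = ActionQueue("fade")
first = queue.invoke({"level": 50, "duration": 2000})
second = queue.invoke({"level": 100, "duration": 2000})
```

A property would need only a single stored value, whereas here each invocation adds a separately addressable entry to the collection.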

I think we basically need to decide whether the term "action" in operation names refers to:

A) the collection, e.g.

B) an individual instance of the interaction, e.g.

Which works best?

egekorkan commented 3 years ago

Not sure if I should comment here or at #1208, but I think that there are some problems for Consumer applications in cases where the href has dynamic ids. Please also have a look at https://github.com/w3c/wot-thing-description/tree/main/proposals/hypermedia-control-2#observations-1 .

An important thing to highlight here is that for many devices there would be no real need to have dynamic ids if we do not want to queue multiple actions. If I am fading a lamp, rotating a robot, or sprinkling water on a farm, my Thing can reject subsequent invoke actions if one is already being processed. Dynamic hrefs are more difficult to implement in a Thing and in Consumers, so I would not want to promote their use in the TD specification. They should of course be possible to describe, and they are needed for the WebThings API as well. Ideally, we should use static hrefs in most examples and then have a separate section about how to manage dynamic hrefs in TDs.
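A minimal Python sketch of the static-href case, where the Thing holds at most one running invocation and rejects further ones. The class name SingleSlotAction is hypothetical, and mapping the rejection to an HTTP error such as 409 Conflict is an assumption rather than something the TD specification prescribes.

```python
class SingleSlotAction:
    """An action with one static status resource and no queue."""

    def __init__(self):
        self.status = "idle"
        self.input = None

    def invoke(self, input_data) -> None:
        if self.status == "running":
            # Over HTTP this might map to an error response such as
            # 409 Conflict (an assumption, not prescribed anywhere).
            raise RuntimeError("action already in progress")
        self.status = "running"
        self.input = input_data

    def query(self) -> str:
        """Status behind a static href such as '/fade/status'."""
        return self.status

    def finish(self) -> None:
        self.status = "idle"

lamp = SingleSlotAction()
lamp.invoke({"duration": 2000})
try:
    lamp.invoke({"duration": 500})
    rejected = False
except RuntimeError:
    rejected = True
```

Because there is only ever one invocation, the query and cancel forms can point at fixed URLs and no {id} needs to be passed between operations.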