'Processes' Section Options

jeffharrison commented 3 years ago

Having a Profile of Processes as a second API in the current document may be confusing to the geospatial community. On 23 Feb 2021 the Routes SWG recommended that the path forward be one of the following options:

1 Keep it as a separate, optional Conformance Class of Part 1 of the OGC API - Routes

2 Move it to a new Part 2 of OGC API - Routes

3 Develop a separate document on how to implement Routes with OGC API - Processes

Please provide your comments and feedback on this Issue to inform the SWG.

cportele commented 3 years ago

Note that the idea with options 1 and 2 would be to not duplicate all requirements classes with equivalent classes in the Processes profile.

It is unclear whether that can be done.

jerstlouis commented 3 years ago

@cportele If the execute request can be identical, then that is easily done as the only extra requirement is to offer an OGC process description document.

If the execute request still differs, then the idea would be that conformance classes indicate the functionality supported by the routing API (e.g. height, load restriction, obstacles, etc.), while one specific conformance class exists for each of the stand-alone routing API, and one for the Processes profile API, implying that the same routing functionality/parameters is available for both APIs.

jerstlouis commented 3 years ago

cc'ing @bpross-52n as 52 North implemented the Processes profile API for routing

jeffharrison commented 3 years ago

I think having two APIs in one standards document is fundamentally awkward. Leaving the technical complexity aside, imagine referencing such a document in procurement language.

jerstlouis commented 3 years ago

@jeffharrison I think there is a broader discussion to have about the OGC API standards, conformance classes, compliance and procurement, and specific requirements for one, or more than one, specific conformance classes.

As I mentioned during the call, ideally, you would only have a single Routing API, and an extra conformance class which would simply allow generic Processes clients to make use of that API, by requiring a description of the routing process to be provided using the OGC Process description document, i.e. something like this.

Both approaches already return the same Route Exchange Model. The routing execution request documents are already much closer after recent improvements to OGC API - Processes. All that remains is whether we can fully align the stand-alone routing request with the now simplified execute request for Processes.

The advantage of doing so is that then we would really have a single interoperable API for both approaches. The disadvantage is having a little bit more boiler-plate in the request (mainly "inputs" : { } at the top, if we consider that other things like "mode" and "outputs" could eventually be made optional in Processes).

There is already discussion to make mode optional here. And Part 3: Workflows already considers making "outputs" optional if the output is selected in another way or if there is a single output.
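To make the boiler-plate difference concrete, here is a minimal sketch in Python. The "waypoints" member is taken from the examples in this thread; the helper names and the exact shape of the stand-alone route definition are assumptions for illustration only and are not taken from either draft.

```python
import json


def to_execute_request(route_definition):
    """Wrap a stand-alone route definition in a Processes-style execute request.

    Assumes "mode" and "outputs" have become optional in Processes, so the
    only extra boiler-plate is the top-level "inputs" object.
    """
    return {"inputs": dict(route_definition)}


def to_route_definition(execute_request):
    """Inverse mapping: strip the "inputs" wrapper again."""
    return dict(execute_request["inputs"])


# Hypothetical stand-alone routing request body.
route_definition = {
    "waypoints": {
        "type": "MultiPoint",
        "coordinates": [[-77.047712, 38.892346], [-76.99473, 38.902629]],
    }
}

execute_request = to_execute_request(route_definition)
print(json.dumps(execute_request, indent=2))
```

If the two request documents can be aligned to this degree, converting between them is a one-line wrapper in either direction.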

jeffharrison commented 3 years ago

I think a broader architectural question for OGC API standards here is... If OGC adds Processes to an API focused on Route resources, should this approach be applied consistently across other OGC APIs?

For example, getting Maps is likely a Process. Would Processes conformance classes be added to Maps API? In fact, one could make the argument that almost any resource which is not static is the result of a Process.

I think Options 2 or 3 provide a simpler, more modular path forward.

cportele commented 3 years ago

I agree that options 2 and 3 would be clearer.

The shorter that additional specification of the Processes profile would be the better.

jerstlouis commented 3 years ago

@jeffharrison I would make the counter argument: Should OGC define a new standard for every single process that may be defined in multiple implementations? Do we need an OGC API - NDVI, OGC API - Geometry Buffer, OGC API - Geometry Merger, and so on?

Could other OGC API specifications like Maps also define a conformance class for a profile of Processes, and should there be a recommendation that they do so when that makes sense and provides value -- I think so, and I proposed that here for Maps (we have a RenderMap process here).

But I think that discussion is separate from the three options here -- any of them addresses the need to define a profile of Processes.

I have no major objection to option 2 or 3, but my preference would be 1, with the idea that it would be a simple requirement to provide a process description for the API and the API would otherwise be one and the same. Implementers would then easily be aware of this possibility, and they could decide whether they want to implement this for the additional interoperability with generic Processes clients.

With OGC API - Processes - Part 3: Workflows, we can also combine the Map process with the Routing process to render a map with a route displayed on top (POST the following JSON here, or paste it in the execute request form to see this in action. Note that we would need to add styling options here for the route to stand out more, right now it's only a 1 pixel-thin black line.)

{
   "id" : "MapAndRoute",
   "process" : "https://maps.ecere.com/ogcapi/processes/RenderMap",
   "inputs" : [
      { "id" : "transparent", "value" : false },
      { "id" : "background", "value" : "0xC0C0C0" },
      {
         "id" : "layers",
         "list" : [
            { "collection" : "https://maps.ecere.com/ogcapi/collections/osm:dc" },
            {
              "id" : "computedRoute",
              "process" : "https://maps.ecere.com/ogcapi/processes/OSMERE",
              "inputs" : [
                {
                  "id" : "waypoints",
                  "value" :
                  {
                    "type" : "MultiPoint",
                    "coordinates" : [
                      [ -77.047712, 38.892346 ],
                      [ -76.99473, 38.902629 ]
                    ]
                  }
                },
                { "id" : "dataset", "collection" : "https://maps.ecere.com/ogcapi/collections/osm:dc:roads" }
              ]
            }
         ]
      }
   ]
}

jeffharrison commented 3 years ago

In addition to the potential challenges with procurement language and the issue with many functions in OGC being a potential process, it seems adding Processes to the OGC API - Routes Part 1 adds complexity.

Large and small companies have implemented the basic Routes API and commented on its simplicity and ease of implementation.

I have no problem with either Option 2 or 3 though. Would be a great path forward for Routes SWG.

jeffharrison commented 3 years ago

Thanks Jerome. Great example! I would fully support such capabilities as Part 2 of Routes API. Lots of potential.

skyNacho commented 3 years ago

I agree with the outcome that this discussion is pointing to: option 2 or 3.

I clearly understand the potential of integrating routes in OGC API - Routes, but I do not think using the same API for two clearly separate use cases and user profiles is the way to go. The devil is in the details and although it might seem a simple thing to do, I am 100% convinced that we would find important difficulties as soon as we tried that approach. Just one simple example that came to my mind as I read through this discussion: "mode" in Processes refers to sync or async execution mode (mostly), but in Routing it usually refers to transportation mode (vehicle, foot, transit, ...). Other routing APIs implement "mode" as an attribute, so confusion would be guaranteed just with this minor aspect.

On the other hand, I believe the link between both use cases / user profiles is in having a common data structure, the Route Exchange Model. I think this is the transversal interoperable part of the Routes standard that can be used indistinctly by any API involved in routing.

jerstlouis commented 3 years ago

@skyNacho Although it might appear to be a potential source of confusion, there isn't really a conflict between a mode input for the process (which would be inside the inputs object) and the mode parameter at the top level of the OGC API - Processes execute request, and users of a generic Processes client would likely not even see that top-level mode property. I also believe that we need to harmonize more than only the Route Exchange Model, which deals only with the results of the route calculation and says nothing about the routing request.
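As a rough illustration of why the two names do not collide, consider the following Python sketch. The "foot" value and the presence of a "mode" routing input are assumptions based on the examples mentioned above, not something defined in either draft.

```python
# Hypothetical execute request with both kinds of "mode".
execute_request = {
    # Top-level member of the execute request: execution mode (sync or async).
    "mode": "async",
    "inputs": {
        # Hypothetical routing input: transportation mode.
        "mode": "foot",
        "waypoints": {
            "type": "MultiPoint",
            "coordinates": [[-77.047712, 38.892346], [-76.99473, 38.902629]],
        },
    },
}

# The two members live at different levels of the document,
# so they are addressed by different paths.
execution_mode = execute_request["mode"]
transport_mode = execute_request["inputs"]["mode"]
print(execution_mode, transport_mode)
```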

That being said, it seems so far that I alone am favoring option 1, and the difference from options 2 and 3 is purely about how the documents are organized, with implications relating to the compliance programs, but no difference at the technical level, so I would agree that we are leaning towards option 2 or 3 :)

jeffharrison commented 3 years ago

Jeff H made a motion to transfer Processes API material to a Part 2 of OGC API - Routes.

Nacho seconded motion.

Discussion - Jerome has made an effort to make the ProcessesExecute request as light as possible. This method is similar to current Routing API methods. OGC Routes SWG should still attempt to line up approaches. Jeff H indicated there is potential in this, and it should be done under Part 2 of OGC API - Routes, and then the SWG will assess impacts to Part 1.

Jerome will provide a written description of ProcessesExecute as a new Issue for Routes SWG.

There was NOTUC to the Motion. The motion passes.

jerstlouis commented 3 years ago

The new issue discussing the harmonization of the stand-alone and OGC API - Processes-based routing API is: #17.

cportele commented 3 years ago

In our last call I took the action to review the latest Processes draft with respect to this issue.

For the content of the request to compute a route see https://github.com/opengeospatial/ogcapi-routes/issues/17#issuecomment-817743788. It looks as if it would be possible to align the content, but there would be one additional change in Processes.

Looking at the bigger picture, I also did a comparison of the resources and operations in Routes and Processes:

- GET /routes is similar to GET /jobs and then filtering all the Route jobs. However, Routes returns links to the route results and Processes returns status objects.
- POST /routes is similar to POST ~~/jobs~~ /processes/{processId}/execution, but Routes requires the route definition (see the discussion in #17) while Processes requires id, outputs, mode and response members. Those members would be ignored in a Routes implementation. This works in general, because they are irrelevant for the Routes API building blocks, except for mode. I see the following issues:
  - It would make sense to select the mode (sync, async, auto?) consistently across OGC API standards. Processes uses the mode member, Routes uses a query parameter. I think neither approach is what we should be doing. My proposal would be to use the Prefer header from RFC 7240 and the "respond-async" preference. The Routes resource would decide whether to return the route directly in the response (status 200) or to process the request asynchronously (status 202). If a "respond-async" preference is stated, the API SHOULD honor that preference.
  - Processes clients that submit an execute request with response=document to a Routes endpoint will be surprised by the result, as it behaves like response=raw.
  - An async response from a Routes endpoint will be surprising for a Processes client, as the referenced resource is a Route and not a Job Status.
  - The subscriber approach is similar, but different. It should be possible to harmonize this, but that requires an active cross-SWG discussion. In general, I would say that the subscriber information should not be part of the content, but it should be separate, so that we can reuse this more easily. One option could be to include links to subscriber URIs in the request (header) with new link relation types.
- GET /routes/{id} returns a GeoJSON Feature Collection (which is essential for the Routes API), but as far as I can see, GET /jobs/{id}/result cannot return a GeoJSON Feature Collection, because there is always a mandatory wrapper.
  - Routes supports the resultSet parameter; Processes does not support this and would always return the complete representation.
- GET /routes/{id}/definition returns the Route Definition; Processes does not have a mechanism to retrieve the inputs of a job.
- DELETE /routes/{id} returns a 204, but DELETE /jobs/{id} returns a 200.

I think the Processes profile option requires quite a bit of additional explanation, so I still think that option 2 or 3 should be the way to go.
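The Prefer header proposal above could be sketched on the server side roughly as follows (Python). This is only an illustration under stated assumptions: the function names and the `can_run_async` / `is_long_running` flags are hypothetical, and the parser is deliberately simplified (it ignores preference parameters after ';' and does not handle quoted commas).

```python
def parse_prefer(header_value):
    """Parse a Prefer header value (RFC 7240) into a dict of preferences.

    Simplified parser: splits on commas, drops any ';' parameters, and
    keeps the first occurrence of each preference.
    """
    preferences = {}
    for part in (header_value or "").split(","):
        token = part.split(";", 1)[0].strip()
        if not token:
            continue
        name, _, value = token.partition("=")
        preferences.setdefault(name.strip().lower(), value.strip().strip('"'))
    return preferences


def choose_status(prefer_header, can_run_async=True, is_long_running=False):
    """Decide between a direct response (200) and asynchronous processing (202).

    The server makes the decision, but honors "respond-async" when it is
    able to process the request asynchronously.
    """
    wants_async = "respond-async" in parse_prefer(prefer_header)
    if can_run_async and (wants_async or is_long_running):
        return 202
    return 200


print(choose_status("respond-async"))  # client asks for async processing
print(choose_status(None))             # no preference stated
```

A server that cannot process requests asynchronously would simply ignore the preference and answer with 200, which is consistent with Prefer being a hint rather than a requirement.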

jerstlouis commented 3 years ago

Just a clarification that per the latest changes the POST to execute a process has been moved from /jobs to /processes/{processId}/execution.

cportele commented 3 years ago

See https://github.com/opengeospatial/ogcapi-processes/issues/127

cportele commented 3 years ago

Meeting 2021-04-13:

Only the consistency between POST /routes and POST /processes/{processId}/execution is important.
@cportele to create a PR to support the Prefer header and updated approach for subscribers. (@pvretano will create a similar one for processes.)
DELETE should allow both 204 and 200 (but in practice 204 is sufficient).

gfenoy commented 3 years ago

In our last call I took the action to review the latest Processes draft with respect to this issue.

For the content of the request to compute a route see #17 (comment). It looks as if it would be possible to align the content, but there would be one additional change in Processes.

Looking at the bigger picture, I also did a comparison of the resources and operations in Routes and Processes:

GET /routes is similar to GET /jobs and then filtering all the Route jobs. However, Routes returns links to the route results and Processes returns status objects.

Note that in previous version of OGC API - Processes there was a /processes/{processId}/jobs applying the filtering you are referring to. One would need to set routes as {processId} to get the desired result, an example using HelloPy available here (old version).

POST /routes is similar to POST ~~/jobs~~ /processes/{processId}/execution, but Routes requires the route definition (see the discussion in #17) while Processes requires id, outputs, mode and response members. Those members would be ignored in a Routes implementation. This works in general, because they are irrelevant for the Routes API building blocks, except for mode. I see the following issues:

Actually, some changes have been made lately, including the removal of the id parameter, which was replaced by the {processId} path parameter. Reducing the content of the execute request body to a minimum should be our target.

It would make sense to select the mode (sync, async, auto?) consistently across OGC API standards. Processes uses the mode member, Routes uses a query parameter. I think neither approach is what we should be doing. My proposal would be to use the Prefer header from RFC 7240 and the "respond-async" preference. The Routes resource would decide whether to return the route directly in the response (status 200) or to process the request asynchronously (status 202). If a "respond-async" preference is stated, the API SHOULD honor that preference.

I am very supportive of your proposal to use the Prefer header. I had mentioned RFC 7240 here, and back then the ZOO-Project was supporting this header. To me, removing everything we can from the request body that has to be sent to the server will be beneficial for OGC API - Processes. So here I would prefer supporting this Prefer header over anything within the request body.

Processes clients that submit an execute request with response=document to a Routes endpoint will be surprised by the result as it behaves like response=raw.

As I mentioned back then regarding the same RFC, here again I would have preferred using the Prefer header, which can take the following values:

return=representation, which can be used as an equivalent for "response": "document"
return=minimal, which can be used as an equivalent for "response": "raw"

In case we choose to accept both Prefer options, we can combine them as Prefer: respond-async, return=representation to set "mode": "async" and "response": "document". I think it is more elegant and, incidentally, it makes the option easier to set from the swagger-ui, which will provide a relevant select list for it (sorry, I cannot find a setup old enough to show an example of this, but we got this working too).
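The equivalence described above can be written down as a small mapping. The following Python sketch is illustrative only: the function name is hypothetical, and the matching is a simple token comparison rather than a full RFC 7240 parser.

```python
def legacy_members_from_prefer(prefer_header):
    """Map Prefer header tokens to the "mode" and "response" members
    that would otherwise be part of the execute request body."""
    tokens = [t.strip().lower() for t in prefer_header.split(",") if t.strip()]
    members = {}
    if "respond-async" in tokens:
        members["mode"] = "async"
    if "return=representation" in tokens:
        members["response"] = "document"
    elif "return=minimal" in tokens:
        members["response"] = "raw"
    return members


print(legacy_members_from_prefer("respond-async, return=representation"))
```

With such a mapping on the server, the request body would no longer need to carry either member.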

An async response from a Routes endpoint will be surprising for a Processes client, as the referenced resource is a Route and not a Job Status.

I am not sure of what you mean by "An async response" here? Usually, when running async execute requests there is no content in the server answer and a 201 (Created) status code and a Location header is provided to redirect to statusInfo in case of success.

The subscriber approach is similar, but different. It should be possible to harmonize this, but that requires an active cross-SWG discussion. In general, I would say that the subscriber information should not be part of the content, but it should be separate, so that we can reuse this more easily. One option could be to include links to subscriber URIs in the request (header) with new link relation types.

Once more, to me everything that can be moved out of the request body would benefit OGC API - Processes, by lowering the complexity of producing a request body.

I would be very happy to take part in the discussion.

GET /routes/{id} returns a GeoJSON Feature Collection (which is essential for the Routes API), but as far as I can see, GET /jobs/{id}/result cannot return a GeoJSON Feature Collection, because there is always a mandatory wrapper.

Once more, I would use RFC 7240: based on the Prefer header, you would get either the mandatory wrapper or only the result (by using the return=minimal option). Consequently, /routes/{id} from the Routes API would become equivalent to /processes/{processId}/result with Prefer: return=minimal from OGC API - Processes. In case we want the wrapper, we could either not pass any Prefer header or set it to return=representation.

Just a quick note that with the current OGC API - Processes, you can get the raw representation by using "response": "raw" even for async execute (i.e. a simple string: curl "http://tb17.geolabs.fr:8081/ogc-api/jobs/eb1a1f9a-9c6b-11eb-86b4-0242ac170006/results" -v -L; it may also be a JSON file, so it would work with a GeoJSON FeatureCollection too).

Routes supports the resultSet parameter, Processes does not support this and would always return the complete representation.

It reminds me that at some point in time we were discussing the possibility of filtering multiple results from a result set, for services producing more than one output, then you may access an individual result using the following path /jobs/{jobId}/result/{inputId}. I don't know if it would work for the Routes API.

In addition, if we choose the Prefer header way, then we may imagine both access to raw data and response document (wrapper) using the same path.

GET /routes/{id}/definition returns the Route Definition, Processes does not have a mechanism to retrieve the inputs of a job.

In WPS 1.0.0, the lineage parameter attached to the ResponseDocument node was available exactly for this purpose. So, adding a path /jobs/{jobId}/definition looks perfectly reasonable.

I still have one question about bringing lineage back to life: should we return the corresponding JSON schema describing the input in addition to the "value", or only the input "value" (I use quotation marks around the word value because it can be a string, integer, boolean, double, array or object)? In case we provide only the values, does it mean that the server simply returns the original request body?

DELETE /routes/{id} returns a 204, but DELETE /jobs/{id} returns a 200.

In OGC API - Processes, when you use the DELETE method on /jobs/{jobId}, the server returns a statusInfo informing the client application that the deletion was successful. This is why it returns a 200 status code rather than 204, which sounds reasonable.

cportele commented 3 years ago

@cportele to create a PR to support the Prefer header and updated approach for subscribers.

I have started the PR #20 for this. See "Asynchronous execution" and "Callback".

I have also updated DELETE to allow for 200/202/204.

This was a quick edit and I need to review this more closely, so I have made it a draft PR for now. Comments are welcome.

cc @pvretano

cportele commented 3 years ago

@gfenoy - Thanks for your detailed response, much appreciated. I will take a closer look at the various comments. For now a few comments on selected topics:

Reducing the content of the execute request body to a minimum should be our target.

+1

I am very supportive of your proposal to use the Prefer header. I had mentioned RFC 7240 here, and back then the ZOO-Project was supporting this header.

I wasn't aware of that; a pity that your proposal wasn't accepted. We discussed this in the Routes API SWG today and were supportive of the change. I have started to work on a PR for the Routes document and @pvretano will work on a PR for Processes, although we were unclear whether that would be an option for Processes at this stage. In general, using the Prefer header and allowing the server to decide seems to be the right approach for servers that may respond synchronously or asynchronously.

As I mentioned back then regarding the same RFC, here again I would have preferred using the Prefer header, which can take the following values: return=representation, which can be used as an equivalent for "response": "document"; return=minimal, which can be used as an equivalent for "response": "raw".

Yes, that should work and I agree that it would be good to get rid of mode and response in the execution request content.

An async response from a Routes endpoint will be surprising for a Processes client, as the referenced resource is a Route and not a Job Status.

I am not sure of what you mean by "An async response" here? Usually, when running async execute requests there is no content in the server answer and a 201 (Created) status code and a Location header is provided to redirect to statusInfo in case of success.

I was referring to the resource that is referenced by the Location URI.
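To illustrate the mismatch, a generic client following the Location URI could inspect what it gets back. This is a hedged Python sketch: the function name is hypothetical, and it assumes that a Processes status document carries "jobID" and "status" members while a Route is a GeoJSON FeatureCollection.

```python
def classify_location_resource(document):
    """Guess which kind of resource a Location URI pointed to.

    Assumes a Route is a GeoJSON FeatureCollection and that a job status
    document contains "jobID" and "status" members.
    """
    if document.get("type") == "FeatureCollection":
        return "route"
    if "jobID" in document and "status" in document:
        return "job-status"
    return "unknown"


# What a Routes endpoint would reference.
print(classify_location_resource({"type": "FeatureCollection", "features": []}))
# What a Processes client would expect.
print(classify_location_resource({"jobID": "abc", "status": "running"}))
```

A Processes client that only understands the second form would not know how to poll or interpret the first one.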

The subscriber approach is similar, but different. It should be possible to harmonize this, but that requires an active cross-SWG discussion. In general, I would say that the subscriber information should not be part of the content, but it should be separate, so that we can reuse this more easily. One option could be to include links to subscriber URIs in the request (header) with new link relation types.

Once more, to me everything that can be moved out of the request body would benefit OGC API - Processes, by lowering the complexity of producing a request body. I would be very happy to take part in the discussion.

That change is also included in the draft PR to Routes. I still need to add an example to make that easier to understand though. Any comments are welcome.

GET /routes/{id} returns a GeoJSON Feature Collection (which is essential for the Routes API), but as far as I can see, GET /jobs/{id}/result cannot return a GeoJSON Feature Collection, because there is always a mandatory wrapper.

Once more, I would use RFC 7240: based on the Prefer header, you would get either the mandatory wrapper or only the result (by using the return=minimal option). Consequently, /routes/{id} from the Routes API would become equivalent to /processes/{processId}/result with Prefer: return=minimal from OGC API - Processes. In case we want the wrapper, we could either not pass any Prefer header or set it to return=representation.

I would welcome this.

Just a quick note that with the current OGC API - Processes, you can get the raw representation by using "response": "raw" even for async execute (i.e. a simple string: curl "http://tb17.geolabs.fr:8081/ogc-api/jobs/eb1a1f9a-9c6b-11eb-86b4-0242ac170006/results" -v -L; it may also be a JSON file, so it would work with a GeoJSON FeatureCollection too).

I do not see how this would be possible in the current draft 7.11.2, which requires that a GeoJSON document can only be returned as the value of a property in the JSON result object. Yes, the result can be a JSON file, but it cannot be a GeoJSON file, because that does not fit the required schema:

additionalProperties:
  oneOf:
    - $ref: "inlineOrRefData.yaml"
    - type: array
      items:
        oneOf:
          - $ref: "inlineOrRefData.yaml"
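In other words, the schema describes a JSON object whose properties are the individual outputs, so a route has to sit one level down. The following Python sketch only illustrates that extra nesting level: the "route" output identifier and the helper names are hypothetical, and the exact shape of inlineOrRefData is not reproduced here.

```python
def wrap_result(output_id, feature_collection):
    """Place a GeoJSON FeatureCollection under an output identifier,
    as the value of a property in the JSON result object."""
    return {output_id: feature_collection}


def unwrap_result(result_document, output_id):
    """Recover the bare FeatureCollection that a Routes client expects."""
    return result_document[output_id]


route = {"type": "FeatureCollection", "features": []}
wrapped = wrap_result("route", route)  # "route" is a hypothetical output id

# The wrapped document is not itself a FeatureCollection.
print(wrapped.get("type"))
print(unwrap_result(wrapped, "route")["type"])
```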

Routes supports the resultSet parameter, Processes does not support this and would always return the complete representation.

It reminds me that at some point in time we were discussing the possibility of filtering multiple results from a result set, for services producing more than one output, then you may access an individual result using the following path /jobs/{jobId}/result/{inputId}. I don't know if it would work for the Routes API.

resultSet is a different capability; it is more about reducing the content of the response in certain situations. I was actually considering proposing the use of return=minimal and return=representation for this, but I still have some open questions that I want to investigate first.

GET /routes/{id}/definition returns the Route Definition, Processes does not have a mechanism to retrieve the inputs of a job.

In WPS 1.0.0, the lineage parameter attached to the ResponseDocument node was available exactly for this purpose. So, adding a path /jobs/{jobId}/definition looks perfectly reasonable. I still have one question about bringing lineage back to life: should we return the corresponding JSON schema describing the input in addition to the "value", or only the input "value" (I use quotation marks around the word value because it can be a string, integer, boolean, double, array or object)? In case we provide only the values, does it mean that the server simply returns the original request body?

In routes, yes, so far it is only the original request body.
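A server can support this by keeping the request body alongside the route it creates. Here is a minimal in-memory sketch in Python; the class and method names are hypothetical, and a real implementation would persist the data and compute the route as well.

```python
import copy
import uuid


class RouteStore:
    """Keeps the original request body so that it can be returned
    later from a definition resource."""

    def __init__(self):
        self._definitions = {}

    def create(self, request_body):
        """Register a new route and remember its definition."""
        route_id = str(uuid.uuid4())
        self._definitions[route_id] = copy.deepcopy(request_body)
        return route_id

    def definition(self, route_id):
        """Return the original request body, unchanged."""
        return copy.deepcopy(self._definitions[route_id])


store = RouteStore()
body = {"waypoints": {"type": "MultiPoint",
                      "coordinates": [[-77.047712, 38.892346], [-76.99473, 38.902629]]}}
route_id = store.create(body)
print(store.definition(route_id) == body)
```

The same pattern would apply to a /jobs/{jobId}/definition path if Processes chose to return only the input values.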

DELETE /routes/{id} returns a 204, but DELETE /jobs/{id} returns a 200.

In OGC API - Processes, when you use the DELETE method on /jobs/{jobId}, the server returns a statusInfo informing the client application that the deletion was successful. This is why it returns a 200 status code rather than 204, which sounds reasonable.

In the PR I have allowed 200/202/204. For Routes 204 makes most sense as there is nothing that really could be returned that is useful to a client, but there is no need to require this.
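For a client, allowing all three codes means that success should not be tied to one particular status. A small Python sketch, with hypothetical helper names:

```python
# Status codes a client should treat as a successful DELETE.
ACCEPTED_DELETE_STATUSES = {200, 202, 204}


def delete_succeeded(status_code):
    """True if the server accepted or completed the deletion."""
    return status_code in ACCEPTED_DELETE_STATUSES


def may_have_body(status_code):
    """A 204 response never carries content; 200 and 202 may, for example
    a status document as returned by Processes."""
    return status_code in (200, 202)


for code in (200, 202, 204, 404):
    print(code, delete_succeeded(code), may_have_body(code))
```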

opengeospatial / ogcapi-routes

'Processes' Section Options #15