Recommendation 12: Minor typo

opengeospatial / ogcapi-processes

https://ogcapi.ogc.org/processes

Recommendation 12: Minor typo #311

Open ghobona opened 1 year ago

ghobona commented 1 year ago

"response asynchronously" should be "respond asynchronously"

[Screenshot 2023-01-31 at 16:51:56]

ghobona commented 1 year ago

Another issue with Recommendation 12 is that 12B appears to recommend the same thing as 12A but with a different Prefer header value. I think this was intended to say that if the Prefer header has a value of wait, then the process should be executed synchronously.

Thanks to @pcampanella for catching this one.

jerstlouis commented 1 year ago

@ghobona It's a bit more complicated. As soon as the Prefer: header is used, for processes that declare async execution, the process should be executed asynchronously. There is no preference defined to specify that the process be executed synchronously. Instead, a client does Prefer: respond-async,wait=5 meaning that it's ready to wait up to 5 seconds for a synchronous response (assuming the process also declares support for sync execution), but otherwise expects an async response (as explained in https://www.rfc-editor.org/rfc/rfc7240#section-4.3).
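As a rough illustration of how a server might read such a header, here is a minimal Python sketch. The function name `parse_prefer` is hypothetical, and the parsing is deliberately simplified: it ignores `;` parameters and quoted values that contain commas.

```python
def parse_prefer(header_value):
    """Parse a Prefer header value (RFC 7240) into a dict of preferences.

    Simplified sketch: per-preference parameters after ';' are dropped and
    quoted values containing commas are not supported.
    """
    prefs = {}
    for token in header_value.split(","):
        # Drop optional parameters such as ";foo=bar".
        token = token.split(";", 1)[0].strip()
        if not token:
            continue
        name, sep, value = token.partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        # RFC 7240, section 2: if a preference is repeated, only the
        # first instance is considered.
        prefs.setdefault(name, value if sep else True)
    return prefs


prefs = parse_prefer("respond-async,wait=5")
wants_async = prefs.get("respond-async", False) is True
# "wait" is expressed in seconds; None means the client gave no wait hint.
max_wait_seconds = int(prefs["wait"]) if "wait" in prefs else None
```

With the header from the example, `wants_async` is true and `max_wait_seconds` is 5, so the client accepts a synchronous response within 5 seconds and an async response otherwise.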

pcampanella commented 1 year ago

As the recommendation is written now, I understand that there are two cases, but the server always has to respond asynchronously.

If this is confirmed, I assume that the tests should always accept the 201 response if the process declares that it can only be executed in asynchronous mode.

Do you agree?

jerstlouis commented 1 year ago

@pcampanella

The first thing to check is the execution modes declared in the process description.

If only sync is declared, the process will always respond to execution with a 200. If only async is declared, the process will always respond to execution with a 201.

If both sync and async are declared, and the client did not use a Prefer: header, that means that the client does not support 'async' execution, and the server will also always respond with a 200.

If both sync and async are declared, and the client used a Prefer: header, then the client is async-aware, and the server can decide whether to return a 200 or a 201. It should take into consideration the Prefer header value as well as the amount of time the execution is expected to take in making that decision.
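The four cases above can be sketched as a small Python function. This is only an illustration: `choose_status` and its parameters are hypothetical names, the mode strings `sync-execute` / `async-execute` are assumed to be the job control option values from the process description, and the final heuristic (compare an estimated duration with the client's `wait` value) is just one possible way for a server to make the decision.

```python
def choose_status(modes, prefer_present, wait_seconds=None, estimated_seconds=None):
    """Return 200 (sync) or 201 (async) for an execute request.

    modes: job control options declared by the process (assumed values).
    prefer_present: whether the request carried a Prefer header.
    wait_seconds / estimated_seconds: optional hints, both hypothetical.
    """
    sync_ok = "sync-execute" in modes
    async_ok = "async-execute" in modes
    if not (sync_ok or async_ok):
        raise ValueError("process declares no execution mode")
    if sync_ok and not async_ok:
        return 200  # only sync declared
    if async_ok and not sync_ok:
        return 201  # only async declared
    if not prefer_present:
        return 200  # both declared, client did not send Prefer
    # Both declared and Prefer sent: the server decides. Hypothetical
    # heuristic: answer synchronously only if the job is expected to
    # finish within the time the client said it is willing to wait.
    if (wait_seconds is not None and estimated_seconds is not None
            and estimated_seconds <= wait_seconds):
        return 200
    return 201
```

For example, `choose_status({"sync-execute", "async-execute"}, True, wait_seconds=5, estimated_seconds=60)` yields 201 under this heuristic, while the same request without a Prefer header yields 200.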

fmigneault commented 1 year ago

I don't quite agree with the statement:

If both sync and async are declared, and the client did not use a Prefer: header, that means that the client does not support 'async' execution, and the server will also always respond with a 200.

That simply means "don't care" from the client. It gives no guarantee the server will respond with 200.

Regardless of (un)specified preferences from the client, the server can always decide to ignore it (preferences are not requirements) and do what it deems more appropriate based on available resources and estimated time for processing that request. That means the server could decide to respond with 201 (since the client indicated "don't care") and it's up to the client to update its methodology to be async-aware, as both sync/async were advertised as possible response methods in the process description. The server does not need to adapt its behavior to the client if it is lacking capabilities.

The only guarantee you can have is based on the 200 vs 201 response. If the client wants to work only in sync mode, it can specify Prefer: wait=<some huge value> to hint that it wants to wait indefinitely for a sync response. But still, that is not guaranteed by the server, and it can cut the HTTP connection at any time.

jerstlouis commented 1 year ago

@fmigneault

That simply means "don't care" from the client. It gives no guarantee the server will respond with 200.

The Standard does guarantee this. See Requirement 25 C (7.11.2.3. Execution mode):

/req/core/process-execute-default-execution-mode

Conditions: The execute request is not accompanied with the HTTP Prefer header.

A) The server SHALL respond asynchronously if, according to the job control options in the process description, the process can only be executed asynchronously.
B) The server SHALL respond synchronously if, according to the job control options in the process description, the process can only be executed synchronously.
C) The server SHALL respond synchronously if, according to the job control options in the process description, the process can be executed in either mode.
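For test suites, parts A to C could be checked with something like the Python sketch below. The function names are hypothetical, and the mode strings `sync-execute` / `async-execute` are assumed to be the job control option values found in the process description; it only covers requests sent without a Prefer header.

```python
def expected_status_without_prefer(modes):
    """Expected HTTP status for an execute request sent without Prefer."""
    sync_ok = "sync-execute" in modes
    async_ok = "async-execute" in modes
    if async_ok and not sync_ok:
        return 201  # part A: async-only process
    if sync_ok:
        return 200  # part B (sync-only) and part C (either mode)
    raise ValueError("process declares no execution mode")


def conforms_to_default_mode(modes, observed_status):
    """True if the observed status matches the default execution mode."""
    return observed_status == expected_status_without_prefer(modes)
```

Under this reading, a test that omits the Prefer header should accept only a 201 from an async-only process, and only a 200 from a process that declares both modes.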

fmigneault commented 1 year ago

In that case, OGC Standard REQ25C should be modified as it wrongly uses a standard HTTP header in a way that goes against its intended use.

https://datatracker.ietf.org/doc/html/rfc7240#section-2 (emphasis mine)

The Prefer request header field is used to indicate that particular server behaviors are preferred by the client but are not required for successful completion of the request. Prefer is similar in nature to the Expect header field defined by Section 6.1.2 of [RFC7231] with the exception that servers are allowed to ignore stated preferences.

https://datatracker.ietf.org/doc/html/rfc7240#section-6

A server could incur greater costs in attempting to comply with a particular preference (for instance, the cost of providing a representation in a response that would not ordinarily contain one; or the commitment of resources necessary to track state for an asynchronous response). Unconditional compliance from a server could allow the use of preferences for denial of service. A server can ignore an expressed preference to avoid expending resources that it does not wish to commit.

If OGC wants to make use of standard HTTP headers, it cannot apply them halfway. It either follows the RFC or not at all.

To support explicitly requesting sync preference, a new keyword should be added (e.g.: respond-sync), as also proposed by https://datatracker.ietf.org/doc/html/rfc7240#section-2

This header field is defined with an extensible syntax to allow for future values included in the Registry of Preferences (Section 5.1).

https://datatracker.ietf.org/doc/html/rfc7240#section-4

The following subsections define an initial set of preferences. Additional preferences can be registered for convenience and/or to promote reuse by other applications. This specification establishes an IANA registry of preferences (see Section 5.1).

The current wait, respond-async and return values are only the initial ones provided during the creation of that RFC, but are extendable for cases such as this one, and should be handled in this manner. Missing Prefer should not always be considered as equivalent to an explicit wait or another keyword.

jerstlouis commented 1 year ago

@fmigneault This was the agreed SWG consensus after a lengthy debate on this topic.

This Req 25C applies when the Prefer: header is NOT used, so in a sense it could be argued that it explicitly does not involve RFC 7240 (I understand that this is a bit of a stretch, since the implementation is not ignoring its presence).

Adding a respond-sync preference needs to be done with the registry, and would still allow a server to ignore the request and return an async response, which would not support the interoperability use case of supporting simple synchronous clients that do not need to perform polling / results fetch etc.

From the practical experience of implementing a client supporting async processing requests and TIEs with several different implementations in multiple testbeds and other initiatives, the level of complexity and possible things that may go wrong in async processing server implementations is at a whole other level.

Another related consideration was compatibility of this sync/async mechanism with other OGC API Standards which currently all respond synchronously by default, but may wish to support the same async/jobs mechanism as Processes using the same /jobs/{jobId} mechanisms (e.g., /coverage requests).

fmigneault commented 1 year ago

@jerstlouis I'm not sure which processing TIEs have been evaluated during other initiatives, but asynchronous cases have definitely been tested during other testbeds. I have myself participated in many of them, and I have had as many issues (but of a different nature) with both async/sync implementations. For any operation that takes more than a few seconds, HTTP servers close the request connection, hence why asynchronous executions are necessary. It is not possible nor practical to leave a connection open for long periods of time hoping the server responds to it. It also greatly limits the number of users/clients that servers can respond to.

There is also an increasing number of tracks that explore Analysis-Ready Data and Machine Learning / AI processing that are too lengthy to run in synchronous mode. While it is important to support sync and compatibility for operations like /coverage, that does not make it the only use case and we should not disregard other valid use cases where sync is simply not applicable.

I think that both sync and async modes have their role in this specification in order to support the multiple use cases. However, I do not think that automatically defaulting to sync is logical in all cases. If some implementations strictly require running in sync mode due to their nature, I don't see why it would be harder for those to specify Prefer: wait=100 than it is for other applications that expect async to provide Prefer: respond-async. In both cases, clients should not assume the response would be in their preferred method, and should fail if they cannot handle the response that does not match their expectation. That is not different from handling a sudden 4xx/5xx response.

jerstlouis commented 1 year ago

@fmigneault

It is not possible nor practical to leave a connection open for long periods of time hoping the server responds to it.

I understand that, and this is why the asynchronous execution mode exists and implementations can decide to support it for particular processes.

EDIT: It might not be practical, but it is likely possible. When you say:

For any operation that takes more than a few seconds, HTTP servers close the request connection, hence why asynchronous executions are necessary.

I understand it to be the choice of the Processes implementation and/or any proxy set up in front of it and/or the client to close the request connection, but AFAIK there is no inherent timeout in the HTTP protocol itself (but I may be wrong). This S/O post suggests that the timeout should be a function of the expected complexity / amount of data to be processed.

But some processes / implementations may be able to return a response quickly to all possible requests and/or simply refuse to execute a request taking longer than a few seconds. And this is why the synchronous execution mode exists and implementations can decide to support it instead of, or in addition to async.

we should not disregard other valid use cases where sync is simply not applicable.

Of course, we should not disregard other valid use cases.

Note that the Processes - Part 3: Collection Output conformance class is an alternative to async execution. It focuses mainly on retrieving vector or raster output for a given Area and Resolution of Interest, which can constrain the amount of data to process to a deterministic upper bound (e.g., together with OGC API - Tiles or OGC API - DGGS for a single tile or zone at a resolution inversely proportional to its spatial extent).

In both cases, clients should not assume the response would be in their preferred method, and should fail if they cannot handle the response that does not match their expectation.

That would have been one option. However, in OGC API data access requests like /items and /coverage, clients do assume that the response will come back synchronously (without providing a Prefer: wait=100), and the current Req 25 C allows this to be reconciled with the /jobs async approach.

fmigneault commented 1 year ago

For the sake of /items and /coverage support assuming synchronous calls without providing Prefer, I can understand why sync was initially considered the default to make things easier. This makes sense for situations where either only sync or both sync/async are supported for the underlying process that /items and /coverage would call.

However, if that process happened to be async-only, omitting the Prefer header in that case would suddenly make it switch to async by default according to REQ25A. The fact that the default execution mode can switch around based on the process description makes the omission of the Prefer header very inconsistent. Furthermore, if this process description was modified at a later time to now allow sync mode on top of async, a previously well-defined and working request for a client supporting async would suddenly start breaking because of the omitted Prefer that would switch by default to sync.

This is why IMHO, omitting Prefer feels more like a "don't care" interpretation, and aligns better with RFC7240. Clients should always expect and support both sync and async, and unless they really don't care about a preferred response method, should always specify Prefer in a way that hints toward which async/sync is better.

jerstlouis commented 1 year ago

However, if that process happened to be async-only, omitting the Prefer header in that case would suddenly make it switch to async by default according to REQ25A.

The OGC API - Features / Coverages end-points require synchronous support in the Core requirements class, so that does not apply to these.

Furthermore, if this process description was modified at a later time to now allow sync mode on top of async, a previously well-defined and working request for a client supporting async would suddenly start breaking because of the omitted Prefer that would switch by default to sync

This is why clients wishing to receive an async response should always include the Prefer header from the start regardless of the process description, but they should still always be prepared to handle either a sync or async response.
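On the client side, that advice could look roughly like the following Python sketch. The helper names are hypothetical, no actual HTTP call is made, and it assumes that a 201 response carries a Location header pointing at the job status resource.

```python
def build_execute_headers(want_async, wait_seconds=None):
    """Build request headers, adding Prefer only when there is a preference."""
    prefs = []
    if want_async:
        prefs.append("respond-async")
    if wait_seconds is not None:
        prefs.append(f"wait={int(wait_seconds)}")
    headers = {"Content-Type": "application/json"}
    if prefs:
        headers["Prefer"] = ", ".join(prefs)
    return headers


def interpret_execute_response(status, headers):
    """Decide what to do next, whichever mode the server picked."""
    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 200:
        return ("results", None)  # sync: results are in the response body
    if status == 201:
        location = lowered.get("location")
        if location is None:
            raise ValueError("201 response without a Location header")
        return ("poll", location)  # async: poll the job status URL
    raise RuntimeError(f"unexpected status {status}")
```

A client that wants async execution would send `build_execute_headers(True, 5)`, which includes `Prefer: respond-async, wait=5`, and then branch on the result of `interpret_execute_response` rather than assuming either mode.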