opengeospatial / ogcapi-processes

https://ogcapi.ogc.org/processes

Central job submission endpoint (also: `POST /processes/{processID}/execution` is not RESTful and is confusing for workflows) #419

Open m-mohr opened 2 weeks ago

m-mohr commented 2 weeks ago

The standard says:

The standard specifies a processing interface to communicate over a RESTful protocol

In REST, everything should be centered around resources. The endpoint POST /processes/{processID}/execution is not a resource, and POSTing to it should create e.g. /processes/{processID}/execution/{executionId}, but it doesn't. Instead, it may for example create /jobs/{jobId} (in async mode) or return a result directly (in sync mode).
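To make the mismatch concrete, here is a minimal sketch of the two response shapes this endpoint can produce. It is a simulation, not a client for any real server: the function name `execute` and the simplified response dictionaries are assumptions made for illustration, and the status body is reduced to a few fields.

```python
import uuid


def execute(process_id: str, prefer_async: bool) -> dict:
    """Simulate the response shape of POST /processes/{processID}/execution.

    Hypothetical helper for illustration only; the body fields are a
    simplified stand-in for the real status and result documents.
    """
    if prefer_async:
        # Async: a job resource is created, but it lives under /jobs,
        # not underneath the URL that was POSTed to.
        job_id = str(uuid.uuid4())
        return {
            "status": 201,
            "headers": {"Location": f"/jobs/{job_id}"},
            "body": {"jobID": job_id, "status": "accepted", "type": "process"},
        }
    # Sync: no resource is created at all; the result comes back directly.
    return {
        "status": 200,
        "headers": {},
        "body": {"result": f"output of {process_id}"},
    }
```

In neither branch does the POST create a child resource of /processes/{processID}/execution, which is the point being made above.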

To create a job, asynchronous processing requests should be sent to POST /jobs. This would also remove the awkwardness of having to pick a "main process" for a workflow to live underneath.

Single processes could also be sent there as a workflow that consists of just one processing node. If only async requests are sent to this endpoint, the issues with the Prefer header would also be solved: #413

Synchronous processing for a single process could still be sent to POST /processes/{processID}/execution but it would be weird for a workflow to be sent to that endpoint, too. So maybe it should be a separate top-level endpoint?

PS: This came up in some discussions with @fmigneault and @aljacob recently so posting it here for discussion.

fmigneault commented 2 weeks ago

I will repeat the answer I gave during the Testbed meeting, for the sake of sharing it openly with everyone.

POST /processes/{processID}/execution was introduced (to my understanding), because an OAP implementation is allowed to omit the creation of a job. This is relevant, notably, if the OAP decides to only support sync execution mode, where a job resource is not necessary (though it could still create one for later reference if desired), since the results are obtained directly.

Given that no job would be created in this case (which is technically considered the default/minimal requirement of OAP), the inverse non-RESTful argument would arise if POST /processes/{processID}/jobs were used, since no job entity is created and 200 is returned. The usual way to avoid this ambiguity in REST is to replace the noun with an action/verb, hence execution (arguably, execute would have been a better choice), to indicate that an operation is "created" rather than a resource.

Note that I agree with /processes/{processID}/jobs being better, since my implementation supports both sync/async, and creates a job for reference in both cases anyway, but I understand the reasoning of the execution counterpart. Since it is not much overhead, and for the sake of backward compatibility, my server handles both paths interchangeably.
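Handling both paths interchangeably can be as small as one route pattern. The following is a minimal sketch under that assumption, not the actual server code; the names `EXECUTION_PATH` and `resolve_process` are hypothetical.

```python
import re
from typing import Optional

# One pattern accepts both spellings of the execution endpoint:
#   /processes/{processID}/execution
#   /processes/{processID}/jobs
EXECUTION_PATH = re.compile(
    r"^/processes/(?P<process_id>[^/]+)/(?:execution|jobs)$"
)


def resolve_process(method: str, path: str) -> Optional[str]:
    """Return the process ID if the request targets either execution path.

    Returns None for any other method or path.
    """
    if method.upper() != "POST":
        return None
    match = EXECUTION_PATH.match(path)
    return match.group("process_id") if match else None
```

Both `resolve_process("POST", "/processes/echo/execution")` and `resolve_process("POST", "/processes/echo/jobs")` resolve to the same process ID, so a single handler can serve them.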

I think POST /jobs makes sense as well (especially for alignment with openEO and potentially submitting an ad-hoc Workflow). It makes sense to add a POST definition for this path since it is already available, and would (in the case of async at least) deal with corresponding resources. However, I think this does not resolve the RESTful convention issue in the case of sync that would still not require a job resource to be created.

I think sync/async and job creation imply opposite ways of thinking about this, and nobody will be fully satisfied with either approach. My preference is to reduce the number of endpoints doing the same thing, even if it might feel odd for one approach over the other. That being said, I would most probably support both ways regardless, for historical/compatibility reasons.

jerstlouis commented 2 weeks ago

If a particular "root" process really does not make sense for some workflow definition (although there are workarounds for that, like a simple process gathering outputs from multiple processes, whether as separate outputs, as an array of outputs, or as an actual combining operation like merging multiple bands into a single GeoTIFF), then we could probably agree on some other endpoint at which to execute a workflow in an ad-hoc manner. For pre-deployment of a workflow, Part 2 should still be used (POST a new workflow definition at /processes to create a new process).

Whether using /jobs for this purpose makes openEO integration easier or harder probably depends on the discussion in #420, in terms of whether it conflicts with existing capabilities or ends up working exactly the same as current functionality.

gfenoy commented 31 minutes ago

During the SWG meeting on 2024-07-08, I introduced the idea of defining a conformance class in OGC API - Processes - Part 3: Workflows to add POST on /jobs with an execute request that would define a "workflow." When I say "workflow" here, I mean a JSON object that conforms to the execute-workflows.yaml schema, i.e. a process chain (an execute request with a root process).

With Part 1, it stays the same:

POST /processes/{processId}/execution with an execute request conforming to execute.yaml

The response contains a JSON object conforming to statusInfo.yaml and a header with the location of the created job (/jobs/{jobId}).

With Part 3, you would also be able to use the following:

POST /jobs/ with an execute request conforming to execute.yaml (plus a "process" attribute pointing to the process to execute)

Here, there are two options for what happens next.

  1. The behavior can be the same as with the execute endpoint (POST /processes/{processId}/execution), and the execution starts right away. There is little value in adding such an endpoint if it offers no capability beyond what Part 1 defines.
  2. Another option would be to return a JSON object conforming to statusInfo.yaml containing a new jobID and status=accepted. A Location header can also be included to point to the created /jobs/{jobId}. But no execution occurs at that time; you only ask for a job instantiation.

Using the second option, we may then imagine using POST on /jobs/{jobId}/execution to actually start the execution of the "prepared job" (I initially wanted to use /jobs/{jobId}/results for this, but that conflicts with the existing endpoint). From then on, the behavior remains the same as for a standard execution.
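The two-step flow of the second option can be sketched as a small in-memory model. This is only an illustration of the idea under discussion, not draft specification text: the `JobStore` class, its method names, and the simplified response dictionaries are all assumptions.

```python
import uuid


class JobStore:
    """Hypothetical in-memory model of the "prepared job" flow."""

    def __init__(self) -> None:
        self._jobs: dict = {}

    def create_job(self, execute_request: dict) -> dict:
        """Model POST /jobs: register the job, but do not run anything."""
        if "process" not in execute_request:
            raise ValueError('execute request needs a "process" attribute')
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {
            "jobID": job_id,
            "status": "accepted",
            "request": execute_request,
        }
        return {
            "status": 201,
            "headers": {"Location": f"/jobs/{job_id}"},
            "body": {"jobID": job_id, "status": "accepted"},
        }

    def start_job(self, job_id: str) -> dict:
        """Model POST /jobs/{jobId}/execution: start a prepared job."""
        job = self._jobs[job_id]  # raises KeyError for unknown jobs
        if job["status"] != "accepted":
            raise RuntimeError(f"job {job_id} was already started")
        job["status"] = "running"
        return {"jobID": job_id, "status": "running"}

    def status(self, job_id: str) -> str:
        """Model GET /jobs/{jobId}, reduced to the status string."""
        return self._jobs[job_id]["status"]
```

A client would call `create_job` first, read the job ID from the response, and call `start_job` only when it wants processing to begin; until then the job stays in the accepted state.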

I think adding this modification to the Part 3 draft specification would help align OGC API - Processes with openEO.

If there is interest in adding this to the Part 3 draft, I volunteer to start the writing and work on a PR for this addition for further discussion.