m-mohr opened 1 month ago
The idea was initially to return the definition used to instantiate the job (`"status": "created"`) with additional metadata to describe the mode used (sync/async), for example. Still, I think it would be better to store it in the `process` attribute (see #450), which would then be added to the `statusInfo.yaml` schema, and we can remove this endpoint.
What do you think?
I agree. The default should be the `process` property. We can define an optional link relation type for processes that can't be represented in JSON, but without a specific endpoint.
Because jobs are submitted using a `process: URI` property, I wouldn't be surprised if some servers were already embedding that process URI in the job status response. Therefore, the property added to the `statusInfo.yaml` schema should not only consider the embedded JSON representation, but the direct reference as well (using `oneOf`).
OAP jobs might be, openEO jobs are not. There it's a `process: object`. And what if you submit MOAW or CWL? That's not necessarily a URI either, right?
Exactly. This is why a separate endpoint (or the same one using another `Accept` header) is proposed. This way, we don't need to depend on a specific embedding of `process` within the job (though I'm not against having it embedded if the server provides it). We just need to allow flexibility in what `process` contains.
Yeah, that's what I proposed. That doesn't necessarily need a separate pre-defined endpoint though. Not sure whether we agree or disagree right now 😅
If `process: "https://.../processes/{processId}"` can be returned in the job status, and that endpoint supports an `Accept` header to negotiate any `application/cwl+json`, `application/ogcapppkg+json`, etc. handled by the server, then yes, `GET /jobs/{jobId}/definition` is redundant.
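For illustration, such negotiation could look like this (the process id, host, and media type are only examples, not taken from any existing server):

```http
GET /processes/AA HTTP/1.1
Host: server.example
Accept: application/cwl+json
```

A server that cannot provide the requested representation would answer `406 Not Acceptable`, falling back to the default process description otherwise.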
If the job contains a non-deployed workflow (such as OAP Part 3 Nested Processes), then there is no such reference as `"https://.../processes/{processId}"`, since the workflow is an ad-hoc definition. In that case, it MUST be embedded in the job status as `process: {ad-hoc workflow}`.
So yeah, we agree if the JSON schema allows both variants.
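A minimal sketch of what that could look like in `statusInfo.yaml` (the property wording is illustrative, not actual spec text):

```yaml
process:
  description: Definition used to instantiate the job, or a reference to it.
  oneOf:
    - type: string
      format: uri       # reference to a deployed process
    - type: object      # embedded ad-hoc definition (e.g. OAP Part 3 nested process)
```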
I had the following in mind (only illustrated for OGC API - Processes).
For the sample creation request below:
```http
POST /jobs HTTP/1.1
Prefer: respond-async;return=representation
Content-Type: application/json
Content-Schema: https://raw.githubusercontent.com/opengeospatial/ogcapi-processes/refs/heads/master/openapi/schemas/processes-workflows/execute-workflows.yaml

{
  "process": "/processes/AA/execution",
  "inputs": {
    "stac_items": [
      "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2B_10TFK_20210713_0_L2A",
      "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/S2A_10TFK_20220524_0_L2A"
    ],
    "aoi": "-121.399,39.834,-120.74,40.472",
    "epsg": "EPSG:4326",
    "bands": [
      "green",
      "nir"
    ]
  }
}
```
The expected response would look like:
```json
{
  "id": "af419f90-97ab-11ef-81ac-0e6063d70ef5",
  "type": "process",
  "processID": "AA",
  "created": "2024-10-31T17:15:10.912Z",
  "status": "created",
  "message": "ZOO-Kernel created your job",
  "process": {
    "preferences": "respond-async;return=representation",
    "schema": "https://raw.githubusercontent.com/opengeospatial/ogcapi-processes/refs/heads/master/openapi/schemas/processes-workflows/execute-workflows.yaml",
    "process": "/processes/AA/execution",
    "inputs": {
      "stac_items": [
        "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/XX",
        "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/YY"
      ],
      "aoi": "-121.399,39.834,-120.74,40.472",
      "epsg": "EPSG:4326",
      "bands": [
        "green",
        "nir"
      ]
    }
  },
  "links": [
    {
      "title": "Execute endpoint",
      "rel": "http://www.opengis.net/def/rel/ogc/1.0/execute",
      "type": "application/json",
      "href": "https://server/ogc-api/jobs/af419f90-97ab-11ef-81ac-0e6063d70ef5/results"
    },
    {
      "title": "Job Management endpoints",
      "rel": "http://www.opengis.net/def/rel/ogc/4.0/job-management",
      "type": "application/json",
      "href": "https://server/ogc-api/jobs/af419f90-97ab-11ef-81ac-0e6063d70ef5"
    }
  ]
}
```
We can remove `/jobs/{jobId}/definition`, as we have the definition in the `process` object.
Also, rather than adding the header properties directly in the `process` object, it would probably be easier to use a dedicated `headers` object within the `process` object. This object can then contain whatever is relevant to send to an execute endpoint to get it to behave as expected.
```json
"process": {
  "header": {
    "Content-Type": "application/json",
    "Content-Schema": "https://raw.githubusercontent.com/opengeospatial/ogcapi-processes/refs/heads/master/openapi/schemas/processes-workflows/execute-workflows.yaml",
    "Prefer": "respond-async;return=representation"
  },
  "process": "/processes/AA/execution",
  "inputs": {
    "stac_items": [
      "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/XX",
      "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/YY"
    ],
    "aoi": "-121.399,39.834,-120.74,40.472",
    "epsg": "EPSG:4326",
    "bands": [
      "green",
      "nir"
    ]
  }
}
```
It looks very similar to what we used in "Table 43 — Parts of InputReference data structure" of WPS 1.0.0 (OGC 05-007r7), which gives the following definition for the `<Header>` node:

> Extra HTTP request headers needed by the service identified in ../Reference/@href. For example, an HTTP SOAP request requires a SOAPAction header. This permits the creation of a complete and valid POST request.
At that time, we also used the `<Body>` node for embedding the request body. Re-using the `<Body>` node concept would look like this:
```json
"process": {
  "header": {
    "Content-Type": "application/json",
    "Content-Schema": "https://raw.githubusercontent.com/opengeospatial/ogcapi-processes/refs/heads/master/openapi/schemas/processes-workflows/execute-workflows.yaml",
    "Prefer": "respond-async;return=representation"
  },
  "body": {
    "process": "/processes/AA/execution",
    "inputs": {
      "stac_items": [
        "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/XX",
        "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs/items/YY"
      ],
      "aoi": "-121.399,39.834,-120.74,40.472",
      "epsg": "EPSG:4326",
      "bands": [
        "green",
        "nir"
      ]
    }
  }
}
```
This `<Header>` node was lost in WPS 2.0.
I wouldn't include the `schema`, `preferences`, etc. headers under `process`. This is what `/jobs/{jobId}/inputs` should return under `headers` (https://github.com/opengeospatial/ogcapi-processes/blob/master/openapi/schemas/processes-job-management/inputs.yaml) [note: `headers` was not added to this schema, but it was suggested previously].
This example also illustrates why `process` is somewhat ambiguous. In this case, it is not really a "process" (description) per se, but job-execution content. I believe `process` should point to an actual process (i.e., the nested process URI in that case). Using a field named `process` that could contain a mix of process-description or job-execution content makes it very hard to interpret.
Embedding job-execution content under `process` is also very ugly if the contents are not JSON.
I don't really understand the issue here. My proposal was: we have an optional `process` property of type object in the job description object, and whenever the process is a JSON object you can embed it. It's optional though, and if the process is NOT an object, you just add a link to the job description object, like:
```jsonc
"links": [
  {
    "title": "Process definition",
    "rel": "process-definition", // just an example, we can change that to another rel type
    "type": "application/yaml",
    "href": "./my-process.cwl" // could be at /jobs/:id/definition, but also somewhere else in principle, e.g. /processes/:id or so - we don't need to pre-define an endpoint, just follow the link
  },
  ...
]
```
I think that should be able to capture all cases. I haven't seen any reason yet why this wouldn't work. Any thoughts?
The link is fine.
I'm not really fond of `process` containing something that is not an OGC API - Processes description or a URI pointing to one. It is very confusing when the same word refers to different kinds of content within the same API. Even in the context of openEO, shouldn't it be a `process_graph` or similar, and not just "`process`"?
No, process graphs alone are worthless; the process has additional metadata that may be needed in addition to the graph. We can't prevent the process from being something different.
So, if I understand correctly, openEO's definition is something along the lines of "process-graph + config = process"? In OAP, we have "process + inputs/headers -> job definition".
If my interpretation is correct, I can understand openEO's use of `process`, but this is an important clash in terminology for OGC API - Processes. If we use `process` to describe something that is not a "process" reference in the typical way it is used to submit jobs, we create confusion in the standard and in the understanding of the responses.
No, it's more the execution graph (which includes inputs/input references) plus the process metadata. They are one unit, similar to CWL, I think. The thing is, in openEO, processes that a server defines and processes that a user defines share the same schema and as such are both processes. Processes are pretty self-contained, i.e. there are no separate inputs/headers, although there might be other related entities such as jobs, which have additional "config" such as title, plan, and environment config (e.g. memory, CPU). Not 100% sure what you mean by config.
Example: a server provides the (pre-defined) processes add, divide, multiply, and subtract. A user chains those into a custom process called NDVI and submits it as a (user-defined) process for execution. API docs: https://openeo.org/documentation/1.0/developers/api/reference.html#section/Processes Example from the Python perspective: https://open-eo.github.io/openeo-python-client/udp.html
> The thing is, in openEO, processes that a server defines and processes that a user defines share the same schema and as such are both processes.
That's good. It is the same in OAP.
> Not 100% sure what you mean by config.
I meant exactly what you mentioned, such as the job title/plan/environment settings that slightly affect the process. The process itself is mostly agnostic to this "config", but could be affected by it (e.g., the number of CPUs allocated will impact processing speed, or maybe parallelization).
All in all, to my understanding, OAP and openEO both have similar behavior. Some form of "execution graph" is populated by the actual input references (submitted by the user) and the relevant processes. This is exactly why I feel a job using a `process` field containing that information is misleading (IF it contains the job input values), since this is not a "process" per se (neither in OAP nor in openEO), but the entire "execution graph" that employs specific inputs and one or more processes, whatever those processes embed (server-defined, CWL, a Docker image, etc.).
CWL does NOT correspond to that "execution graph" either. It is a workflow definition (how the inputs/outputs should be chained), but the effective inputs submitted with the job input values are not yet specified at that point. Therefore, embedding the CWL in `process` WOULD be a process representation. If the "user-defined" openEO process corresponds to this as well (without job input values), then we agree on the `process` contents. This is not respected in the case of Part 3 Nested Processes, which do include the job input values.
Somewhat, but it's not quite as in OAP. In openEO, the input values are part of the process execution graph; there are no separate input values which you could submit. And if your process has parameters, you need to encode them in another process where the inputs again are part of the process execution graph. ;-) Might be a bit confusing for you without a concrete example, I guess?
The config is not part of that. That's part of the job, e.g.
Job:
Process (this is one atomic unit and shall never be split into pieces):
If you define something like the following, you can only store it as a UDP, not execute it as a job.
Process (this is one atomic unit and shall never be split into pieces):
To execute it, you'd again have something like
(Disclaimer: Simplified example)
These descriptions are clear.
What I'm still not sure about is, inside the `/jobs/{jobId}` response, which one of these is going to be contained in `process`?
If it is similar to:
```json
{
  "id": "{jobId}",
  "status": "running",
  "process": {
    "my_other_process": {
      "process_id": "my_process",
      "arguments": { "X": -5 },
      "result": true
    }
  }
}
```
This is what I find "misleading", since `process` would contain an execution "graph" (combining `my_process` with its specific `X=-5` input). So, why not simply call the field `graph` and avoid the confusion with the overused term `process`?
If, instead, the job status response's `process` contains only the UDP `my_process` description with its `absolute(add(X, 2))` definition, `parameters: X`, etc., then I have no issue with using `process`. It is equivalent to having a deployed process that would contain that definition, and referring to it by URI.
It would usually contain `absolute(add(5, 2))`. `parameters` in openEO just contains the schemas; the values are part of the graph, always fully resolved for the job.
Example from the API spec: https://api.openeo.org/#tag/Data-Processing/operation/describe-job
Might be a good idea to hop on a short call to clarify this in all details; I feel like in text it's much more difficult to get to the details compared to going through some examples in a screenshare...
_Originally posted by @m-mohr in https://github.com/opengeospatial/ogcapi-processes/pull/437#discussion_r1789875331_