Open fmigneault opened 2 months ago
Thanks for looking into this @fmigneault.
In general, I think your understanding of how this is proposed to work is correct.
I was a bit confused at first when you talked about 2 different variants, but yes the intent was that:

- **Variant 1**: the workflow (`application/ogc-workflow+json`) is directly the payload of the deployment POST request,
- **Variant 2**: the workflow is embedded as the execution unit of an OGC Application Package (`application/ogcapppkg+json`) payload.

I thought we already had an `id` in there for deployment. An alternative would be to support a PUT to `/processes/{processId}` to create the resource (in addition to replacing it), or possibly request headers for additional metadata.
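For illustration, a minimal sketch of what the Variant 1 payload (POST'd directly with `Content-Type: application/ogc-workflow+json`) could look like with an embedded `id`; the process URLs are hypothetical and the exact field placement may differ from the draft:

```json
{
  "id": "DeployWorkflow",
  "process": "https://example.com/processes/MainProcess",
  "inputs": {
    "data": {
      "process": "https://example.com/processes/NestedProcess",
      "inputs": {
        "arg": { "$input": "wf-input" }
      }
    }
  },
  "outputs": {
    "out": { "$output": "wf-output" }
  }
}
```

Here `wf-input`/`wf-output` become the I/O of the deployed `DeployWorkflow` process, resolved from `arg` of `NestedProcess` and `out` of `MainProcess` respectively.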
> Other required parameter `version` from Part 1 needs to be added.
This is required in the process description. Potentially, if it is not there, the server could automatically version it... But yes, it would make sense to be able to specify a version there directly.
> This variant doesn't indicate how additional metadata [...]
Which additional metadata are we missing at the input / output level?
> And indicate that `schema` should be inferred by `$input` in this deployment use case.
Is that because `schema` is required? In this case, if that's allowed, I would suggest to use `null`. I would prefer that over the `$ref`.
> In this case, contrary to previous (1), `schema` under `processDescription.inputs` and `processDescription.outputs` could become mandatory.
Only the derived fields / field selector modifiers (`properties=`) would modify the schema, and it is very clear what the resulting fields are in this case: a subset of the fields is returned, or new fields are computed from the existing ones. Since you have the schema of all the existing fields, you can also easily infer the results of applying operation(s) on them (assuming of course that the processing engine understands and parses the CQL2 expressions, even if it does pass them on to the remote server).
> The requirement mentions using `application/ogcapppkg+json`, but this can easily be confused with the case where `processes-dru/executionUnit.yaml` is employed directly. When an embedded execution unit definition is used, it is preferable to employ the qualified value with `application/ogc-workflow+json` to avoid ambiguity about the package contents (or use Variant 1 directly instead).
I am confused. The mention about using `application/ogcapppkg+json` is for using DRU with the "OGC Application Package", where the content of the execution unit is the `application/ogc-workflow+json` workflow.
My understanding of `application/ogcapppkg+json` is that it is agnostic of the execution unit content -- not limited to CWL or anything in particular. What am I missing?
No objection to using `application/ogc-workflow+json`, but these are the media types currently suggested in the spec:

- `application/ogcmoaw+json` — Modular OGC API Workflow JSON,
- `application/ogcexec+json`,
- `application/ogcexecreq+json`

The exec req suggests we could align this with the media type for POST to `/execution`.
For what it's worth ... I prefer proposal 2 since it makes a workflow just another execution unit. Not special or different from any other execution unit.
I am not sure I understand all the contortions about `wf-input` and `wf-output`, but it seems that there is a desire to reuse some parts of the process description ... specifically the metadata portions ... and the fact that `schema` is mandatory gums that up, because then you have to "duplicate" the schema already expressed in the execution unit.
To that I would say that we make `schema` optional in the process description. If the schema of the inputs/outputs can be inferred from the execution unit, then you don't need to include a `schema` for the input/output in the process description. If the schema of the input/output cannot be inferred from the execution unit, then `schema` is mandatory. That way, an application package can be created that includes an execution unit (like CWL) but also includes additional metadata annotations via the OGC Process Description.
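As a sketch of that idea (a hypothetical payload; omitting `schema` under `processDescription.inputs`/`outputs` is precisely what the current schemas reject), a Variant 2 package could carry metadata-only annotations next to the execution unit:

```json
{
  "processDescription": {
    "id": "DeployWorkflow",
    "version": "1.0.0",
    "inputs": {
      "wf-input": {
        "title": "Workflow input",
        "description": "Schema inferred from 'arg' of NestedProcess in the execution unit."
      }
    },
    "outputs": {
      "wf-output": {
        "title": "Workflow output",
        "description": "Schema inferred from 'out' of MainProcess in the execution unit."
      }
    }
  },
  "executionUnit": {
    "value": {
      "process": "https://example.com/processes/MainProcess",
      "inputs": { "arg": { "$input": "wf-input" } }
    },
    "mediaType": "application/ogc-workflow+json"
  }
}
```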
@jerstlouis
> An alternative would be to support a PUT to `/processes/{processId}` to create the resource
That wouldn't work, because the `processId` would not be defined for the subsequent PUT. The process ID must be available from the get-go during the POST deploy request. The important thing to highlight from the examples is that `DeployProcess` cannot be the same as `"process": "https://example.com/proceses/MainProcess"`. The `MainProcess` must already exist in order to extract its I/O `schema` definitions, which are used to resolve the types referenced by the `wf-input` and `wf-output`.
> Which additional metadata are we missing at the input / output level?
In case the `MainProcess` did not provide any `title`, `description`, `keywords` or `metadata`, the corresponding `wf-input` and `wf-output` deployment could want to provide them. It could also want to override their content to make them more relevant/detailed in the context of the new workflow, which might not expose all the I/O offered by `MainProcess`, or any of the other nested processes.
> Is that because `schema` is required?
Yes, that's the reason, i.e.: https://github.com/opengeospatial/ogcapi-processes/blob/b972e74d8a09b36c1fc54869b9bfe7f44d1fd20f/openapi/schemas/processes-core/inputDescription.yaml#L4-L5 https://github.com/opengeospatial/ogcapi-processes/blob/b972e74d8a09b36c1fc54869b9bfe7f44d1fd20f/openapi/schemas/processes-core/outputDescription.yaml#L4-L5
> I would suggest to use `null`.
If this is the preference over an explicit `$ref`, then I suggest the other proposal that uses `{}`. Using `{}` does not require any modification to the schema of `schema`, since it already allows an object without any property:
https://github.com/opengeospatial/ogcapi-processes/blob/b972e74d8a09b36c1fc54869b9bfe7f44d1fd20f/openapi/schemas/processes-core/schema.yaml#L3-L4
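i.e., something along these lines in the deployed description (a hypothetical sketch), where `{}` (the empty JSON Schema, which accepts any instance) acts as the placeholder and the actual schema is resolved from the referenced `$input`:

```json
{
  "inputs": {
    "wf-input": {
      "title": "Workflow input",
      "schema": {}
    }
  }
}
```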
> Only the derived fields / field selector modifiers (`properties=`) would modify the schema, and it is very clear what the resulting fields are in this case
Yes (field selectors), and no (not clear/easy).
For example, if the `NestedProcess` only defined that `schema` was the generic GeoJSON (any `type`), and the field modifiers did some CQL2 filtering of only `type: Point` while adding a new `custom` property, the resulting `wf-output` could be an entirely different (and more specific/narrowed) `schema` reference and `format` defined by the user that has `[type, custom]` requirements for a point. The `custom` field could even come from different parts of the workflow, making parsing of the field modifiers very complicated.
Since field modifiers can completely redefine the output however they want, it is not that trivial to infer the resulting `schema`. I also see this as a great opportunity for users to define "converter workflows" where a specific output schema could be provided, and which could be "injected" only by the workflow creator that has the knowledge about the resulting schema they dynamically created.
> My understanding of `application/ogcapppkg+json` [...]
Your understanding is correct. The only issue about ONLY using `Content-Type: application/ogcapppkg+json` is that it makes whatever is contained in `executionUnit` very ambiguous, since they are all JSON with similar/complementary field names (`inputs`, `outputs`, etc.).
Omitting the qualified value representation with `application/ogc-workflow+json` should default to using the `processes-dru/executionUnit.yaml` in case of ambiguity. That doesn't mean you can't try POST'ing the workflow directly and have it automatically "detect" `application/ogc-workflow+json`, but I would rather have the standard define a "Best Practice" to include it explicitly to make resolution consistent across implementations.
> `application/ogcexec+json`
I missed this type when reading the draft. It is acceptable as well.
However, I believe `application/ogc-workflow+json` would be more "explicit" about the fact that an OGC Part 3 Workflow is POST'd rather than any other execution body. The `$input` and `$output` of Deployable Workflows are necessary for this definition to make any sense. For the same reason, it is technically "not exactly" the same as an ordinary execution request, since it cannot be executed by itself (values to fill in the I/O referenced by `$input`/`$output` would be missing).
@pvretano
> I am not sure I understand all the contortions about `wf-input` and `wf-output` [...]
The advantage (and main purpose) of `$input` and `$output` is that they can be placed at any level in the workflow. Therefore, the `DeployWorkflow` that would be created could only expose `wf-input` and `wf-output` as "top-level" I/O in its process description, but those references could be passed down/retrieved to/from very deeply nested processes, or even be reused at multiple places in the workflow.
Reusing `schema` from the referenced `$input`/`$output` is a bonus to avoid duplicating them for `wf-input`/`wf-output`, but it is not mandatory. The workflow could define I/O `schema` with more explicit conditions, but they must be compatible with the places where they are passed down/retrieved from.
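For example (with hypothetical process URLs), the same `$input` reference could be consumed at two different nesting levels, while the deployed process only advertises a single top-level `wf-input` in its process description:

```json
{
  "process": "https://example.com/processes/MainProcess",
  "inputs": {
    "first": { "$input": "wf-input" },
    "second": {
      "process": "https://example.com/processes/NestedProcess",
      "inputs": {
        "arg": { "$input": "wf-input" }
      }
    }
  }
}
```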
> To that I would say that we make "schema" optional in the process description.
That would be a valid alternative, as long as it is only in the context of `application/ogc-workflow+json` deployment, to avoid the explicit `schema: {}` to patch JSON-schema validation. I believe the I/O `schema` MUST remain required for process descriptions to make any sense. It's the only field left to indicate what the I/O are.
> That way, an application package can be created that includes an execution unit (like CWL) but also include additional metadata annotations via the OGC Process Description.
This is valid as well. This is actually exactly what CRIM's implementation does ;) (see https://pavics-weaver.readthedocs.io/en/latest/package.html#correspondence-between-cwl-and-wps-fields and https://pavics-weaver.readthedocs.io/en/latest/package.html#metadata)
Part 3: Deployable Workflows proposes an alternate deployment definition based on an execution body, trying to bridge Parts 1/2/3. I would like to validate my understanding of it, and propose adjustments to improve alignment (as applicable).
Since there are 2 variants for deployment, the two are analyzed separately, using an equivalent workflow example.
## Variant 1: Direct Deployment with Execution Body

### Analysis

A new process named `DeployWorkflow` with input `wf-input` and output `wf-output` would be created. The `schema` definition of `wf-input` would be the same as the one of `arg` from `NestedProcess`, whereas the `schema` of the `wf-output` would be equivalent to `out` of `MainProcess`.

### Proposals
- Add an `id` field, which is not present in `processes-workflows/execute-workflows.yaml`. It makes sense that `id` was missing considering it is not required for execution only. However, some process ID is needed to perform the deployment.
- Alternatively to `id`, reuse the `?w=<id>` query parameter (https://github.com/opengeospatial/ogcapi-processes/blob/master/openapi/parameters/processes-dru/w-param.yaml). This would allow reusing the `processes-workflows/execute-workflows.yaml` definition without any modification. However, it is debatable whether it is intuitive or not.
- Other required parameter `version` from Part 1 needs to be added. Since there is no equivalent query parameter, it might be better to have the Part 3: Deployable Workflows schema be a `oneOf[ process-core/processSummary, processes-workflows/execute-workflows ]`.
- This should be added to OpenAPI path `/processes`.
- Introduce `application/ogc-workflow+json` (or some equivalent) to distinguish from other deployment structures already supported (CWL, OGC App Pkg, etc.).
- This variant doesn't indicate how additional metadata for the resolved `wf-input` and `wf-output` can be defined. Recommendations to add to the document, either:
  - metadata is strictly what `arg`/`out` defined, nothing more, nothing less;
  - allow `$input`/`$output` to extend/override what `arg`/`out` provide.
provide.Variant 2: Embedded Deployment of Execution Body in Execution Unit
Analysis
Proposals
- Because `wf-input`/`arg` and `wf-output`/`out` schemas should be aligned to be mapped correctly, redefining `inputs` and `outputs` with explicit schemas in `processDescription` is redundant. However, this would not be disallowed according to `processes-core/process.yaml`. Recommendations should be given in the standard document about this case.
- More specifically, `processDescription.inputs` and `processDescription.outputs` could be relevant to provide additional details, such as `process-core/descriptionType.yaml` metadata properties. However, adding any `inputs`/`outputs` there would fail validation if the `schema` is omitted, since it is required in their definitions. Because of this, we end up going back to the redundant `schema` definitions mentioned above. Possible recommendations:
  - Use an empty `schema: {}`, and indicate that `schema` should be inferred by `$input` in this deployment use case.
  - Recommend to explicitly reference the schema with a `$ref`.
- If Part 3: Fields Modifiers are thrown into the mix of Deployable Workflows, notably for the `wf-input` and `wf-output`, then the `schema` mapping between `wf-input`/`arg` and `wf-output`/`out` could actually differ entirely. In this case, contrary to previous (1), `schema` under `processDescription.inputs` and `processDescription.outputs` could become mandatory. This is because, without any reference schema from `DeployWorkflow` (yet to be deployed), the workflow could not be validated if they were omitted, since there would be no indication of the intended source and desired result for field-modified `wf-input`/`wf-output`.
- Improve the description of Part 3: Deployable Workflows regarding media-type. The requirement mentions using `application/ogcapppkg+json`, but this can easily be confused with the case where `processes-dru/executionUnit.yaml` is employed directly. When an embedded execution unit definition is used, it is preferable to employ the qualified value with `application/ogc-workflow+json` to avoid ambiguity about the package contents (or use Variant 1 directly instead).