Closed pvretano closed 1 week ago
@pvretano At the moment, the OGC API Application Package is only one potential language/format for the process definition (OGC Application Package is a separate conformance class).
If we keep it that way, neither of those options (calling it /package
or apppkg media type) seem appropriate.
I feel that the definition and description are different enough that they should be separate resources, especially given the amount of confusion that mixing those up generate.
I would prefer GET /processes/{processId}/definition
or GET /processes/{processId}/executionUnit
.
I would have suggested GET /processes/{processId}/execution
if we were not already planning execution via GET.
@jerstlouis application pkg as defined in PART 2 is NOT MANDATORY. If it suited to your purpose it is recommended. Other language/formats are possible and if another one emerges (e.g. CWL or MOAW) we can add that as another recommendation.
@pvretano Yes exactly, that's what I meant.
I am not able to follow the Processes discussion that is happening, but I saw the issue and think it is strange to POST a representation of a resource of type A (process definition) and when you GET the resource that has been created you get a representation of a different resource type B (process description). This doesn't sounds like HTTP to me. I am probably missing the necessary context, so feel free to ignore...
I'm fine with any of:
/processes/{processId}/package
/processes/{processId}/executionUnit
/processes/{processId}
+ Accept: application/ogcapppkg+json
/processes/{processId}?f=ogcapppkg
I will redirect all of these to whichever gets decided. If possible, I prefer to have path components over headers/queries.
It is important to ensure Accept
/f
are required if /processes/{processId}
is used for POST/PUT.
@cportele
It is mostly to preserve backward compatibility with Core. Deployed process or builtin are all obtained through GET /processes/{processId}
.
Any extra information needed to define the process can be POSTed as well, but not mandatory.
It doesn't make sense to POST to another location, because the executionUnit
has no reason to exist if not nested under the process it defines.
Of those @fmigneault is happy with, I only like GET /processes/{processId}/executionUnit
because deployment is not limited to application packages, so calling the definition 'package' or relying on the ogcapppkg media type to distinguish description vs. definition doesn't work for other definitions.
I agree with @cportele about the mismatched GET (description) vs. POST/PUT (definition), but I am not sure how we could solve this.
One possibility might be to only support PUT at /processes/{processId}/definition
to deploy a process? (but that is also odd if /processes/{processId}
does not exist yet)
@fmigneault
Any extra information needed to define the process can be POSTed as well, but not mandatory.
I still argue that the description explaining how a process should be invoked is very different from instructions defining the process :)
Lets step back for a minute.
In my mind the there is no difference between a process description and a process definition! They are simply two different representations of the same resource ... the process. One happens to include more information than the other. So, POST'ing to /processes/{processId} to add a new process seems consistent to me. Its just that a process description does not contain enough information for the server to deploy a working process.
Also, I am starting to wonder whether we actually need a new endpoint from which we get the process definition. If we agree that a process description and a process definition are simply two different representations of the same resource than content negotiation should be sufficient to get one or the other.
From my server you can do a GET /process/{processId}?f=application/ogcapppkg+json
and you will get the application package of the process rather than the description. If you set f=json
you get the description ... which is also the default as per the core.
If we agree that a process description and a process definition are simply two different representations
I disagree, in my mind there is a fundamental difference, just like a C function prototype vs. a C function body.
But a C prototype and a C function represent the same thing ... the function ... and, again, one includes more information about the function that the other.
One other point ... since the very beginning of deploying processes to an OAProc deployment the process definition has been referred to as an "application package" and so I think we need to respect that terminology. So the endpoint name for getting the definition ... if in fact it needs to be a separate endpoint ... should have a name related to that legacy. So, /processes/{processId}/package
or /processes/{processId}/apppkg
or something like that ...
@pvretano Wasn't that specifically in the context of the OGC Application Package though (which is the only example I am aware of implemented with process deployment so far)? One possibility too is to merge the Application Package into Core, leaving the flexibility of how the executionUnit is defined (e.g. CWL, MOAW workflow, OpenEO). But I am worried that this would prevent the ability to upload a binary component directly as the deploy payload, e.g. a Jupyter notebook or docker inside a ZIP, as opposed to linking to something available from a URL.
@pvretano
But a C prototype and a C function represent the same thing ... the function ... and, again, one includes more information about the function that the other.
True... (That is one way of seeing it) But for example, I may want to retrieve:
Each of these would need its own media type if it is a single resource.
I don't see a problem here especially if it means the interface less cluttered.
Some of these, I believe, already have media types defined (CWL application/cwl+json, OpenAPI application/openapi+json).
I don't really see deploying a processes with a CWL execution unit and then asking the server to return that with a MOAW execution unit so as long the application package itself has a place to identify the type of execution unit (which it does) we only need one media type for an application packages ... perhaps parameterized to return the prototype, the body or both. There is a media type defined in Part 2 application/ogcappkg+json
and we can add a parameter named content
to get the proto
, the body
or the proto+body
.
As for MOAW and OpenEO DAG, those should probably have media types defined ... no?
@pvretano As you say the apppkg media type could still only support returning the original or native flavor of the execution unit, while you could use the execution unit media types to return only that part directly in different flavors.
Or the ogcappkg media type could also potentially have a parameter defined for the execution unit format, e.g. application/ogcappkg+json; executionType=application/cwl+json
.
as for MOAW and OpenEO DAG, those should probably have media types defined ... no?
Agreed :) The MOAW media type could also be the same as for a generic execution request (those POSTed to /execution
).
@jerstlouis The form of the media type is open for discussion but do you really expect someone to deploy a process using a CWL execution unit and then asking the server to return "application/moaw+json"? If not then saying executionType=application/cwl+json
is irrelevant ... no?
@pvretano It does seem that for some of the scenarios in #279, some processing engines would be able to translate between MOAW / OpenEO / CWL.
e.g. deploying an apppkg containing MOAW executionUnit and generating CWL to run it.
Yeah, I suppose that if the execution unit of a process is a scripted workflow then a server being able to translate from one to the other might be possible. If, however, the execution unit was a Docker container with a binary executable then that type of translation would be really, really hard if not impossible. Generally, however, I think that a process would be deployed with an execution unit of one type and that would be the type you get when you request the body or execution unit of the process. I would consider translating from one execution unit type to another as being an edge case and something that we could not necessarily document in the standard.
I keep pushing the use of the OPTIONS method without success but OPTIONS /processes
would tell you which media type you can request from the endpoint and if your server supported translating between execution unit types you would see media type for CWL, MOAW, OpenEO DAG or whatever your server supports.
I also think process description and process definition are different entities. The first is like metadata about the process while the second details how to run it. But, even though they represent different portion of the process as a whole, they both don't make any sense if separated in the context of a deployed process. The distinction becomes clearer IMO because there are multiple ways to represent the definition, but the process description should always be the same schema, just like those 2 components are properly represented under 2 separate fields processDescription
and executionUnit
when deploying.
@jerstlouis Please don't consider it is "the only example of implemented with process deployment so far". There are a large group of people that worked on this and based their implementation around this design. We must maintain backward compatibility as much as possible. You would not like it also if Part 3 was fully integrated and then some components where modified or dropped afterwards.
I don't want a major 2.0 just for moving stuff around in a deploy payload. We can work with processDescription
and executionUnit
as it already is defined for "OGC Application Package". That should remain the same. This goes as well for the extra PUT request. It breaks existing implementations and makes deployment even more convoluted then it already is. My implementation is able to patch missing parts in both processDescription
and executionUnit
when it can infer details from the other, so I need both at the same time. If an implementation wants to deploy some different content (eg: Jupyter notebook, zip, bin, ...), it can still do so using POST with a different Content-Type of the desired format.
I like the idea of returning different representations of /processes/{processId}/package
using application/cwl+json
, application/ogcappkg+json
or application/moaw+json
. That could allow interoperability between servers that have at least 1 overlapping representation they both support. My server would at least support the first 2, and potentially the 3rd.
OPTIONS /processes would tell you which media type you can request from the endpoint
That would be nice. Definitely the cleanest approach.
@fmigneault
Please don't consider it is "the only example of implemented with process deployment so far"
I think you misunderstood what I meant by that, but I stand corrected because Peter clarified that the EO Application Package Best Practice actually does NOT use the OGC Application Package -- I thought it did. Rather it is an example of POSTing CWL directly as the Content-Type, if I understand correctly.
We must maintain backward compatibility as much as possible.
I never suggested otherwise :)
I don't want a major 2.0 just for moving stuff around in a deploy payload.
OGC API - Processes - Part 2 1.0 has not even been published yet...
/processes/{processId}/package
I don't feel strongly about this (so it's fine with me if that is the consensus), but it seems to me that the name is not intuitive to refer to the process definition.
I am also already confused with what an "application package" is given that the EO Application Package Best Practice is directly CWL, whereas the OGC Application Package contains a process description + an execution which can be different things (CWL among those).
As I mentioned previously, I would also recommend to define a new Application Package format (application/ogcappkg+zip
?) as a compressed archive that can include a process description, execution unit, and potentially other files like a docker or Jupyter notebook or binary executable that can be directly embedded within the package itself -- that sounds more like an actual (self-contained) "package" to me.
NOTE: I am happy if we clarify that an "application package" is the same as the "process definition", and that "OGC JSON Application Package" (application/ogcapppkg+json
) is one specific format of that. /package
then makes more sense.
It might make sense to also include the EO Application Package (or could that just be CWL Application Package?) and maybe this new OGC Zipped Application Package as other conformance classes of Part 2?
@pvretano @jerstlouis
I am also already confused with what an "application package" is given that the EO Application Package Best Practice is directly CWL, whereas the OGC Application Package contains a process description + an execution which can be different things (CWL among those).
You are not the only one 😅 Best Practices mixes things in its own documentation. It seems to refer to a similar format as "OGC Application Package":
https://docs.ogc.org/bp/20-089r1.html#toc16
The Application Package includes the following information:
- Reference to the executable block that implements the Application functionality
- Description of its input/output interface
i.e.: represented in the executionUnit
and the processDescription
respectively.
Even some sections somewhat contradicts themselves:
https://docs.ogc.org/bp/20-089r1.html#toc17
This best practice focuses on the scenario where an application is directly packaged as an Application Package, registered in a Platform and made available as an implementation of OGC API — Processes. [...] Validate and accept an application deployment request (OGC API — Processes Transaction Extension)
So, deploy directly ~OGC~ "Application Package" (ie: CWL), but validate with Transactions, a.k.a: OGC API - Processes - Part 2: Deploy, Replace, Undeploy before it was renamed.
I think adding application/ogcapppkg+json
and application/cwl+json
(and others of course like the zip) will resolve these ambiguities once and for all.
I really hate to do this, having argued previously for the name /processes/{processId}/package
but I think I now agree with @jerstlouis that the name is maybe less intuitive than it could be. The name /processes/{processId}/definition
might be a better match for the way we currently view the relationship between the description and the definition of a process. Thoughts?
Common Workflow Language (CWL) is a standard for describing computational data-analysis workflows
20-089r1 documents how CWL can be THE encoding of the OGC Application Package and how it can be used in Part 2 when deploying an application
From the CWL document we are able to obtain the process description (input/output interface) and execute the application (whatever might be)
CWL is a well-known format with a large community and open source implementations
ZIP files are archives that store multiple files
@p3dr0
20-089r1 documents how CWL can be THE encoding of the OGC Application Package
This is where things get confusing, because currently Part 2 describes an encoding and conformance class called the "OGC Application Package" (application/ogcapppkg+json
) specifically as a JSON encoding of a particular schema including processDescription and executionUnit properties, and this encoding can support different types of execution units (CWL being one of them), which is a part of the overall OGC Application Package.
From the CWL document we are able to obtain the process description (input/output interface) and execute the application (whatever might be)
Yes it makes sense to be able to infer the process description from the process definition execution unit (e.g. CWL).
ZIP files are archives that store multiple files
Yes, I suggest a ZIP-based application package type, with one or more files with pre-determined names inside whose content follow a particular schema, like several archive-based package formats e.g. .deb
, .cab
etc., because it would allow to deploy fully self-contained packages that do not require external references (all required resources can be embedded within the archive), avoiding the need to e.g. maintain a separate docker repository, or requiring a separate authentication mechanism when referencing private resources.
Would it make sense to refer to the 20-089r1 application packages simply as CWL process definition (or maybe CWL application package) and use the application/cwl+json
Content-Type when deploying them directly via POST to /processes
?
@pvretano
I can live with that modification (/processes/{processId}/definition
) with support to return previously mentioned different content-types.
@mr-c Is there an OpenAPI YAML schema location of the top-level CWL definition we could refer to in order to directly reference it to properly define what is proposed in #258?
@mr-c Is there an OpenAPI YAML schema location of the top-level CWL definition we could refer to in order to directly reference it to properly define what is proposed in #258?
Identifiers for particular versions of the CWL standards have the form https://w3id.org/cwl/v1.2/
https://w3id.org/cwl/v1.2/cwl.ttl is an RDF description of CWL concepts
There is also https://w3id.org/cwl/v1.2/cwl-context.json
An OpenAPI YAML Schema doesn't make sense to me because CWL is a serialization format, not an API. Maybe @tetron or @stain have other thoughts on the topic?
So I would recommend the use of https://w3id.org/cwl/v1.2/
to identify the expected content.
If it would be a better fit, then we can ask for a CWL identifier in the IANA media types database (previously know as MIME types).
There's a data definition part of openapi which I believe is quite similar to schema salad, it might be reasonable to translate the document schema of cwl to an open api object type definition. Then an open api library could perform at least partial validation of cwl objects.
That would greatly help promote the use of CWL as a fully-defined OGC Application Package and would help validate schema received by the API.
Good to know @fmigneault! I've created https://github.com/common-workflow-language/schema_salad/issues/517 to track the OpenAPI for CWL idea. We can flesh out a design there
@mr-c @fmigneault Would this OpenAPI idea have anything to do with providing an OpenAPI process description at /processes/{processId}
(negotiating OpenAPI representation)?
e.g. if the process was defined with CWL, whether it was deployed inside an OGC Application Package execution unit, or directly CWL as in the EO appkpkg BP, but also if the process was something other than CWL entirely?
@jerstlouis My understanding is that the request is for generating an OpenAPI schema for an inline CWL description itself, and not for generating an OpenAPI schema for the valid inputs for a particular CWL description (though that should be doable as a separate effort, I think)
@mr-c @jerstlouis I would like to be able to have something in the form of:
paths:
/processes:
post:
requestBody:
required: true
content:
application/cwl+json:
schema:
$ref: "https://www.commonwl.org/v1.2/schemas/cwl.yml"
application/moaw+json:
schema:
$ref: "path/to/moaw.yml"
application/ogcapppkg+json:
schema:
$ref: "https://raw.githubusercontent.com/opengeospatial/ogcapi-processes/master/extensions/deploy_replace_undeploy/standard/openapi/schemas/ogcapppkg.yaml"
With ogcapppkg.yml updated with something like:
type: object
required:
- executionUnit
properties:
processDescription:
$ref: https://raw.githubusercontent.com/opengeospatial/wps-rest-binding/master/core/openapi/schemas/process.yaml
executionUnit:
type: array
items:
oneOf:
- type: object
properties:
config: [...]
- schema:
$ref: "https://www.commonwl.org/v1.2/schemas/cwl.yml"
- type: object
additionalProperties: true
@fmigneault Small fix: the /processes/{processId}
should be PUT (replace), POST is to /processes
(though I don't know if you would want to change the PUT (replace) to be at /processes/{processId}/definition
to match the GET, if those become separate resources)
SWG meeting from 2024-04-15: The SWG decided to implement the /processes/{processId}/package
endpoint to get the application package that was originally used to deploy the process.
Maybe this issue https://github.com/opengeospatial/ogcapi-processes/issues/410 was blocking this one?
Both can be closed once the PR https://github.com/opengeospatial/ogcapi-processes/pull/443 is merged.
Update part 2 to reflect that fact that you: GET a process description from
/processes/{processId}' (same terminology is used in part 1) and you POST a process **definition** to
/processes` to deploy a process. In other words a process definition is an application package (i.e. description + execution unit + possibly other stuff).Also, a new endpoint
/processes/{processId}/package
for get the process definition OR we use the same endpoint (i.e./processes/{processId}
) and use content negotiation to get the description versus the definition. There is already a MIME type for an application package so that should be straight forward.