opengeospatial / ogcapi-processes

https://ogcapi.ogc.org/processes
Other
48 stars 45 forks source link

Update Part 2 with the consistent use of process "definition" and process "description" #282

Closed pvretano closed 1 week ago

pvretano commented 2 years ago

Update part 2 to reflect that fact that you: GET a process description from /processes/{processId}' (same terminology is used in part 1) and you POST a process **definition** to/processes` to deploy a process. In other words a process definition is an application package (i.e. description + execution unit + possibly other stuff).

Also, a new endpoint /processes/{processId}/package for get the process definition OR we use the same endpoint (i.e. /processes/{processId}) and use content negotiation to get the description versus the definition. There is already a MIME type for an application package so that should be straight forward.

jerstlouis commented 2 years ago

@pvretano At the moment, the OGC API Application Package is only one potential language/format for the process definition (OGC Application Package is a separate conformance class).

If we keep it that way, neither of those options (calling it /package or apppkg media type) seem appropriate.

I feel that the definition and description are different enough that they should be separate resources, especially given the amount of confusion that mixing those up generate.

I would prefer GET /processes/{processId}/definition or GET /processes/{processId}/executionUnit. I would have suggested GET /processes/{processId}/execution if we were not already planning execution via GET.

pvretano commented 2 years ago

@jerstlouis application pkg as defined in PART 2 is NOT MANDATORY. If it suited to your purpose it is recommended. Other language/formats are possible and if another one emerges (e.g. CWL or MOAW) we can add that as another recommendation.

jerstlouis commented 2 years ago

@pvretano Yes exactly, that's what I meant.

cportele commented 2 years ago

I am not able to follow the Processes discussion that is happening, but I saw the issue and think it is strange to POST a representation of a resource of type A (process definition) and when you GET the resource that has been created you get a representation of a different resource type B (process description). This doesn't sounds like HTTP to me. I am probably missing the necessary context, so feel free to ignore...

fmigneault commented 2 years ago

I'm fine with any of:

I will redirect all of these to whichever gets decided. If possible, I prefer to have path components over headers/queries.

It is important to ensure Accept/f are required if /processes/{processId} is used for POST/PUT.

fmigneault commented 2 years ago

@cportele It is mostly to preserve backward compatibility with Core. Deployed process or builtin are all obtained through GET /processes/{processId}. Any extra information needed to define the process can be POSTed as well, but not mandatory. It doesn't make sense to POST to another location, because the executionUnit has no reason to exist if not nested under the process it defines.

jerstlouis commented 2 years ago

Of those @fmigneault is happy with, I only like GET /processes/{processId}/executionUnit because deployment is not limited to application packages, so calling the definition 'package' or relying on the ogcapppkg media type to distinguish description vs. definition doesn't work for other definitions.

I agree with @cportele about the mismatched GET (description) vs. POST/PUT (definition), but I am not sure how we could solve this. One possibility might be to only support PUT at /processes/{processId}/definition to deploy a process? (but that is also odd if /processes/{processId} does not exist yet)

jerstlouis commented 2 years ago

@fmigneault

Any extra information needed to define the process can be POSTed as well, but not mandatory.

I still argue that the description explaining how a process should be invoked is very different from instructions defining the process :)

pvretano commented 2 years ago

Lets step back for a minute.

In my mind the there is no difference between a process description and a process definition! They are simply two different representations of the same resource ... the process. One happens to include more information than the other. So, POST'ing to /processes/{processId} to add a new process seems consistent to me. Its just that a process description does not contain enough information for the server to deploy a working process.

Also, I am starting to wonder whether we actually need a new endpoint from which we get the process definition. If we agree that a process description and a process definition are simply two different representations of the same resource than content negotiation should be sufficient to get one or the other.

From my server you can do a GET /process/{processId}?f=application/ogcapppkg+json and you will get the application package of the process rather than the description. If you set f=json you get the description ... which is also the default as per the core.

jerstlouis commented 2 years ago

If we agree that a process description and a process definition are simply two different representations

I disagree, in my mind there is a fundamental difference, just like a C function prototype vs. a C function body.

pvretano commented 2 years ago

But a C prototype and a C function represent the same thing ... the function ... and, again, one includes more information about the function that the other.

pvretano commented 2 years ago

One other point ... since the very beginning of deploying processes to an OAProc deployment the process definition has been referred to as an "application package" and so I think we need to respect that terminology. So the endpoint name for getting the definition ... if in fact it needs to be a separate endpoint ... should have a name related to that legacy. So, /processes/{processId}/package or /processes/{processId}/apppkg or something like that ...

jerstlouis commented 2 years ago

@pvretano Wasn't that specifically in the context of the OGC Application Package though (which is the only example I am aware of implemented with process deployment so far)? One possibility too is to merge the Application Package into Core, leaving the flexibility of how the executionUnit is defined (e.g. CWL, MOAW workflow, OpenEO). But I am worried that this would prevent the ability to upload a binary component directly as the deploy payload, e.g. a Jupyter notebook or docker inside a ZIP, as opposed to linking to something available from a URL.

jerstlouis commented 2 years ago

@pvretano

But a C prototype and a C function represent the same thing ... the function ... and, again, one includes more information about the function that the other.

True... (That is one way of seeing it) But for example, I may want to retrieve:

Each of these would need its own media type if it is a single resource.

pvretano commented 2 years ago

I don't see a problem here especially if it means the interface less cluttered.

Some of these, I believe, already have media types defined (CWL application/cwl+json, OpenAPI application/openapi+json).

I don't really see deploying a processes with a CWL execution unit and then asking the server to return that with a MOAW execution unit so as long the application package itself has a place to identify the type of execution unit (which it does) we only need one media type for an application packages ... perhaps parameterized to return the prototype, the body or both. There is a media type defined in Part 2 application/ogcappkg+json and we can add a parameter named content to get the proto, the body or the proto+body.

As for MOAW and OpenEO DAG, those should probably have media types defined ... no?

jerstlouis commented 2 years ago

@pvretano As you say the apppkg media type could still only support returning the original or native flavor of the execution unit, while you could use the execution unit media types to return only that part directly in different flavors.

Or the ogcappkg media type could also potentially have a parameter defined for the execution unit format, e.g. application/ogcappkg+json; executionType=application/cwl+json.

as for MOAW and OpenEO DAG, those should probably have media types defined ... no?

Agreed :) The MOAW media type could also be the same as for a generic execution request (those POSTed to /execution).

pvretano commented 2 years ago

@jerstlouis The form of the media type is open for discussion but do you really expect someone to deploy a process using a CWL execution unit and then asking the server to return "application/moaw+json"? If not then saying executionType=application/cwl+json is irrelevant ... no?

jerstlouis commented 2 years ago

@pvretano It does seem that for some of the scenarios in #279, some processing engines would be able to translate between MOAW / OpenEO / CWL.

e.g. deploying an apppkg containing MOAW executionUnit and generating CWL to run it.

pvretano commented 2 years ago

Yeah, I suppose that if the execution unit of a process is a scripted workflow then a server being able to translate from one to the other might be possible. If, however, the execution unit was a Docker container with a binary executable then that type of translation would be really, really hard if not impossible. Generally, however, I think that a process would be deployed with an execution unit of one type and that would be the type you get when you request the body or execution unit of the process. I would consider translating from one execution unit type to another as being an edge case and something that we could not necessarily document in the standard.

I keep pushing the use of the OPTIONS method without success but OPTIONS /processes would tell you which media type you can request from the endpoint and if your server supported translating between execution unit types you would see media type for CWL, MOAW, OpenEO DAG or whatever your server supports.

fmigneault commented 2 years ago

I also think process description and process definition are different entities. The first is like metadata about the process while the second details how to run it. But, even though they represent different portion of the process as a whole, they both don't make any sense if separated in the context of a deployed process. The distinction becomes clearer IMO because there are multiple ways to represent the definition, but the process description should always be the same schema, just like those 2 components are properly represented under 2 separate fields processDescription and executionUnit when deploying.

@jerstlouis Please don't consider it is "the only example of implemented with process deployment so far". There are a large group of people that worked on this and based their implementation around this design. We must maintain backward compatibility as much as possible. You would not like it also if Part 3 was fully integrated and then some components where modified or dropped afterwards.

I don't want a major 2.0 just for moving stuff around in a deploy payload. We can work with processDescription and executionUnit as it already is defined for "OGC Application Package". That should remain the same. This goes as well for the extra PUT request. It breaks existing implementations and makes deployment even more convoluted then it already is. My implementation is able to patch missing parts in both processDescription and executionUnit when it can infer details from the other, so I need both at the same time. If an implementation wants to deploy some different content (eg: Jupyter notebook, zip, bin, ...), it can still do so using POST with a different Content-Type of the desired format.

I like the idea of returning different representations of /processes/{processId}/package using application/cwl+json, application/ogcappkg+json or application/moaw+json. That could allow interoperability between servers that have at least 1 overlapping representation they both support. My server would at least support the first 2, and potentially the 3rd.

OPTIONS /processes would tell you which media type you can request from the endpoint

That would be nice. Definitely the cleanest approach.

jerstlouis commented 2 years ago

@fmigneault

Please don't consider it is "the only example of implemented with process deployment so far"

I think you misunderstood what I meant by that, but I stand corrected because Peter clarified that the EO Application Package Best Practice actually does NOT use the OGC Application Package -- I thought it did. Rather it is an example of POSTing CWL directly as the Content-Type, if I understand correctly.

We must maintain backward compatibility as much as possible.

I never suggested otherwise :)

I don't want a major 2.0 just for moving stuff around in a deploy payload.

OGC API - Processes - Part 2 1.0 has not even been published yet...

/processes/{processId}/package

I don't feel strongly about this (so it's fine with me if that is the consensus), but it seems to me that the name is not intuitive to refer to the process definition.

I am also already confused with what an "application package" is given that the EO Application Package Best Practice is directly CWL, whereas the OGC Application Package contains a process description + an execution which can be different things (CWL among those).

As I mentioned previously, I would also recommend to define a new Application Package format (application/ogcappkg+zip?) as a compressed archive that can include a process description, execution unit, and potentially other files like a docker or Jupyter notebook or binary executable that can be directly embedded within the package itself -- that sounds more like an actual (self-contained) "package" to me.

NOTE: I am happy if we clarify that an "application package" is the same as the "process definition", and that "OGC JSON Application Package" (application/ogcapppkg+json) is one specific format of that. /package then makes more sense.

It might make sense to also include the EO Application Package (or could that just be CWL Application Package?) and maybe this new OGC Zipped Application Package as other conformance classes of Part 2?

fmigneault commented 2 years ago

@pvretano @jerstlouis

I am also already confused with what an "application package" is given that the EO Application Package Best Practice is directly CWL, whereas the OGC Application Package contains a process description + an execution which can be different things (CWL among those).

You are not the only one 😅 Best Practices mixes things in its own documentation. It seems to refer to a similar format as "OGC Application Package":

https://docs.ogc.org/bp/20-089r1.html#toc16

The Application Package includes the following information:

  • Reference to the executable block that implements the Application functionality
  • Description of its input/output interface

i.e.: represented in the executionUnit and the processDescription respectively.

Even some sections somewhat contradicts themselves:

https://docs.ogc.org/bp/20-089r1.html#toc17

This best practice focuses on the scenario where an application is directly packaged as an Application Package, registered in a Platform and made available as an implementation of OGC API — Processes. [...] Validate and accept an application deployment request (OGC API — Processes Transaction Extension)

So, deploy directly ~OGC~ "Application Package" (ie: CWL), but validate with Transactions, a.k.a: OGC API - Processes - Part 2: Deploy, Replace, Undeploy before it was renamed.

I think adding application/ogcapppkg+json and application/cwl+json (and others of course like the zip) will resolve these ambiguities once and for all.

pvretano commented 2 years ago

I really hate to do this, having argued previously for the name /processes/{processId}/package but I think I now agree with @jerstlouis that the name is maybe less intuitive than it could be. The name /processes/{processId}/definition might be a better match for the way we currently view the relationship between the description and the definition of a process. Thoughts?

p3dr0 commented 2 years ago
jerstlouis commented 2 years ago

@p3dr0

20-089r1 documents how CWL can be THE encoding of the OGC Application Package

This is where things get confusing, because currently Part 2 describes an encoding and conformance class called the "OGC Application Package" (application/ogcapppkg+json) specifically as a JSON encoding of a particular schema including processDescription and executionUnit properties, and this encoding can support different types of execution units (CWL being one of them), which is a part of the overall OGC Application Package.

From the CWL document we are able to obtain the process description (input/output interface) and execute the application (whatever might be)

Yes it makes sense to be able to infer the process description from the process definition execution unit (e.g. CWL).

ZIP files are archives that store multiple files

Yes, I suggest a ZIP-based application package type, with one or more files with pre-determined names inside whose content follow a particular schema, like several archive-based package formats e.g. .deb, .cab etc., because it would allow to deploy fully self-contained packages that do not require external references (all required resources can be embedded within the archive), avoiding the need to e.g. maintain a separate docker repository, or requiring a separate authentication mechanism when referencing private resources.

Would it make sense to refer to the 20-089r1 application packages simply as CWL process definition (or maybe CWL application package) and use the application/cwl+json Content-Type when deploying them directly via POST to /processes?

p3dr0 commented 2 years ago
fmigneault commented 2 years ago

@pvretano I can live with that modification (/processes/{processId}/definition) with support to return previously mentioned different content-types.

@mr-c Is there an OpenAPI YAML schema location of the top-level CWL definition we could refer to in order to directly reference it to properly define what is proposed in #258?

mr-c commented 2 years ago

@mr-c Is there an OpenAPI YAML schema location of the top-level CWL definition we could refer to in order to directly reference it to properly define what is proposed in #258?

Identifiers for particular versions of the CWL standards have the form https://w3id.org/cwl/v1.2/

https://w3id.org/cwl/v1.2/cwl.ttl is an RDF description of CWL concepts

There is also https://w3id.org/cwl/v1.2/cwl-context.json

An OpenAPI YAML Schema doesn't make sense to me because CWL is a serialization format, not an API. Maybe @tetron or @stain have other thoughts on the topic?

So I would recommend the use of https://w3id.org/cwl/v1.2/ to identify the expected content.

If it would be a better fit, then we can ask for a CWL identifier in the IANA media types database (previously know as MIME types).

tetron commented 2 years ago

There's a data definition part of openapi which I believe is quite similar to schema salad, it might be reasonable to translate the document schema of cwl to an open api object type definition. Then an open api library could perform at least partial validation of cwl objects.

fmigneault commented 2 years ago

That would greatly help promote the use of CWL as a fully-defined OGC Application Package and would help validate schema received by the API.

mr-c commented 2 years ago

Good to know @fmigneault! I've created https://github.com/common-workflow-language/schema_salad/issues/517 to track the OpenAPI for CWL idea. We can flesh out a design there

jerstlouis commented 2 years ago

@mr-c @fmigneault Would this OpenAPI idea have anything to do with providing an OpenAPI process description at /processes/{processId} (negotiating OpenAPI representation)? e.g. if the process was defined with CWL, whether it was deployed inside an OGC Application Package execution unit, or directly CWL as in the EO appkpkg BP, but also if the process was something other than CWL entirely?

mr-c commented 2 years ago

@jerstlouis My understanding is that the request is for generating an OpenAPI schema for an inline CWL description itself, and not for generating an OpenAPI schema for the valid inputs for a particular CWL description (though that should be doable as a separate effort, I think)

fmigneault commented 2 years ago

@mr-c @jerstlouis I would like to be able to have something in the form of:

paths:
  /processes:
    post:
      requestBody:
        required: true
        content:
          application/cwl+json:
            schema:
              $ref: "https://www.commonwl.org/v1.2/schemas/cwl.yml"
          application/moaw+json:
            schema:
              $ref: "path/to/moaw.yml"
          application/ogcapppkg+json:
            schema:
              $ref: "https://raw.githubusercontent.com/opengeospatial/ogcapi-processes/master/extensions/deploy_replace_undeploy/standard/openapi/schemas/ogcapppkg.yaml"

With ogcapppkg.yml updated with something like:

type: object
required:
  - executionUnit
properties:
  processDescription:
    $ref: https://raw.githubusercontent.com/opengeospatial/wps-rest-binding/master/core/openapi/schemas/process.yaml
  executionUnit:
    type: array
    items:
      oneOf:
        - type: object
          properties:
            config: [...]
        - schema:
            $ref: "https://www.commonwl.org/v1.2/schemas/cwl.yml"
        - type: object
          additionalProperties: true
jerstlouis commented 2 years ago

@fmigneault Small fix: the /processes/{processId} should be PUT (replace), POST is to /processes (though I don't know if you would want to change the PUT (replace) to be at /processes/{processId}/definition to match the GET, if those become separate resources)

bpross-52n commented 6 months ago

SWG meeting from 2024-04-15: The SWG decided to implement the /processes/{processId}/package endpoint to get the application package that was originally used to deploy the process.

gfenoy commented 1 month ago

Maybe this issue https://github.com/opengeospatial/ogcapi-processes/issues/410 was blocking this one?

Both can be closed once the PR https://github.com/opengeospatial/ogcapi-processes/pull/443 is merged.