opengeospatial / ogcapi-processes

https://ogcapi.ogc.org/processes
Other
48 stars 45 forks source link

The execution unit schema is really confusing! #397

Open pvretano opened 8 months ago

pvretano commented 8 months ago

I have been reviewing the Part 2 and the schema for the execution unit is really disjoint and confusing.

If the execution unit is a container then the execution unit looks like this:

"executionUnit": {
  "type": "docker",
  "image": "mydocker/ndvi:latest",
  "deployment": "local",
  "config": {
    "cpuMin": 2,
    "cpuMax": 5,
    "memoryMin": 1,
    "memoryMax": 3
  }
}

If, however, the execution unit is CWL, the execution unit looks like this:

  "executionUnit": {
    "href": "https://raw.githubusercontent.com/EOEPCA/app-snuggs/main/app-package.cwl",
    "type": "application/cwl"
  }

or like this depending on whether the CWL is inline or referenced:

{
  "executionUnit": {
    "value": { ... },
    "mediaType": "application/cwl+json"
  }
}

Why such varying schemas? Why, for example, does the CWL class have both "type" and "mediaType" when the examples are both media types? Why not have a single member, type, of type string and simply have a requirement in the CWL conformance class that it's value has to be a media type?

Also, why does the docker conformance class have a member image? Why not call it value to be consistent and simply include a requirement is the docker conformance class that the value of value is a Docker/OCI image reference? The docker conformance class could further extend a "base" schema to add the deployment and config members.

If, for example, the base schema of the execution unit was:

type: object
required:
  - type
  - value
properties:
  type:
    type: string
  value:
    type: string

The docker conformance class could extend this schema like this:

allOf:
  - $ref: executionUnit.yaml
  - type: object
    properties:
      type:
        type: string
        unum:
          - docker
          - oci
      deployment:
        type: string
        enum:
          - local
          - remote
          - hpc
          - cloud
      config:
        type: object
        properties:
          cpuMin:
            type: number
            minimum: 1
          cpuMax:
            type: number
          memoryMin:
            type: number
          memoryMax:
            type: number
          storageTempMin:
            type: number
          storageOutputsMin:
            type: number
          jobTimeout:
            type: number

And the CWL conformance class could extend this schema like this:

allOf:
  - $ref: executionUnit.yaml
  - type: object
    properties:
      value:
        oneOf:
          - type: string
          - type: object

... and, of course, a vendor-specific execution unit such as:

"executionUnit": {
  "type": "python",
  "value": "... my python program text ..."
}

is also schema valid allowing experimention to flourish without too much trouble and perhaps lead to additional execution unit conformance classes.

fmigneault commented 8 months ago

Why such varying schemas? Why, for example, does the CWL class have both "type" and "mediaType" when the examples are both media types? Why not have a single member, type, of type string and simply have a requirement in the CWL conformance class that it's value has to be a media type?

I think the intention was to follow the "by-reference" type/href for a link, and took inspiration from the qualifiedInputValue for the plain representation, which uses format with mediaType, encoding and schema fields.

Also, why does the docker conformance class have a member image? Why not call it value to be consistent and simply include a requirement is the docker conformance class that the value of value is a Docker/OCI image reference?

This looks like it takes inspiration from docker compose image field:

services:
  ndvi:
    container_name: nvdvi
    image: mydocker/ndvi:latest
    environment:
      FORWARDED_ALLOW_IPS: "*"

I agree with the use of image personally. It is much more explicit than a generic value field, especially if "type": "docker" is mentioned.

pvretano commented 6 months ago

@fmigneault I still think there should be a consistent way to, at least, identify the type of execution unit. Right now we have three members for this: "type" (as a plain token), "type" as a media type, and "mediaType" as a mediaType. I would prefer that we have "type" as an enumerated token to identify type (each execution unit conformance class would expand the enumeration). Then the rest of the members of the executionUnit object can be specific to the type of execution unit. If the execution unit is a reference then lets just use a proper link for that. So we would have ...

Docker execution unit:

"executionUnit": {
  "type": "docker",
  "image": "mydocker/ndvi:latest",
  "deployment": "local",
  "config": {
    "cpuMin": 2,
    "cpuMax": 5,
    "memoryMin": 1,
    "memoryMax": 3
  }
}

CWL by reference:

"executionUnit": {
  "type": "cwl",
  "link": {
     "href": "https://raw.githubusercontent.com/EOEPCA/app-snuggs/main/app-package.cwl",
     "type": "application/cwl"
  }
}

CWL inline

"executionUnit": {
  "type": "cwl",
  "value": { ... cwl JSON ... }
}
bpross-52n commented 6 months ago

SWG meeting from April 29th: Check if there was any decision made in March.

bpross-52n commented 6 months ago

SWG meeting from April 29th: Check if there was any decision made in March.

During the SWG meeting from March 4th, this was also discussed. It was agreed that Peter will create a PR for the SWG to review. You can find the recording here: (OGC member only) https://portal.ogc.org/files/?artifact_id=108046 Interesting part is around minute 32.

fmigneault commented 6 months ago

@pvretano I wouldn't mind having type: docker|cwl, with value or link for the inline vs by-ref representations. However, to keep this consistent, the contents of the type: docker should also be nested under value. Otherwise, it somewhat goes against the point of aligning the definitions.

Also, the type field cannot be a closed enum, since that would limit users from using their own representation. The schema would have to be something like the following :

oneOf:
  - allOf:
    - type: object
      properties:
        type:
          enum:
          - docker
          - oci
    - oneOf:
      - type: object
        properties:
          value: 
            $ref: executionUnit.yml   # with the 'type' property removed
        additionalProperties: false

      - type: object
        properties:
          link:
            $ref: link.yml
        additionalProperties: false

  - allOf:
    - type: object
      properties:
        type:
          enum:
          - cwl
    - oneOf:
      - type: object
        properties:
          value: 
            $ref: "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/json-schema/cwl.yaml"
        additionalProperties: false

      - type: object
        properties:
          link:
            $ref: link.yml
        additionalProperties: false

  - allOf:
    - type: object
      properties:
        type:
          # if we want to be strict, here we should also include a 'not' of all other 'type'
          type: string

    - oneOf:
      - type: object
        properties:
          value: 
            type: object
            additionalProperties: true

      - type: object
        properties:
          link:
            $ref: link.yml
        additionalProperties: false