opengeospatial / ets-ogcapi-processes10


Reported test failures for Ref, BBOX, Binary do not test what they claim #57

Closed jerstlouis closed 1 year ago

jerstlouis commented 1 year ago

Describe the bug

The ETS reports failures on:

but none of the execution requests it sends actually include a binary input, a bbox input, an input by reference, or a complex object input (other than an invalid measure input as reported in #56). No href is used, imagesInput is simply passed "teststring", and boundingBoxInput and complexObjectInput are never set. Nothing tests featureCollectionInput or geometryInput either.

Expected behavior

The ETS should eventually test these different capabilities properly, and should not report a confusing status for these tests until it does.

bpross-52n commented 1 year ago

We have fixed this for objects and bounding boxes. The specification of binary inputs seems to be a bit ambiguous, however. This schema states that input values should be described by "format": "byte". In the execute example, it states: "encoding": "base64". I think we need to clarify this with the SWG. For inputs by reference, we will provide a small input file along with the ETS.

fmigneault commented 1 year ago

Typically, format would be used alongside value to form a qualified value {"value": "<base64-string>", "format": "byte"} submitted in the execution request. That aligns with a process description using {"type": "string", "format": "byte"}.

The contentEncoding is used to represent inputs in the process description, which are typically submitted in href form. In this case, base64 is explicit, while binary is a common alternative that usually defaults to base64, though this is not guaranteed; the server could use any default from 7bit, 8bit, base16, base32, or base64.

However, as @jerstlouis mentioned, both "href" and "value" inputs are supposed to be supported at execution time with the current OAP specification. I would prefer if that were not the case (i.e., value would be reserved for type: xxx cases in the process description, and href would be reserved for explicit contentType/contentEncoding input descriptions), but that is not currently the case.

Therefore, the schema representation of this input in the process description should handle interchangeably (regardless of what was actually provided in the process description):

schema:
  oneOf:
    - type: string
      format: byte
    - contentType: xxx
      contentEncoding: binary
    - contentType: xxx
      contentEncoding: base64

Although format: byte is quite common, it uses an "open" field. JSON schema does not impose or restrict any value for format. Technically, format: base64 would be valid as well, and I have seen it used many times. The contentEncoding is somewhat more restrictive in which values it allows, and more descriptive on how to interpret them.
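The point that format is an "open" annotation can be sketched with Python's jsonschema package (the schema literals below are hypothetical stand-ins for illustration, not the actual binaryInputValue.yaml):

```python
import jsonschema

# Hypothetical binaryInputValue-style schema
schema = {"type": "string", "format": "byte"}

# "format" is an open annotation keyword in JSON Schema: by default the
# validator does not reject strings that are not valid base64.
jsonschema.validate("definitely not base64 !!!", schema)

# JSON Schema imposes no fixed vocabulary for "format", so a value like
# "base64" is just as acceptable as "byte".
jsonschema.validate("SGVsbG8=", {"type": "string", "format": "base64"})
print("both validations passed")
```

Both calls succeed without error, which is exactly why format alone cannot enforce binary encoding the way contentEncoding is intended to.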

jerstlouis commented 1 year ago

@fmigneault

Thanks.

(sorry I apparently closed this issue with some GitHub hotkey by mistake)

fmigneault commented 1 year ago

Indeed, you're right. It should be contentMediaType.

I am curious where you ran across format: byte. Is this used outside of the OGC API - Processes world? Would you have examples / references of that?

It is a commonly employed value because some docs mention it; for example, under "String Formats" in https://swagger.io/docs/specification/data-models/data-types/ (sorry, I can't link to the exact header directly).

jerstlouis commented 1 year ago

@fmigneault I believe there is some lingering confusion. I am confused by the comment above ( https://github.com/opengeospatial/ets-ogcapi-processes10/issues/57#issuecomment-1738745819 ) and by the e-mail from @bpross-52n :

The specification of binary inputs seems to be a bit ambiguous, however. This schema states that input values should be described by "format": "byte"

This binaryInputValue schema is not saying that the input should be described by "format": "byte", nor saying anything about how inputs should be described. It is simply hinting (as part of describing the schema for execution request inputs, included by inputValueNoObject.yaml) that, in an execution request, a binary input is bytes. This is specifically about how binary data should be encoded inside an execution request; it is not used by the process description to describe inputs or outputs. If this were purely JSON Schema, this file should really say:

type: string
contentEncoding: base64

because Requirement 22 says that:

The service SHALL support binary values encoded as base64-encoded strings.

Although a server may potentially accept inputs encoded otherwise, base64 being the only required encoding makes it the only interoperable option. However, as you point out, "format": "byte" is an OpenAPI thing, and for an OpenAPI 3.0 definition the current "format": "byte" is possibly correct.

When inputs are described inside an execution request, the format property is completely unrelated to the JSON Schema format, and is actually an object as described in format.yaml.

So this:

{"value": "<base64-string>", "format": "byte"}

is not correct inside an execution request. It should be e.g.,

{
   "value": "<base64-string>",
   "format": {
      "mediaType": "image/tiff;application=geotiff",
      "encoding": "base64"
   }
}
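As a concrete sketch of the above (the field names come from the snippet, while the binary payload and the input name "data" are hypothetical), an execution request carrying base64-encoded binary data could be assembled like this:

```python
import base64
import json

# Hypothetical binary payload; in practice this would be the bytes of a
# GeoTIFF or PNG file read from disk.
raw = b"\x89PNG\r\n\x1a\n...not a real image..."

execute_request = {
    "inputs": {
        "data": {
            # base64 is the only encoding the spec requires servers to
            # support (Requirement 22), so it is the interoperable choice.
            "value": base64.b64encode(raw).decode("ascii"),
            "format": {
                "mediaType": "image/png",
                "encoding": "base64",
            },
        }
    }
}

body = json.dumps(execute_request)

# The receiving server can recover the original bytes:
decoded = base64.b64decode(json.loads(body)["inputs"]["data"]["value"])
assert decoded == raw
```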

In a process description, binary inputs would be described like:

 "inputs" : {
    "data" :
    {
      "title" : "Some binary input",
      "description" : "Some binary input data provided as a GeoTIFF or PNG",
      "minOccurs" : 1,
      "maxOccurs" : 1,
      "schema" : {
          "oneOf": [
          {
            "type": "string",
            "contentEncoding": "binary",
            "contentMediaType": "image/tiff; application=geotiff"
          },
          {
            "type": "string",
            "contentEncoding": "binary",
            "contentMediaType": "image/png"
          }
        ]
      }
    },
   ...
fmigneault commented 1 year ago

@jerstlouis Your summary looks all right to me. You are again correct about the format.yaml schema for the qualified execution input. I think what was misleading in my understanding of the issue was that I had seen format: byte used to describe a process input before.

Another possible cause of confusion is that ogcapppkg also includes the chain executionUnit -> qualifiedInputValue -> inputValue -> inputValueNoObject -> binaryInputValue. The schemas are valid, but given that the executionUnit somewhat "describes" the process to be deployed, it doesn't help keep those concepts separate.

Something I noticed that we should validate: wouldn't inputValueNoObject.yaml cause an error if a string with base64-like characters is provided? Technically, type: string still validates correctly regardless of format, therefore the oneOf would have 2 matches?

jerstlouis commented 1 year ago

@fmigneault

Today we discussed how we should have specific conformance classes like ogcapppkg-docker for what should be in the execution unit.

I think it would be better to leave executionUnit open-ended in terms of what needs to go in there in OGC Application Package, and define that instead in a particular conformance class (e.g., ogcapppkg-docker, ogcapppkg-cwl, ogcapppkg-python).

A lot of the things in executionUnit.yaml like cpuMax, memoryMax seem specific to Docker / containers, and are not something you would expect to see in an application package where the execution unit is a Python script for example.

This way we could avoid that oneOf altogether and the use of the execution request qualifiedInputValue.yaml inside the execution unit, which I agree is extremely confusing.

Wouldn't inputValueNoObject.yaml cause an error if a string with base64-like characters is provided?

You are correct, I think this was recently raised as an issue and my suggestion was to change the oneOf to an anyOf.
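The failure mode, and why anyOf fixes it, can be sketched with a minimal hand-rolled checker (a toy stand-in for a real JSON Schema validator; the matches helper and the two-branch schema are simplifications for illustration):

```python
def matches(value, subschema):
    # Minimal checker: only "type" is enforced; "format" is treated as
    # an open annotation, exactly as JSON Schema specifies.
    t = subschema.get("type")
    if t == "string":
        return isinstance(value, str)
    if t == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False

def one_of(value, subschemas):
    # oneOf requires exactly one branch to match.
    return sum(matches(value, s) for s in subschemas) == 1

def any_of(value, subschemas):
    # anyOf requires at least one branch to match.
    return any(matches(value, s) for s in subschemas)

branches = [
    {"type": "string"},                    # plain string input
    {"type": "string", "format": "byte"},  # binaryInputValue-style
]

# A base64-looking string satisfies both branches, so oneOf rejects it...
assert not one_of("SGVsbG8=", branches)
# ...while anyOf accepts it.
assert any_of("SGVsbG8=", branches)
print("oneOf rejects, anyOf accepts")
```

Since format adds no constraint beyond type: string, every string matches both branches, which is precisely the double-match that makes the oneOf fail.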

fmigneault commented 1 year ago

A lot of the things in executionUnit.yaml like cpuMax, memoryMax seem specific to Docker / containers, and are not something you would expect to see in an application package where the execution unit is a Python script for example.

I disagree about that. One could have a heavy parallel computation with a Python script that requires minimum CPU/RAM resources to run it. I don't think this is specific to any technology.

However, the current executionUnit definition is redundant if considering ogcapppkg-cwl. Those are all available (and even more) through:

I would therefore separate resource/operational requirements into a separate conformance class, on its own, along the various ogcapppkg-docker, ogcapppkg-python, etc.

change the oneOf to an anyOf

Indeed, that would fix it.

jerstlouis commented 1 year ago

One could have a heavy parallel computation with a Python script that requires minimum CPU/RAM resources to run it. I don't think this is specific to any technology.

True, but this is a capability often found in container runners, and the specific properties might be biased towards what Docker offers. I am not sure how one would enforce this minimum or maximum resource usage with a Python script; you would likely need to automatically dockerize the Python script, as opposed to running it directly with python? (which might be a good thing to do for security reasons)

However, the current executionUnit definition is redundant if considering ogcapppkg-cwl.

Yes, that was also my thought and why I was suggesting that.

I would therefore separate resource/operational requirements into a separate conformance class, on its own, along the various ogcapppkg-docker, ogcapppkg-python, etc.

That might be the way.

fmigneault commented 1 year ago

I don't necessarily see it as specific to python or docker (in the sense that it should be that specific process that controls these resources), but rather, as requirements a machine should meet to run said python/docker execution unit. The server could take some actions upstream such as spawning a different kubernetes node to meet requirements, or forward the request to another worker node with more capabilities. Once these requirements are met, the python/docker/else would simply run on that node without worrying about how to manage the resources.

In some cases, it can also be relevant to pass resource requirements to the process itself. For example, the cwltool:CUDARequirement I mentioned can indicate a number of required GPUs. This value can be passed to pytorch for example (using utilities from pytorch.cuda), to allocate specific hardware to the script.

I agree that it would be more secure to run in Docker, but that is an implementation-specific choice. And even when running in Docker, the machine it runs on must meet the minimum resource requirements. Therefore, it might still not be up to Docker to resolve these parameters; they would be resolved upstream to determine which resource/worker node must run the job.

jerstlouis commented 1 year ago

@fmigneault

Right. Besides expressing requirements, the "max" CPU / memory values can also be a way to prevent a particular process from using up more resources than specified.

dstenger commented 1 year ago

@bpross-52n Please upload the GeoTIFF to this GitHub repository and use it as a remote resource in the test suite afterwards.