Closed jerstlouis closed 1 year ago
We have fixed this for objects and bounding boxes. The specification of binary inputs seems to be a bit ambigous, however. This schema states, that input values should be described by "format": "byte"
. In the execute example, it states: "encoding": "base64"
. I think we need to clarify this with the SWG. For inputs by reference, we will provide a small input file along with the ETS.
Typically, format
would be used along value
to form a qualified-value {"value": "<base64-string>", "format": "byte"}
submitted in the execution request. That aligns with the process description using {"type": "string", "format": "byte"}
.
The contentEncoding
is used to represent inputs in the process description, which are typically submitted by href
form.
In this case, base64
is specific, while binary
is a common alternative defaulting to base64
, though this is not necessarily guaranteed. The server could use any default from (7bit, 8bit, base16, base32, base64).
However, as @jerstlouis mentionned, both "href"
and "value"
inputs are supposed to be supported at execution time with the current OAP specification. I would prefer if it wasn't the case (ie: value
would be reserved for type: xxx
cases in process description, and href
would be reserved for explicit contentType
/contentEncoding
input descriptions), but this is not the case currently.
Therefore, the schema
representation of this input in the process description should handle interchangeably (regardless of what was actually provided in the process description):
schema:
oneOf:
- type: string
format: byte
- contentType: xxx
contentEncoding: binary
- contentType: xxx
contentEncoding: base64
Although format: byte
is quite common, it uses an "open" field. JSON schema does not impose or restrict any value for format
. Technically, format: base64
would be valid as well, and I have seen it used many times. The contentEncoding
is somewhat more restrictive in which values it allows, and more descriptive on how to interpret them.
@fmigneault
contentMediaType
, not
contentType
( https://json-schema.org/understanding-json-schema/reference/non_json_data )contentMediaType
, I think one still needs to specify type: string
format: byte
, is this used outside of the OGC API - Processes world? Would you have any example or reference of that?Thanks.
(sorry I apparently closed this issue with some GitHub hotkey by mistake)
Indeed, you're right. It should be contentMediaType
.
I am curious as to where you ran across format: byte, is this used outside of the OGC API - Processes world? Would have example / references of that?
It is a common value employed because some docs mention it. For example, under "String Formats" in https://swagger.io/docs/specification/data-models/data-types/ (sorry, can't pin the exact header directly).
@fmigneault I believe there is some lingering confusion. I am confused by the comment above ( https://github.com/opengeospatial/ets-ogcapi-processes10/issues/57#issuecomment-1738745819 ) and in the e-mail from @bpross-52n :
The specification of binary inputs seems to be a bit ambigous, however. This schema states, that input values should be described by
"format": "byte"
This binaryInputValue schema is not saying that the input should be described by "format": "byte"
, or saying anything about how inputs should be described, it is simply hinting (as part of describing the schema for execution requests inputs, included by inputValueNoObject.yaml) that in an execution request, the binary input is bytes. This is specifically about how binary data should be encoded inside an execution request, and is not used by the process description to describe inputs or outputs. If this were purely JSON Schema, in this case this file should really say:
type: string
contentEncoding: base64
because Requirement 22 says that:
The service SHALL support binary values encoded as base64-encoded strings.
Although a server may potentially inputs encoded otherwise, base64 being the only required encoding makes it the only interoperable option. However, as you point out, "format": "byte"
is an OpenAPI thing, and possibly for OpenAPI 3.0 definition, the current "format": "byte"
is correct.
When inputs are described inside an execution request, the format
property is completely unrelated to the JSON Schema format
, and is actually an object as described in format.yaml.
So this:
{"value": "<base64-string>", "format": "byte"}
is not correct inside an execution request. It should be e.g.,
{
"value": "<base64-string>",
"format": {
"mediaType": "image/tiff;application=geotiff",
"encoding": "base64"
}
}
In a process description, binary inputs would be described like:
"inputs" : {
"data" :
{
"title" : "Some binary input",
"description" : "Some binary input data provided as a GeoTIFF or PNG",
"minOccurs" : 1,
"maxOccurs" : 1,
"schema" : {
"oneOf": [
{
"type": "string",
"contentEncoding": "binary",
"contentMediaType": "image/tiff; application=geotiff"
},
{
"type": "string",
"contentEncoding": "binary",
"contentMediaType": "image/png"
}
]
}
},
...
@jerstlouis
Your summary looks all right to me. You are again correct about the format.yaml
schema for the qualified execution input.
I think what was misleading in my understanding of the issue was that I saw format: byte
being used for describing a process input before.
Another possible cause of confusion is that ogcapppkg
also includes the executionUnit
-> qualifiedInputValue
-> inputValue
-> inputValueNoObject
-> binaryInputValue
. The schemas are valid, but given the executionUnit
somewhat "describes" the process to be deployed, it doesn't help keeping those concepts separate.
Something to validate while I noticed.
Wouldn't inputValueNoObject.yaml cause an error if a string with base64
-like characters is provided? Technically, type: string
still validates correctly regardless of format, therefore oneOf
would have 2 matches?
@fmigneault
Today we discussed how we should have a specific conformance classes like ogcappkg-docker
for what should be in the execution unit.
I think it would be better to leave executionUnit
open-ended in terms of what needs to go in there in OGC Application Package
, and define that instead in a particular conformance class (e.g., ogcapppkg-docker
, ogcapppkg-cwl
, ogcapppkg-python
).
A lot of the things in executionUnit.yaml like cpuMax
, memoryMax
seem specific to Docker / containers, and are not something you would expect to see in an application package where the execution unit is a Python script for example.
This way we could avoid that oneOf
altogether and the use of the execution request qualifiedInputValue.yaml inside the execution unit, which I agree is extremely confusing.
Wouldn't inputValueNoObject.yaml cause an error if a string with base64-like characters is provided?
You are correct, I think this was recently raised as an issue and my suggestion was to change the oneOf
to an anyOf
.
A lot of the things in executionUnit.yaml like
cpuMax
,memoryMax
seem specific to Docker / containers, and are not something you would expect to see in an application package where the execution unit is a Python script for example.
I disagree about that. One could have a heavy parallel computation with a Python script that requires minimum CPU/RAM resources to run it. I don't think this is specific to any technology.
However, the current executionUnit definition is redundant if considering ogcapppkg-cwl
. Those are all available (and even more) through:
I would therefore separate resource/operational requirements into a separate conformance class, on its own, along the various ogcapppkg-docker
, ogcapppkg-python
, etc.
change the
oneOf
to ananyOf
Indeed, that would fix it.
One could have a heavy parallel computation with a Python script that requires minimum CPU/RAM resources to run it. I don't think this is specific to any technology.
True, but this is a capability often found in container runners and the specific properties might be biased towards what Docker offers. Not sure how one would ensure this minimum or maximum resource usage with a Python script, you would likely need to automatically dockerize the Python script, as opposed to running it directly with python? (which might be a good thing to do for security reason)
However, the current executionUnit definition is redundant if considering ogcapppkg-cwl.
Yes, that was also my thought and why I was suggesting that.
I would therefore separate resource/operational requirements into a separate conformance class, on its own, along the various ogcapppkg-docker, ogcapppkg-python, etc.
That might be the way.
I don't necessarily see it as specific to python or docker (in the sense that it should be that specific process that controls these resources), but rather, as requirements a machine should meet to run said python/docker execution unit. The server could take some actions upstream such as spawning a different kubernetes node to meet requirements, or forward the request to another worker node with more capabilities. Once these requirements are met, the python/docker/else would simply run on that node without worrying about how to manage the resources.
In some cases, it can also be relevant to pass resource requirements for the process itself. For example, the cwltool:CUDARequirement
I mentioned can indicate a number of required GPUs. This value can be passed to pytorch
for example (using utilities form pytorch.cuda
), to allocate specific hardware to the script.
I agree that it would be more secure to run in docker, but that is an implementation specific choice. And even when running in docker, the machine it runs on must meet the minimum resource requirements. Therefore, it could still be not up to docker to resolve these parameters upstream to determine which resource/worker node must run the job.
@fmigneault
Right, as well as requirements, the "max" CPU / memory can also be a way to prevent using up more resources than specified for a particular process.
@bpross-52n Please upload the GeoTIFF to this GitHub repository and use it as a remote resource in the test suite afterwards.
Describe the bug The ETS reports failures on:
but none of the execution requests it sends actually include a binary input, a bbox input, an input by reference, or an input complex object (other than an invalid measure input as reported in #56). No
href
is used,imagesInput
is passed simply"teststring"
, whileboundingBoxInput
andcomplexObjectInput
are never set. Nothing testsfeatureCollectionInput
orgeometryInput
either.Expected behavior The ETS should eventually actually test these different capabilities properly, and not report confusing status on these tests until it does.
"href" :
and"value": { }
syntax,bbox
should be tested for inputs specifyingogc-bbox
format, and"contentEncoding": "binary"
inputs (with proper media type e.g., ETS should include at least a sample small GeoTIFF file).