contentEncoding and contentMediaType should be moved under string schema

egekorkan commented 3 years ago

While I am not 100% sure, I think that contentEncoding and contentMediaType should be moved to strings. They do not make sense for an object or array. Maybe they could be used with number or integer, but I doubt it. The JSON Schema spec is slightly ambiguous about it, but all the examples use strings.

https://tools.ietf.org/html/draft-handrews-json-schema-validation-01#section-8

danielpeintner commented 3 years ago

Mhh, not sure if I can agree.

contentEncoding can be used to indicate compression like gzip. Hence I assume this makes sense for arrays/objects and other types also.

relu91 commented 3 years ago

> contentEncoding can be used to indicate compression like gzip. Hence I assume this makes sense for arrays/objects and other types also.

For this purpose, don't we already have form.contentCoding and form.contentType? When I proposed the addition of contentEncoding and contentMediaType, I was talking about StringSchema. Not sure how it ended up there.

For context, see #912. If we look at the PR that introduced those new terms, they were added in StringSchema, at least in the TD ontology (see the rendered version). Maybe another weird render script bug?

sebastiankb commented 3 years ago

@handrews can you help us here? Would be great.

handrews commented 3 years ago

@sebastiankb The explanation for these keywords is much more clear in the current draft, which as far as these keywords are concerned should just be regarded as a clarification. Obviously you would not want to cite a different draft, but it should make the intent more clear.

We are more explicit about what sort of values are expected for contentEncoding, specifically encodings from RFC 4648 and parts of RFC 2045 that produce strings as output. It's derived from MIME's Content-Transfer-Encoding and not HTTP's Content-Encoding, so gzip is not an expected value. This is because these keywords are about encoding other formats into a JSON string, so if the instance is JSON and not a string, then the content* keywords are ignored (this is independent from the type keyword).
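To make the "ignored for non-strings" rule concrete, here is a minimal Python sketch. It is not taken from any JSON Schema implementation: `decode_content` is a hypothetical helper, it handles only base64 (one of the RFC 4648 encodings), and returning UTF-8 bytes when no contentEncoding is given is an assumption made for illustration.

```python
import base64


def decode_content(instance, schema):
    """Hypothetical helper: apply contentEncoding to string instances only."""
    # If the instance is not a string, the content* keywords are ignored,
    # regardless of what the "type" keyword says.
    if not isinstance(instance, str):
        return None

    encoding = schema.get("contentEncoding")
    if encoding is None:
        # Assumption for this sketch: without an encoding, the string
        # itself is the content, returned here as UTF-8 bytes.
        return instance.encode("utf-8")
    if encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(instance, validate=True)

    # gzip and other HTTP Content-Encoding values are not expected here.
    raise ValueError(f"unsupported contentEncoding: {encoding}")


schema = {"contentEncoding": "base64"}
decoded = decode_content("aGVsbG8=", schema)       # string: decoded
ignored = decode_content({"foo": 42}, schema)      # object: keyword ignored
```

Under these assumptions, a string instance is decoded while an object or array instance passes through untouched, which is why placing the keywords on non-string schemas has no effect.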

We also added more examples.

Note that using a contentMediaType of application/json (which I see you mention in the other issue) would be done for something like:

{
   "someJSON": "{\"foo\": 42, \"bar\": \"stuff\"}"
}

with a schema of

{
    "type": "string",
    "contentMediaType": "application/json"
}

In more recent drafts, the contentSchema keyword was added to allow supplying a schema for such string-embedded content, although strings described by contentEncoding, contentMediaType, and contentSchema are not automatically decoded, parsed, and validated due to security concerns over, say, parsing arbitrary embedded application/javascript 😬
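As a rough illustration of what explicit, opt-in handling of the someJSON example could look like, here is a small Python sketch. `parse_embedded_json` and its `opt_in` flag are hypothetical names invented for this example, and checking the parsed result against a contentSchema is left out; a real application would hand the parsed value to a validator of its choice.

```python
import json


def parse_embedded_json(instance, schema, opt_in=False):
    """Hypothetical helper: parse string-embedded JSON only on request."""
    # Nothing is decoded or parsed automatically; the caller must opt in.
    if not opt_in:
        return None
    # The content* keywords only apply to string instances.
    if not isinstance(instance, str):
        return None
    # This sketch handles application/json only, never executable types.
    if schema.get("contentMediaType") != "application/json":
        return None
    return json.loads(instance)


document = {"someJSON": '{"foo": 42, "bar": "stuff"}'}
schema = {"type": "string", "contentMediaType": "application/json"}

untouched = parse_embedded_json(document["someJSON"], schema)
parsed = parse_embedded_json(document["someJSON"], schema, opt_in=True)
```

The default of `opt_in=False` mirrors the point above: the keywords describe the string's content, and acting on that description is a separate, deliberate step.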

In part to support OpenAPI 3.1 usage with API payloads of various content types, we also added a note on using JSON Schema with non-JSON-data-model-compatible media types. I'm not sure how similar this is to your DataSchema concern, but if you have actual binary instance data (i.e. not encoded into a JSON or other string), you might find a use for the content* keywords with non-string content. We left this possibility rather vague. Harmonizing this area with OAS 3.1's existing Media Type Object and Encoding Object was a bit mind-bending and I don't remember exactly what we decided. If it matters to you I can try to refresh my memory.

It may be that if you end up applying a JSON Schema to binary gzipped data that is not string-encoded into an actual JSON document, doing something with these keywords could be useful.

w3c / wot-thing-description

contentEncoding and contentMediaType should be moved under string schema #1116