Clarify InteractionAffordance::contentType vs DataSchema::type

zolkis commented 4 years ago

Coming from Scripting issue 198, should we clarify the following:

1. The TD spec should contain normative prose on how to represent binary stream data. What is the DataSchema::type of a Property when it contains a form with contentType "image/jpeg", for instance? We need normative prose on how DataSchema::type and InteractionAffordance::contentType should be used together, or separately.
2. The TD doc can be serialized only as UTF-8 (without BOM). But what does the "string" type mean then? How do you represent string payloads from sensors that are UTF-16BE or UTF-16LE?
3. In section 5.4, contentType is a Form. That looks like a bug (or else needs explaining). BTW, there should be links in that table.
4. In the example in Appendix B, some "response" elements contain contentType as an object (a DataSchema, as it looks). But that contradicts the definition of ExpectedResponse, which expects a string, so this is a bug.
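
On the UTF-16 question above, one possible reading (an assumption, not something this thread settles) is that the UTF-8 rule applies to the TD document itself, while the encoding of a payload could be signalled through a charset parameter on the form's contentType. A minimal sketch on that basis; `decode_text_payload` and the sample bytes are hypothetical:

```python
from email.message import Message


def decode_text_payload(content_type: str, payload: bytes) -> str:
    """Decode a text payload using the charset parameter of its content type.

    Assumption: UTF-8 is used when no charset parameter is present.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset() or "utf-8"
    return payload.decode(charset)


# Hypothetical sensor payload encoded as UTF-16BE.
raw = "ON".encode("utf-16-be")
text = decode_text_payload("text/plain; charset=utf-16be", raw)
```

Once decoded, the value is an ordinary string, so a DataSchema of type "string" could still describe it regardless of the wire encoding.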

Overall, I have the impression that the handling of contentType has changed over time in the TD spec, and we now have inconsistencies because of that.

egekorkan commented 4 years ago

About points 3 and 4:

This section means that, in the context of the Form class, contentType has the default value application/json. I agree that this is not very intuitive, though :)
Appendix B is not an example but a JSON Schema for validating TDs. I am guessing that you mean the following:
```
"response": {
  "type": "object",
  "properties": {
    "contentType": {
      "type": "string"
    }
  }
}
```
This means that the response is an object that contains the contentType key, whose value is a string like application/json. The TD spec says that the key response found in a Form is of class ExpectedResponse, which has the contentType. I think that the validation schema does match what the spec says.
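
As a rough illustration of what the schema fragment above accepts, here is a minimal hand-written check. It is a sketch, not a JSON Schema validator; `response_is_valid` and the sample forms (including their hrefs) are hypothetical, and treating "response" as optional is an assumption of this sketch.

```python
def response_is_valid(form: dict) -> bool:
    """Mirror the fragment: "response" is an object whose "contentType",
    when present, is a string."""
    if "response" not in form:
        return True  # assumption: "response" is optional
    response = form["response"]
    if not isinstance(response, dict):
        return False
    return "contentType" not in response or isinstance(response["contentType"], str)


# contentType given as a string: accepted.
good_form = {
    "href": "https://example.com/image",
    "response": {"contentType": "image/jpeg"},
}

# contentType given as an object (the misreading from point 4): rejected.
bad_form = {
    "href": "https://example.com/image",
    "response": {"contentType": {"type": "string"}},
}
```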

A small point about point 1. A rewording to say that DataSchema can be used to bring additional constraints on the payload would fix it.

zolkis commented 4 years ago

Thanks for clarifying 3 and 4. The table in 5.4 needs a bit of work because now it's really confusing.

> A small point about point 1. A rewording to say that DataSchema can be used to bring additional constraints on the payload would fix it.

It would be best to explain this in more detail: contentType describes the payload, while type can add further constraints (also when contentType is not specified, i.e. falls back to the default value).

However, DataSchema clearly lacks a way to represent binary data (unless it is encoded in base64). This might confuse TD writers enough to represent e.g. binary streams as a series of integers (meaning a separate read for each byte), which would be crazy inefficient (though there are such example TDs), instead of just specifying contentType as e.g. "application/octet-stream" and leaving type unspecified. The spec has to make clear how this use case is supposed to be handled.
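
A sketch of the second option described above: a Property with no type whose form carries a binary contentType. The affordance is written as a Python dict so that it can be checked; the title, href and the helper `payload_is_opaque` are hypothetical and do not come from the TD spec.

```python
import json

# Hypothetical Property affordance: title and href are made up.
snapshot_property = {
    "title": "Camera snapshot",
    "readOnly": True,
    # No "type" member: only the form's contentType describes the payload.
    "forms": [
        {
            "href": "https://camera.example.com/snapshot",
            "contentType": "application/octet-stream",
        }
    ],
}


def payload_is_opaque(affordance: dict) -> bool:
    """True when no DataSchema type is given and no form uses application/json
    (which is also the default when contentType is absent)."""
    return "type" not in affordance and all(
        form.get("contentType", "application/json") != "application/json"
        for form in affordance["forms"]
    )


# The dict serializes to the JSON that would appear in a TD.
td_fragment = json.dumps(snapshot_property, indent=2)
```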

In general, data representation needs to be rethought with broader binary streaming use cases in mind as well (including sampling rate, reporting rate, buffer size, flow control, and data handling policies such as circular buffering).

takuki commented 4 years ago

Regarding 1, as an example, in the binding templates document, the XML binding is specified in the Data Schema section.

I think whether it is a binary representation or a text representation, a content type has a corresponding specification describing the format, and the binding templates for the content type needs to explain how to map DataSchema types to payloads serialized in the content type.

zolkis commented 4 years ago

> whether it is a binary representation or a text representation, a content type has a corresponding specification describing the format, and the binding templates for the content type needs to explain how to map DataSchema types to payloads serialized in the content type.

This should be in the TD spec (perhaps with more detail). The relationship and policies need to be explained in the TD spec so that they can be used, for instance, in Scripting. Also, examples are needed in the TD spec, especially for streaming, as the other cases are clearer.

The algorithm I could figure out is something like this:

- If contentType is specified and DataSchema is not, then use contentType.
- If DataSchema is specified and contentType is not, use DataSchema.
- If both are specified, is that described by Binding Templates? E.g.:
  - if contentType is application/json, then DataSchema defines the payload?
  - if contentType is application/octet-stream, then DataSchema is not used?
  - what others?
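
The tentative algorithm above can be sketched as a small function. The two branches for "both specified" are still open questions in the list, so they are marked as such here; `payload_described_by` and its return labels are hypothetical names used only for illustration.

```python
from typing import Optional


def payload_described_by(content_type: Optional[str], data_schema: Optional[dict]) -> str:
    """Return which element describes the payload, following the list above."""
    if content_type is not None and data_schema is None:
        return "contentType"
    if data_schema is not None and content_type is None:
        return "DataSchema"
    if content_type is not None and data_schema is not None:
        # Open questions in the thread, not settled behaviour:
        if content_type == "application/json":
            return "DataSchema"
        if content_type == "application/octet-stream":
            return "contentType"
        return "undecided"  # "what others?"
    return "unspecified"  # neither is given; the list does not cover this case
```

For example, `payload_described_by("image/jpeg", None)` yields `"contentType"`, while `payload_described_by("application/xml", {"type": "object"})` falls into the undecided branch.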

egekorkan commented 4 years ago

First of all, contentType is mandatory, or it will be filled with application/json when not present. Regarding the algorithm, I would think of it more like this:

1. contentType says that I will get amazingType.
2. I look for DataSchema elements.
3. If there are DataSchema elements, these constrain the payload I will get for amazingType, no matter what my amazingType is.
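
The steps above can be sketched as follows. The default of application/json comes from the comment itself; `describe_payload` and `DATA_SCHEMA_TERMS` are hypothetical names, and the tuple lists only an illustrative subset of DataSchema terms.

```python
DEFAULT_CONTENT_TYPE = "application/json"

# Illustrative subset of DataSchema terms, not the full vocabulary.
DATA_SCHEMA_TERMS = ("type", "enum", "const", "properties", "items", "minimum", "maximum")


def describe_payload(form: dict, affordance: dict) -> tuple:
    """Return (content type, DataSchema constraints) for one interaction."""
    # Step 1: contentType says what will be received; fill in the default if absent.
    content_type = form.get("contentType", DEFAULT_CONTENT_TYPE)
    # Step 2: look for DataSchema elements on the affordance.
    constraints = {k: v for k, v in affordance.items() if k in DATA_SCHEMA_TERMS}
    # Step 3: whatever is found constrains the payload, whatever the content type is.
    return content_type, constraints
```

An empty constraints dict then simply means that the content type alone describes the payload.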

Our DataSchema model is only based on JSON Schema; it is not JSON Schema itself, which is used strictly for validating JSON payloads. It can be used without adaptation to describe JSON payloads, which is what leads people to think that it is for JSON payloads only. For other contentTypes, we as the working group need to define how JSON Schema can be used to describe the data.

Just as food for thought, when an interaction gets an image to display on a screen through a POST request, for contentType equal to image/jpeg, if there is a DataSchema like:

```
"type": "object",
"properties": {
  "width": { "const": 800 },
  "height": { "const": 600 }
}
```

we can say that this is used to describe the image size. However, this would be something we have to fix, and I can totally understand that it looks weird for this case. For XML, as explained in Binding Templates, it makes more sense. For text/plain it would also be straightforward: we can describe the payload via enum, where

```
{
  "type": "string",
  "enum": ["one", "two"]
}
```

would mean that the payload is either one or two. Note that these payloads are not JSON, since they don't have the quotes.
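
The text/plain case above can be sketched as a check of the raw payload against the enum, next to a check of whether the same bytes would parse as a JSON string. The function names are hypothetical, and the sketch assumes the text/plain payload is UTF-8 encoded.

```python
import json

schema = {"type": "string", "enum": ["one", "two"]}


def text_plain_matches(payload: bytes, schema: dict) -> bool:
    """Treat the whole payload as the string value and compare it with the enum.

    Assumption: the text/plain payload is UTF-8 encoded.
    """
    return payload.decode("utf-8") in schema["enum"]


def is_json_string(payload: bytes) -> bool:
    """True only if the payload parses as JSON and the result is a string."""
    try:
        return isinstance(json.loads(payload), str)
    except ValueError:  # covers json.JSONDecodeError and decoding errors
        return False
```

With these helpers, the bytes `one` satisfy the enum as text/plain but are not valid JSON, while the bytes `"one"` (with quotes) are a JSON string but do not match the enum when read as plain text.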

takuki commented 4 years ago

I added a paragraph in section 5.3.2 Data Schema Vocabulary Definitions to describe the relationship between content types and data schemas. See here.

danielpeintner commented 4 years ago

> See here.

Thanks @takuki. Makes it much clearer! However, I struggle a bit with the last sentence in the commit that reads as follows.

"If the content type in an instance of the Form is not application/json and no mapping is defined for the content type, specifying a data schema does not make sense for the content type."

Shouldn't it be connected with "or" instead of "and" as follows?

"If the content type in an instance of the Form is not application/json or no mapping is defined for the content type, specifying a data schema does not make sense for the content type."

zolkis commented 4 years ago

Thanks @takuki for clarifying the basic mechanism: contentType describes the media type and DataSchema describes additional constraints in general. However, when contentType is application/json (the default), then DataSchema can (or should?) be used for validation.

I think we should add more examples, e.g. an informal table of media types that can be further specified by DataSchema, and ones that cannot (if any).

For instance, Ege's example suggests that even if a media type is application/octet-stream, it would be possible to define a DataSchema that limits the maximum length. Is the vocabulary used in the DataSchema specific to the application, or to the content type? (In the latter case, the WG needs to standardize it.)

zolkis commented 4 years ago

On the Scripting call we discussed with @danielpeintner and he created a table with examples. Feel free to edit. When good enough, feel free to include it in the TD spec.

| contentType | dataSchema | as constraint | validation | recommended | Note |
| --- | --- | --- | --- | --- | --- |
| application/json | OK | OK | OK | YES | validation, … |
| application/cbor | OK | OK | OK | YES | JSON |
| application/xml | OK | OK | OK | YES | due to Binding Templates mapping (validation) |
| application/exi | OK | OK | OK | YES | XML |
| application/text | OK | OK | ??? Regex | NO | metadata (length, encoding, language, enum, ..) |
| image/jpeg | OK | OK | NO | NO | metadata (max size, resolution, ...) |
| audio/mpeg | OK | OK | NO | NO | sampling rate, ... |
| video/mp4 | OK | OK | NO | NO | bitrate, ... |
| application/octet-stream | OK | OK | NO | NO | app-specific metadata |

takuki commented 4 years ago

It is still not clear to me how a data schema would make sense for content types such as JPEG images and MP4 videos.

For one, in the example of maximum length, I tend to say it should belong somewhere other than the data schema, because it is a constraint at the physical representation level.

zolkis commented 4 years ago

The table says that using DataSchema is not recommended for images and video, but if it is there, then I guess it could act as metadata (for instance resolution, size, bitrate, etc., to be known before fetching). This metadata could use standard or application-specific vocabulary. TD consumers could then decide which interaction to make.

We also discussed which interactions would make sense for serving images with multiple resolutions and sizes, videos with various bitrates and resolutions, etc. It looks like Actions that take these parameters would be the best way to model this. Their parameters would have a DataSchema. But serving images as Properties that also have an optional DataSchema, used for the image attributes, is not excluded.

takuki commented 4 years ago

I believe this issue was addressed in PR #847 in large part.

zolkis commented 4 years ago

Thank you @takuki. So it seems your recommendation is to ignore DataSchema if the MIME type is neither application/json nor one for which Bindings specify a DataSchema.

Actually the Bindings spec is quite open-ended on this. Somewhere it needs to be defined more exactly which IANA media types map where. Since the TD spec is normative, it should at least list the content types for which DataSchema makes sense (and which are - supposed to be - specified more exactly in Bindings), for instance OCF, SenML (exi, cbor, json, xml), LWM2M (json, tlv), JSON, (LD-JSON?), XML.

So I think it would make sense to include some more info and/or examples in the TD doc about how to deal with various MIME types, like (part of) the table above, in order to illustrate the scope that Bindings cover. Should we create a new (experimental) PR for that?

sebastiankb commented 4 years ago

@zolkis, sorry for the late response. It would be great if you could provide a PR that we can discuss (maybe you can provide one during the TD call, and we will discuss this topic later in the call).

From my point of view, we can simply mention which encodings are directly aligned with the data schema approach or for which a mapping exists so far, e.g., application/json, application/cbor, application/xml, application/exi. If no mapping exists, the data schema can be used to hint at the structure or the value type of the exchanged payload (e.g., useful for text/plain).

I'm not sure about the idea of using the data schema to provide this kind of metadata (e.g., max size), such as for JPEG. In my view, the semantic extension feature should be used to provide such metadata.

egekorkan commented 2 years ago

After resolving https://github.com/w3c/wot-binding-templates/issues/141 we should have a proper solution

sebastiankb commented 1 year ago

From today's TD call: decided to close this issue.

w3c / wot-thing-description

Clarify InteractionAffordance::contentType vs DataSchema::type #839