Closed zolkis closed 1 year ago
About points 3 and 4: in the `Form` class, `contentType` has the default value of `application/json`. However, I agree that this is not very intuitive :)
"response": {
  "type": "object",
  "properties": {
    "contentType": {
      "type": "string"
    }
  }
}
This means that the `response` is an object that contains the `contentType` key, which is a string like `application/json`. The TD spec says that the `response` key found in a Form is of class `ExpectedResponse`, which has the `contentType`. I think that the validation schema does match what the spec says.
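As a rough illustration of the defaulting discussed here, the following Python sketch resolves the content types of a form given as a plain dict. The helper name `effective_content_types`, the sample `href`, and the rule that a form without a `response` reuses the form's own content type for the response are assumptions made for this example, not wording taken from the TD spec.

```python
DEFAULT_CONTENT_TYPE = "application/json"


def effective_content_types(form):
    """Return (request, response) content types for a TD form given as a dict."""
    # A form without contentType falls back to the default value.
    request_type = form.get("contentType", DEFAULT_CONTENT_TYPE)
    # "response" is an object whose "contentType" member is a string.
    response = form.get("response", {})
    # Assumption: without an explicit response, the form's own type applies.
    response_type = response.get("contentType", request_type)
    return request_type, response_type


# Hypothetical form: JSON request (by default), JPEG response.
form = {"href": "/camera/snapshot", "response": {"contentType": "image/jpeg"}}
print(effective_content_types(form))
```

The point of the sketch is only that `contentType` is always a string, both on the form and inside `response`.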
A small point about point 1. A rewording to say that DataSchema can be used to bring additional constraints on the payload would fix it.
Thanks for clarifying 3 and 4. The table in 5.4 needs a bit of work because now it's really confusing.
It would be best to explain this in more detail, e.g. that `contentType` is used for describing the payload, but `type` can add more constraints (also in the case where `contentType` is not specified, i.e. falls back to the default value).
However, DataSchema clearly lacks a way to represent binary data (unless encoded in base64). This might confuse TD writers enough to represent e.g. binary streams as a series of integers (meaning a separate read for each byte), which would be extremely inefficient (though there are such example TDs), instead of just specifying `contentType` as e.g. `"application/octet-stream"` and leaving `type` unspecified. The spec has to make clear how this use case is supposed to be handled.
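To make the efficiency concern concrete, here is a small Python sketch that compares an opaque binary payload with the same bytes modelled as a JSON array of integers. The Property fragment, its `href`, and the sample data are hypothetical and only serve the comparison.

```python
import json

# Hypothetical Property fragment: the payload is opaque binary, so only the
# form's contentType is given and "type" is left unspecified.
binary_property = {
    "forms": [{"href": "/sensor/raw", "contentType": "application/octet-stream"}]
}

payload = bytes(range(256)) * 4  # 1024 bytes of sample data

# The same bytes modelled as a JSON array of integers, one element per byte.
as_json_array = json.dumps(list(payload)).encode("utf-8")

# Print both sizes in bytes for comparison.
print(len(payload), len(as_json_array))
```

The JSON array form is several times larger than the raw bytes, and that is before counting the cost of reading each byte through a separate interaction.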
In general, data representation needs to be rethought with broader binary streaming use cases in mind as well (including sampling rate, reporting rate, buffer size, flow control, and data handling policies like circular buffering).
Regarding 1, as an example, in the Binding Templates document the XML binding is specified in the Data Schema section.
I think whether it is a binary representation or a text representation, a content type has a corresponding specification describing the format, and the binding templates for the content type need to explain how to map DataSchema types to payloads serialized in the content type.
This should be in the TD spec (perhaps with more detail). The relationship and policies need to be explained in the TD spec so that they can be used, for instance, in Scripting. Also, examples are needed in the TD spec, especially for streaming, as the other cases are clearer.
The algorithm I could figure out is something like this:
- if `contentType` is specified and DataSchema is not, then use `contentType`
- if DataSchema is specified and `contentType` is not, use DataSchema
- if `contentType` is `application/json`, then DataSchema defines the payload?
- if `contentType` is `application/octet-stream`, then DataSchema is not used?
First of all, `contentType` is mandatory; when it is not present, it will be filled with `application/json`. Regarding the algorithm, I would rather think of it like this:
- `contentType` says that I will get amazingType.
Our DataSchema model is only based on JSON Schema; it is not JSON Schema as strictly used for validating JSON payloads. It can be used without adaptation to describe JSON payloads, which is what leads people to believe that it is for JSON payloads only.
For other `contentType`s, we as the working group need to define how JSON Schema can be used to describe the data.
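One possible reading of the decision logic discussed so far can be sketched in Python as follows. This is not normative: the function name, the returned labels, and the set `MAPPED_TYPES` (content types for which a DataSchema mapping is taken to exist) are assumptions made for illustration.

```python
DEFAULT_CONTENT_TYPE = "application/json"

# Assumption: content types for which a DataSchema mapping is taken to exist
# (JSON natively, XML via the Binding Templates mapping).
MAPPED_TYPES = {"application/json", "application/xml"}


def payload_description(form, data_schema=None):
    """Decide how contentType and DataSchema combine for one form (sketch)."""
    # contentType is never really absent: it falls back to the default.
    content_type = form.get("contentType", DEFAULT_CONTENT_TYPE)
    # Drop parameters such as ";charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if data_schema is None:
        return media_type, "content type only"
    if media_type in MAPPED_TYPES:
        return media_type, "schema describes the payload"
    # No mapping defined: what the schema means is left open in the thread.
    return media_type, "schema meaning undefined"


print(payload_description({}, {"type": "number"}))
print(payload_description({"contentType": "application/octet-stream"}))
```

Because of the default, the branch "DataSchema is specified and `contentType` is not" collapses into the `application/json` case.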
Just as food for thought, when an interaction gets an image to display on a screen through a POST request, for `contentType` equal to `image/jpeg`, if there is a DataSchema like:
"type":"object",
"properties":{
"width":{ "const":800 },
"height":{ "const":600 }
}
we can say that this is used to describe the image size. However, this would be something we have to fix and I can totally understand that it looks weird for this case. For XML, as explained in Binding Templates, it makes more sense.
For `text/plain` it would also be straightforward, since we can describe the payload via `enum`, where
{
  "type": "string",
  "enum": ["one", "two"]
}
would mean that the payload would be `one` or `two`. Pay attention to the fact that these payloads are not JSON since they don't have the quotes.
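The difference between a plain-text payload and a JSON payload can be shown with a short Python sketch. The helper names `matches_plain_text` and `is_json` are hypothetical, and decoding the payload as UTF-8 is an assumption of this example.

```python
import json

schema = {"type": "string", "enum": ["one", "two"]}


def matches_plain_text(payload: bytes, schema: dict) -> bool:
    """Check a text/plain payload against a string schema with an optional enum."""
    text = payload.decode("utf-8")  # assumption: the payload is UTF-8 text
    if schema.get("type") != "string":
        return False
    # Without an enum, any string is accepted.
    return "enum" not in schema or text in schema["enum"]


def is_json(payload: bytes) -> bool:
    """Return True if the payload parses as a JSON document."""
    try:
        json.loads(payload)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(matches_plain_text(b"one", schema), is_json(b"one"))
print(matches_plain_text(b'"one"', schema), is_json(b'"one"'))
```

The unquoted payload satisfies the schema but is not JSON, while the quoted payload is valid JSON but does not match the enum when treated as plain text.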
I added a paragraph in section 5.3.2 Data Schema Vocabulary Definitions to describe the relationship between content types and data schemas. See here.
Thanks @takuki. Makes it much clearer! However, I struggle a bit with the last sentence in the commit that reads as follows.
"If the content type in an instance of the Form is not application/json and no mapping is defined for the content type, specifying a data schema does not make sense for the content type. "
Shouldn't it be connected with "or" instead of "and" as follows?
"If the content type in an instance of the Form is not application/json or no mapping is defined for the content type, specifying a data schema does not make sense for the content type. "
Thanks @takuki for clarifying the basic mechanism: `contentType` describes the media type and DataSchema describes additional constraints in general. However, when `contentType` is `application/json` (the default), then DataSchema can (or should?) be used for validation.
I think we should add more examples, e.g. an informal table of media types that can be further specified by DataSchema, and ones that cannot (if any).
For instance, the example by Ege shows that even if the media type is `application/octet-stream`, it would be possible to define a DataSchema which limits the maximum length. Is the vocabulary used in the DataSchema specific to the application, or to the content type? (In the latter case, the WG needs to standardize it.)
On the Scripting call we discussed with @danielpeintner and he created a table with examples. Feel free to edit. When good enough, feel free to include it in the TD spec.
contentType | dataSchema | as constraint | validation | recommended | Note |
---|---|---|---|---|---|
application/json | OK | OK | OK | YES | validation, … |
application/cbor | OK | OK | OK | YES | JSON |
application/xml | OK | OK | OK | YES | due to Binding Templates mapping (validation) |
application/exi | OK | OK | OK | YES | XML |
text/plain | OK | OK | ??? Regex | NO | metadata (length, encoding, language, enum, ..) |
image/jpeg | OK | OK | NO | NO | metadata (max size, resolution, ...) |
audio/mpeg | OK | OK | NO | NO | sampling rate, ... |
video/mp4 | OK | OK | NO | NO | bitrate, ... |
application/octet-stream | OK | OK | NO | NO | app-specific metadata |
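For readers who prefer code to tables, the rows above can be encoded as a lookup, sketched here in Python. The names `TABLE` and `schema_role` are hypothetical, the text row is left out because its validation column is still an open question, and treating unknown content types as metadata-only is an assumption of this sketch.

```python
# Content type -> (validation possible, DataSchema recommended), per the table.
TABLE = {
    "application/json": (True, True),
    "application/cbor": (True, True),
    "application/xml": (True, True),
    "application/exi": (True, True),
    "image/jpeg": (False, False),
    "audio/mpeg": (False, False),
    "video/mp4": (False, False),
    "application/octet-stream": (False, False),
}


def schema_role(content_type):
    """Return how a DataSchema would be used for the given content type."""
    # Ignore parameters such as ";charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    # Assumption: content types missing from the table are metadata-only.
    validation, _recommended = TABLE.get(media_type, (False, False))
    return "validation" if validation else "metadata"


print(schema_role("application/cbor"))
print(schema_role("video/mp4"))
```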
I am still not clear how data schema would make sense for content types such as JPEG and MP4 videos.
For one, in the example of maximum length, I tend to say it should belong somewhere other than the data schema, because it is a constraint at the physical representation level.
The table says that for image and video using DataSchema is not recommended, but if it's there then I guess it could act as metadata (for instance resolution, size, bitrate etc, to be known before fetching). This metadata could use standard or application-specific vocabulary. TD consumers could decide then which interaction to make.
We also discussed what interactions would make sense for serving images with multiple resolutions and sizes, and videos of various bitrates and resolutions etc. It looks like Actions that take these parameters would be the best way to model this. Their parameters would have a DataSchema. But it's not excluded to serve images as Properties that also have an optional DataSchema, used for the image attributes.
I believe this issue was addressed in PR #847 in large part.
Thank you @takuki. So it seems your recommendation is to ignore DataSchema if the MIME type is not `application/json` or one for which Bindings specify a DataSchema.
Actually the Bindings spec is quite open-ended on this. Somewhere it needs to be defined more exactly which IANA media types map where. Since the TD spec is normative, it should at least list the content types for which DataSchema makes sense (and which are - supposed to be - specified more exactly in Bindings), for instance OCF, SenML (exi, cbor, json, xml), LWM2M (json, tlv), JSON, (LD-JSON?), XML.
So I think it would make sense to include some more info and/or examples in the TD doc about how to deal with various MIME types, like (part of) the table above, in order to illustrate the scope covered by Bindings. Should we create a new (experimental) PR for that?
@zolkis sorry for the late response. It would be great if you could provide a PR that we can discuss (maybe you can provide one during the TD call and we will discuss this topic later in the call).
From my point of view, we can simply mention which encodings are directly aligned with the data schema approach or whether a mapping exists so far, e.g., for `application/json`, `application/cbor`, `application/xml`, `application/exi`. If no mapping exists, the data schema can be used to hint at the structure or the value type of the exchanged payload (e.g., useful for `text/plain`).
I'm not sure about the idea of using the data schema to provide this kind of metadata (e.g., max size), such as for JPEG. In my view the semantic extension feature should be used to provide such metadata.
After resolving https://github.com/w3c/wot-binding-templates/issues/141 we should have a proper solution
From today's TD call: decided to close this issue.
Coming from Scripting issue 198, should we clarify the following:
1. The TD spec should contain normative prose on how to represent binary stream data. What is the `DataSchema::type` of a Property when it contains a form with `contentType` `"image/jpeg"`, for instance? We need normative prose on how `DataSchema::type` and `InteractionAffordance::contentType` should be used together, or separately.
2. The TD doc can be serialized only as UTF-8 (without BOM). But what does the "string" type mean then? How do you represent string payloads from sensors that are UTF16BE or UTF16LE?
3. In section 5.4, `contentType` is a `Form`. That looks like a bug (or else needs explaining). BTW there should be links in that table.
4. In the example in Appendix B, some `"response"` elements contain `contentType` as an object (a `DataSchema` as it looks). But that contradicts the definition of `ExpectedResponse`, which expects a string, so this is a bug.
Overall I have the impression that the handling of `contentType` has changed with time in the TD spec and now we have inconsistencies because of that.
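The question about UTF-16 string payloads is not resolved in this thread. One conceivable approach, sketched below in Python, is to keep the "string" type abstract and let a `charset` parameter on the content type determine the bytes on the wire. The helper name `decode_string_payload`, the UTF-8 fallback, and the sample payload are assumptions of this sketch, not behaviour defined by the TD spec.

```python
def decode_string_payload(payload: bytes, content_type: str) -> str:
    """Decode a textual payload using the charset parameter of its content type."""
    charset = "utf-8"  # assumption: fallback when no charset parameter is given
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"')
    return payload.decode(charset)


# Hypothetical sensor reading sent as big-endian UTF-16.
raw = "23.5 °C".encode("utf-16-be")
print(decode_string_payload(raw, "text/plain; charset=utf-16be"))
```

Under this reading the TD document itself stays UTF-8, while the encoding of a payload is a property of the form's content type rather than of the DataSchema.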