Data types for interaction models for things

Introduction

In Dusseldorf we agreed to support the JSON Schema types by modelling them as Linked Data. The motivation is to make use of Linked Data as the underlying framework for describing things in terms of their interaction models, data types and semantics. This further gives us the freedom to annotate datatypes, e.g. to link to the semantic context and to add information for the units of measure, estimated measurement precision, and so forth.

The types are defined in JSON Schema Validation: A Vocabulary for Structural Validation of JSON, together with JSON Schema Core. These specifications are pretty extensive, so I would advocate starting with a subset based upon the requirements derived from IoT use cases, especially those for OCF, oneM2M and ECHONET Lite. This issue reviews the JSON Schema data types and their relevance to the Web of Things use cases. The examples use JSON Schema, and resemble the proposal for a plain JSON serialisation that maps JSON to Linked Data with a simple algorithm.

Note: The term "properties" as used in JSON Schema (as in the examples below) refers to the set of named values for JSON objects, and not to "properties" as in properties, actions and events for the interaction model for things in the Web of Things.

:information_source: For the Linked Data vocabulary, we need a predicate that can be used to link to the data type. For instance, the subject could be a thing property, and the object an RDF node denoting a core data type like a boolean. Is there an existing RDF node for this? If not we should define one in the TD namespace, i.e. https://www.w3.org/ns/td#type. Note the distinction between the data type and the semantic type, e.g. a number versus a temperature measurement.

Core types

These include null, boolean, string and number.

It is unclear how to model null in RDF, but in practice we can omit it as in most cases null signifies the absence of a value. This is appropriate for events and actions that don't carry any data.
Boolean properties are a common occurrence in IoT use cases, e.g. the Button Switch property (oic.r.button) in OCF.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "PowerSwitch",
   "description": "An on/off power switch",
   "type": "object",
   "properties": {
      "on": {
         "type":"boolean"
      }
   }
}

:information_source: For the Linked Data vocabulary, what predicate should be used for booleans? xsd:boolean seems like a reasonable choice, but we may want to define td:boolean for consistency with our other types, see numbers below.

String values are useful for open-ended values. An example is propMaterial for the oneM2M Battery interface, which is used to describe the battery material, e.g. lithium ion, nickel or lead. JSON Schema allows you to set constraints on the minimum and maximum length of strings. It also allows you to constrain string values to match regular expressions according to the ECMA 262 regular expression dialect. Unlike RDF, JSON Schema doesn't include support for declaring the human language used for a string literal. A workaround is to use a separate JSON object property to declare the language tag in accordance with BCP47: Tags for Identifying Languages.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "Manufacturer",
   "description": "The name of the device's manufacturer",
   "type": "object",
   "properties": {
      "manufacturer": {
         "type":"string"
       }
   }
}

:information_source: xsd:string seems like a reasonable choice for the string data type as does xsd:pattern for regular expression constraints. We could further consider supporting xsd:minLength and xsd:maxLength for constraints on the length of a string in characters.
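To make the string constraints mentioned above concrete, here is a minimal Python sketch of checking minLength, maxLength and pattern. The function name check_string and the material schema are hypothetical, and Python's re module is only an approximation of the ECMA 262 dialect that JSON Schema refers to, so treat this as an illustration rather than a conforming validator.

```python
import re


def check_string(value, schema):
    """Check a value against a small subset of JSON Schema string keywords."""
    if not isinstance(value, str):
        return False
    if len(value) < schema.get("minLength", 0):
        return False
    if "maxLength" in schema and len(value) > schema["maxLength"]:
        return False
    # JSON Schema patterns are not implicitly anchored, so search anywhere
    # in the string rather than requiring a full match.
    if "pattern" in schema and re.search(schema["pattern"], value) is None:
        return False
    return True


# Hypothetical constraints for a battery material name (not from any spec).
material = {
    "type": "string",
    "minLength": 1,
    "maxLength": 32,
    "pattern": "^[A-Za-z ]+$",
}

print(check_string("lithium ion", material))
print(check_string("", material))
```

The same three constraints are the ones that xsd:minLength, xsd:maxLength and xsd:pattern would carry on the Linked Data side.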

Integers are useful for many IoT use cases. One example is the OCF brightness level (oic.r.light.brightness). By contrast, the OCF temperature (oic.r.temperature) uses a floating point number. JSON doesn't distinguish integers from numbers, but I believe that integers are sufficiently common in IoT use cases that it justifies streamlining their use rather than having to use a number plus a separate annotation to the effect that the number is restricted to integers. JSON Schema provides an extensive set of constraints on numbers. A review of the OCF and oneM2M specifications suggests that a minimal set would be minimum inclusive and maximum inclusive constraints for numeric values.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "Humidity",
   "description": "Humidity as a percentage",
   "type": "object",
   "properties": {
      "humidity": {
         "type":"number",
         "minimum":0,
         "maximum":100
       }
   }
}

:information_source: For the Linked Data vocabulary, what predicates should be used for numbers and integers? XML Schema data types lack a generic number type, although there are float and integer. However, xsd:float is limited to the IEEE single-precision 32-bit floating point type, which is too restrictive. For this reason, we may want to define td:number and td:integer. We could then model JSON Schema's minimum and maximum constraints as td:minimum and td:maximum.
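As a rough illustration of the minimal numeric subset discussed above (number versus integer, plus inclusive minimum and maximum), here is a Python sketch. The function name check_numeric is hypothetical. The sketch accepts a float with a zero fractional part (such as 42.0) as an integer, which is an assumption made here and not something this proposal settles.

```python
def check_numeric(value, schema):
    """Check a value against "type", "minimum" and "maximum" only."""
    # Python treats bool as a kind of int; JSON does not, so reject it.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if schema.get("type") == "integer":
        # Assumption: 42.0 counts as an integer, 42.5 does not.
        if isinstance(value, float) and not value.is_integer():
            return False
    if "minimum" in schema and value < schema["minimum"]:
        return False
    if "maximum" in schema and value > schema["maximum"]:
        return False
    return True


humidity = {"type": "number", "minimum": 0, "maximum": 100}
open_level = {"type": "integer", "minimum": 0, "maximum": 100}

print(check_numeric(42.5, humidity))
print(check_numeric(42.5, open_level))
```

Treating integer as a restriction on number keeps the two types on one code path, which matches the idea of td:integer being a narrower td:number.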

Enumerations

These are used when you want to pass a value that is constrained to be one of a known set of values. An example is the OCF door state (oic.r.door) whose value is limited to Open or Closed. JSON Schema provides support for this with the anyOf keyword.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "Door state",
   "description": "Whether a door is open or closed",
   "type": "object",
   "properties": {
      "door": {
         "anyOf":[
            {
               "type":"string",
               "pattern":"Open"
            },
            {
               "type":"string",
               "pattern":"Closed"
            }
         ]
       }
   }
}

:information_source: How should we represent enumerations as a Linked Data data type? A simple solution is to use td:enum as the object for the td:type predicate, and to use a set of td:item predicates to link from the RDF node describing a thing property to the allowed values as RDF string literals.

JSON objects

These map names to values which may themselves be JSON objects, e.g. when a property is a tuple of different data types. OCF includes human language descriptions of devices and resources as part of the specifications.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "OpenLevel",
   "description": "Property indicating how open a window, door, shutter, etc. is",
   "type": "object",
   "properties": {
      "openLevel": {
          "type": "integer",
          "minimum": 0,
          "maximum": 100
        },
        "increment": {
           "type":"integer"
       }
   }
}

JSON Schema allows you to nest objects for compound data structures. For the Web of Things, this would allow you to define a property with sub-properties. It would further allow you to pass compound data structures as part of events, or with the requests and responses for actions. Here is an example for a weather station with properties for the wind, temperature and humidity measurements. The wind property has sub-properties for the speed and direction of the wind.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "WeatherStation",
   "description": "Wind, temperature and humidity measurements from a weather station",
   "type": "object",
   "properties": {
      "wind": {
          "type": "object",
          "properties": {
             "speed": {
                "type": "number",
                "minimum":0,
                "maximum":150
             },
             "direction": {
                "type":"number",
                "minimum":0,
                "maximum":359                
             }
          }
        },
        "temperature": {
           "type":"number",
           "minimum":-40,
           "maximum":100
       },
        "humidity": {
           "type":"integer",
           "minimum":0,
           "maximum":100
       }
   }
}

If the nested object type is used multiple times, you can define it once and refer to it as many times as needed, see section below on named sub-schemas.

:information_source: A JSON schema object describes a set of named properties. These can be described in RDF using a predicate (td:property) that links to a blank node for each property. This node then acts as the subject for predicates that describe that property, e.g. its name, description, data type and so forth. Nested objects are easy to express through td:property predicates whose subject is the RDF node describing the given property. We need to decide whether to use td:name or rdfs:label for linking to property names. We likewise need to decide on the predicate for human language descriptions, e.g. td:description.

Collections and Vectors

JSON uses arrays for collections, and this imposes an ordering that may in fact not be significant. JSON Schema allows you to declare an array of items of the same type. Using an ordered collection for coordinates has the weakness of precluding the ability to name and describe each axis.

Vectors are arrays (ordered collections) with a name and description for each axis. This makes schemas easier to understand and maintain than when each axis is only identified by a number, and can be used by software for conversions and for user interfaces (e.g. labelling the axes when graphing data points). Examples include a linear acceleration measurement defined in terms of sensor values for the x, y and z axes, and inclination and azimuth for pointing a telescope or solar panel.

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "SupportedModes",
   "description": "Collection of modes supported by a device",
   "type": "object",
    "properties": {
        "supportedModes": {
            "type": "array",
            "items": {
               "type":"string"
            }
        }
    }
}

:information_source: To describe arrays in RDF we need to describe the type of the items that make up the array. A very simple approach is to describe the item type as usual and add an annotation that it forms a collection via the td:collection predicate whose object should be either td:ordered or td:unordered.

Unions of types

JSON Schema defines anyOf, oneOf and allOf as a basis for validating data against a set of schemas. For the Web of Things, there will be situations where a value could be either a number or a string, or some other union of types.

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "Reading",
   "description": "A reading that may be either a number or a string",
   "type": "object",
    "properties": {
        "reading": {
            "anyOf": [
                {
                    "type": "number"
                },
                {
                    "type": "string"
                }
            ]
        }
    }
}

:information_source: One suggestion for modelling this as Linked Data is to use td:type with an RDF node td:union and then use td:item to link to each of the permitted type descriptions. We could perhaps support td:anyOf, td:oneOf and td:allOf rather than td:union, but what are the use cases for these?
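On the question of use cases for anyOf versus oneOf, the following Python sketch may help show where they differ for unions of core types. The helper names are hypothetical and only the core type names are handled. The difference only appears when the listed types overlap, as number and integer do.

```python
# Mapping from JSON Schema core type names to Python types (sketch only).
JSON_TYPES = {
    "null": type(None),
    "boolean": bool,
    "string": str,
    "number": (int, float),
    "integer": int,
}


def has_type(value, type_name):
    """Return True if value belongs to the named core type."""
    # Python's bool is a subclass of int; JSON booleans are not numbers.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    return isinstance(value, JSON_TYPES[type_name])


def any_of_types(value, type_names):
    """anyOf: at least one of the listed types must match."""
    return any(has_type(value, name) for name in type_names)


def one_of_types(value, type_names):
    """oneOf: exactly one of the listed types must match."""
    return sum(has_type(value, name) for name in type_names) == 1


print(any_of_types("n/a", ["number", "string"]))
print(any_of_types(5, ["number", "integer"]))
print(one_of_types(5, ["number", "integer"]))
```

For disjoint types such as number and string, the two keywords accept exactly the same values, which suggests a single td:union may be enough for that common case.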

Named sub-schemas

JSON Schema allows you to define a sub-schema and refer to it multiple times with $ref. This avoids the need for duplicating the same sub-schema multiple times, improving the readability and maintainability of schemas. Sub-schemas are commonly used in IoT standards like OCF and oneM2M as a basis for defining interfaces that are used as part of the definitions of multiple devices. OCF uses the term resource, whilst oneM2M uses the term module. One use case is for naming and defining a data type along with its annotations, another use case is where you want to define a combination of types, properties, actions, and events for composition as part of device definitions.

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "RGBColor",
   "description": "RGB color value",
   "definitions":{
      "level": {
         "type":"number",
         "minimum":0,
         "maximum":255
      }
   },
   "type": "object",
   "properties": {
      "red": {
         "$ref": "#/definitions/level"
      },
      "green": {
         "$ref": "#/definitions/level"
      },
      "blue": {
         "$ref": "#/definitions/level"
      }
   }
}
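To show what referencing a named sub-schema amounts to, here is a small Python sketch that replaces each local $ref with the definition it points to. The function names resolve_ref and expand are hypothetical. The sketch only handles references of the form "#/..." that walk through object members, and it would loop forever on a self-referencing schema.

```python
def resolve_ref(root, ref):
    """Resolve a local reference such as "#/definitions/level"."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    node = root
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


def expand(root, node):
    """Return a copy of node with every local "$ref" replaced by its target."""
    if isinstance(node, dict):
        if "$ref" in node:
            return expand(root, resolve_ref(root, node["$ref"]))
        return {key: expand(root, value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(root, item) for item in node]
    return node


rgb = {
    "definitions": {
        "level": {"type": "number", "minimum": 0, "maximum": 255}
    },
    "type": "object",
    "properties": {
        "red": {"$ref": "#/definitions/level"},
        "green": {"$ref": "#/definitions/level"},
        "blue": {"$ref": "#/definitions/level"},
    },
}

print(expand(rgb, rgb)["properties"]["green"])
```

In RDF no such expansion step is needed, since the node describing the sub-schema can simply be used as the object wherever it is referenced.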

This can also be used for nested objects, i.e. objects with properties whose values are other objects.

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "definitions": {
      "address": {
         "type": "object",
         "properties": {
            "street_address": {"type": "string"},
             "city": { "type": "string" },
             "state": { "type": "string" }
         },
         "required": ["street_address", "city", "state"]
      }
   },
   "type": "object",
   "properties": {
      "billing_address": {"$ref": "#/definitions/address"},
       "shipping_address": {"$ref": "#/definitions/address"}
   }
}

Note that the Web of Things embraces abstract entities as well as physical entities. This includes Things that encapsulate cloud based services.

:information_source: Named sub-schemas can be modelled in RDF using a td:typedef predicate whose object is an RDF node that describes the sub-schema. The name can then be declared with td:name or rdfs:label, along with an optional td:description for a human language description. Referencing sub-schemas comes for free in RDF by using the RDF node for the sub-schema as the object for the td:type predicate.

Dates, times and durations

There are many use cases for dates, times and durations. One example is environmental control for rooms in a home, where the light levels and temperatures are set to different levels according to the time of day. Another example features a security camera that logs movements in its field of view, and more generally the means to include time stamps with sensor readings. Durations can be used to specify how long a transition should last, e.g. to smoothly fade the colour and brightness level for smart lighting. Dates can be used to express when a device was manufactured, last updated, and when it next needs calibration and testing. ISO 8601 provides a flexible string based notation for dates, times and durations, with support for time zones. Additional information is given in RFC 3339 Date and Time on the Internet: Timestamps. As an alternative, integers can be used for durations and for the time since an epoch (Unix timestamps).

Example:

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "title": "ManufacturedDate",
   "description": "The date a device was manufactured",
   "type": "object",
   "properties": {
      "manufactured": {
         "type":"date-time"
       }
   }
}

:information_source: We could use either td:dateTime or xsd:dateTime for this, and likewise support the range of XML Schema date and time formats, e.g. xsd:duration, xsd:time and so forth.
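The two representations mentioned above, ISO 8601 strings and the time since the Unix epoch, can be converted into one another. Here is a minimal Python sketch, assuming the standard library's datetime.fromisoformat is an acceptable stand-in for an RFC 3339 parser. The function name to_epoch_seconds is hypothetical, and treating timestamps without an offset as UTC is an assumption made for this sketch.

```python
from datetime import datetime, timezone


def to_epoch_seconds(text):
    """Parse an ISO 8601 style timestamp and return seconds since the epoch."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        # Assumption: a timestamp without an offset is taken to be UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.timestamp()


print(to_epoch_seconds("1970-01-01T00:00:00Z"))
print(to_epoch_seconds("2017-02-01T12:00:00+01:00"))
```

Python's standard library has no parser for ISO 8601 durations, so a platform exposing xsd:duration values would need its own handling for those.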

Things

Things can be passed by reference using a string literal containing the URI for the thing description. This is sufficient to generate the software object needed by an app to interact with that thing, as its state will be initialised as part of the consumed thing object life cycle. Use cases include a thing that acts as a controller for other things. Another use case involves discovery, where the response to a query request contains a collection of things matching the query.

:information_source: We should define td:thing to indicate that a property value denotes a thing.

@draggett this is great! A few questions:

With regards to integers:

JSON doesn't distinguish integers from numbers, but I believe that integers are sufficiently common in IoT use cases that it justifies streamlining their use rather than having to use a number plus a separate annotation to the effect that the number is restricted to integers.

This seems to imply that JSON Schema does not distinguish between numbers and integers if you need to "streamline their use rather than having to use a number plus a separate annotation", and the example in that section uses "type": "number". But I see that you do use "type": "integer" further down. Is this just explaining why you are including "type": "integer" in the subset?

Why use "anyOf" + "pattern" for enumerated lists instead of the actual "enum" keyword? (Also note, you want "pattern": "^Open$" and "pattern": "^Closed$" to restrict it to those exact words with that approach)
"date-time" is a format rather than a type. It is expressed as "type": "string", "format": "date-time". There is a proposal to expand the set of date/time-related formats to at least include plain "date" and plain "time".
I'm not sure I follow "passed by reference using a string literal containing the URI..." Is this separate from using "$ref" to reference a different schema file (or other JSON document such as the TD)?
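The point about anchoring can be seen in a short Python sketch that compares the three approaches. The helper names are hypothetical, and Python's re.search is used here as a stand-in for JSON Schema's unanchored pattern matching.

```python
import re


def matches_any_pattern(value, patterns):
    """Mimic "anyOf" over string schemas that each carry one "pattern"."""
    if not isinstance(value, str):
        return False
    return any(re.search(pattern, value) for pattern in patterns)


def matches_enum(value, allowed):
    """Mimic the "enum" keyword: the value must equal one listed item."""
    return value in allowed


unanchored = ["Open", "Closed"]
anchored = ["^Open$", "^Closed$"]
door_states = ["Open", "Closed"]

# "Opened" contains "Open", so the unanchored patterns accept it.
print(matches_any_pattern("Opened", unanchored))
print(matches_any_pattern("Opened", anchored))
print(matches_enum("Opened", door_states))
```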

@draggett

For the enumeration we might want to have a look at the owl:oneOf construct of OWL that allows the definition of a range of data values. https://www.w3.org/TR/owl-ref/#EnumeratedDatatype

@draggett

For unions of types we can use rdfs:range and owl:unionOf (https://www.w3.org/TR/owl-ref/#unionOf-def) to state that the values of a property are instances of one or more classes, in this case a union of classes.

Using the OWL concepts where they fit seems good to me!

I made a comparison that I presented in Düsseldorf between JSON Schema and OWL (see here). I agree with both @handrews's and @mariapoveda's remarks, which clearly stand out from the comparison.

However, although we should obviously reuse concepts from existing vocabularies such as OWL, there is still the problem that, if we take JSON Schema as is, the representation of the same information would have different structures in JSON and RDF. What looks like this in JSON Schema:

"properties": {
  "wind": { "type": "object", ... },
  "temperature": { "type":"number", ...  },
  "humidity": { "type":"integer", ... }
}

would have the following structure in JSON-LD:

"td:property": [
  {
    "td:name": "wind",
    "td:type": "td:object", ...
  }, {
    "td:name": "temperature",
    "td:type":"td:number", ... 
  }, {
    "td:name": "humidity",
    "td:type":"td:integer", ...
  }
]

I therefore wonder whether we should strictly follow JSON Schema's syntax or rather try to come up with a sort of OWL/JSON-LD syntax. @draggett, did you assume the former in your post?

Sorry for the long answer, but I think it is needed as we seem to have different mindsets.

JSON Schema describes constraints on data expressed as JSON. Thing descriptions by contrast describe an interaction model for things in a programming language agnostic way involving properties, actions, events and metadata. We need to be able to express constraints on the values of properties, the arguments that can be passed in action requests and responses, and the value passed with an event. The interaction model is formally defined as Linked Data with mappings to a variety of serialisations including JSON.

For JavaScript programmers and Web developers, a more natural way to express @vcharpenay's weather station example in JSON is as follows:

{
    "properties": {
        "wind": {
            "properties": {
                "speed": "number",
                "direction": "integer"
            }
        },
        "temperature" : "number",
        "humidity": "integer"
    }
}

where I have treated wind as a compound property consisting of the wind speed and direction. Assuming that this thing has the URL: http://example.org/weather_station, and assuming a sensible default context, then this example automatically translates to Linked Data as follows, using Turtle for the notation:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <http://www.w3.org/ns/td#> .

<http://example.org/weather_station> a td:thing ;
    td:property _:1 , _:2 , _:3 .
_:1 td:name "wind" ;
    td:property _:4 , _:5 .
_:2 td:name "temperature" ;
     td:type "number" .
_:3 td:name "humidity" ;
     td:type "integer" .
_:4 td:name "speed" ;
     td:type "number" .
_:5 td:name "direction" ;
     td:type "integer" .
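One possible reading of the translation above can be sketched in Python. This is an assumption about how such a mapping might work and not the algorithm from the proposal itself. The function name to_triples is hypothetical, and triples are represented as plain tuples of strings instead of using an RDF library. Properties are visited level by level so that the blank node labels come out in the same order as in the Turtle.

```python
from collections import deque


def to_triples(thing_uri, description):
    """Map a compact JSON thing description to (subject, predicate, object)."""
    triples = [(thing_uri, "rdf:type", "td:thing")]
    next_id = 1
    queue = deque([(thing_uri, description)])
    while queue:
        subject, node = queue.popleft()
        for name, value in node.get("properties", {}).items():
            label = "_:%d" % next_id
            next_id += 1
            triples.append((subject, "td:property", label))
            triples.append((label, "td:name", name))
            if isinstance(value, str):
                # A bare string names the data type, e.g. "number".
                triples.append((label, "td:type", value))
            else:
                # A nested object is a compound property; visit it later.
                queue.append((label, value))
    return triples


weather_station = {
    "properties": {
        "wind": {"properties": {"speed": "number", "direction": "integer"}},
        "temperature": "number",
        "humidity": "integer",
    }
}

for triple in to_triples("http://example.org/weather_station", weather_station):
    print(triple)
```

Blank node labels carry no meaning in RDF, so a different visiting order would describe the same graph.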

We could argue about the naming conventions for the predicates, which is largely subjective.

The example would be improved by adding units of measure, defining the data type for humidity as a percentage, and the wind direction as degrees in the range 0 to 359. In principle, the constraints could be inherited from the semantic models for the units of measure. The default context could include common units so as to avoid the need for developers to include an explicit context for those units.

Responding to @handrews

With regards to integers: But I see that you do use "type": "integer" further down. Is this just explaining why you are including "type": "integer" in the subset?

Yes, the data type "integer" is a restriction on numbers to integer values.

Why use "anyOf" + "pattern" for enumerated lists instead of the actual "enum" keyword? (Also note, you want "pattern": "^Open$" and "pattern": "^Closed$" to restrict it to those exact words with that approach)

That is a good question. I would like to get further feedback from people working on commercial IoT applications to see where we should draw the line between simplicity and generality. Simple enumerations are a common feature of programming languages, but as far as I am aware, few programming languages allow you to declare whether the set is open or closed.

"date-time" is a format rather than a type. It is expressed as "type": "string", "format": "date-time". There is a proposal to expand the set of date/time-related formats to at least include plain "date" and plain "time".

I would expect developers to want a simple means to express the data type for points in time and for durations. The way points in time are exposed to applications could depend on the platform, but we may want to at least cater for ISO 8601 strings and the time since the standard epoch.

I'm not sure I follow "passed by reference using a string literal containing the URI..." Is this separate from using "$ref" to reference a different schema file (or other JSON document such as the TD)?

Are you referring to the wording on things as first class types? A thing can be identified by the URI for its thing description, and this allows a platform to access the description and construct a software object that acts as a proxy for the thing. JSON Schema defines constraints on JSON, but here we need to indicate that a value stands for an object with properties, actions and events.

few programming languages allow you to declare whether the set is open or closed.

Right- declaring an enum means anything outside of the enum is invalid, which is a problem when evolving the schema. I've run into this problem (assuming this is what you mean) and haven't come up with a solution :-/

we may want to at least cater for ISO 8601 strings

That is exactly what "format": "date-time" does (for date and time strings- again, there's an issue open for adding plain date and plain time).

and the time since the standard epoch.

I can't remember if we have a request for that filed, but it is a very reasonable concept. As a format, it could apply to numbers rather than strings.

but here we need to indicate that a value stands for an object with properties, actions and events.

Yeah, JSON Schema handles this with JSON Hyper-Schema by attaching a "self" link from the value, so since the thing description has declined to consider Hyper-Schema it's always going to be a bit of an awkward integration IMHO.

few programming languages allow you to declare whether the set is open or closed.

Right- declaring an enum means anything outside of the enum is invalid, which is a problem when evolving the schema. I've run into this problem (assuming this is what you mean) and haven't come up with a solution :-/

Anyway, an object like { door: "Neitheropennorclose" } would not validate against @draggett's schema either, would it?

I'm not sure I understand why it is a problem. If one wants their schemas to be backward-compatible, one could define something like this to avoid redefining their original enum schema:

{
  "definitions": {
    "enumv1": {
      "enum": ["val1", "val2"]
    },
    "enumv2": {
      "oneOf": [
        { "$ref": "#/definitions/enumv1" }, {
          "enum": ["val3", "val4"]
        }
      ]
    }
  }
}
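To check how this versioned enum behaves, here is a minimal Python sketch of a validator that understands only $ref, enum and oneOf. The function name validate is hypothetical and this is far from a full JSON Schema implementation.

```python
def validate(value, schema, root):
    """Validate value against a schema using only $ref, enum and oneOf."""
    if "$ref" in schema:
        # Follow a local reference such as "#/definitions/enumv1".
        target = root
        for token in schema["$ref"][2:].split("/"):
            target = target[token]
        return validate(value, target, root)
    ok = True
    if "enum" in schema:
        ok = ok and value in schema["enum"]
    if "oneOf" in schema:
        # oneOf passes only when exactly one sub-schema accepts the value.
        matches = sum(validate(value, sub, root) for sub in schema["oneOf"])
        ok = ok and matches == 1
    return ok


root = {
    "definitions": {
        "enumv1": {"enum": ["val1", "val2"]},
        "enumv2": {
            "oneOf": [
                {"$ref": "#/definitions/enumv1"},
                {"enum": ["val3", "val4"]},
            ]
        },
    }
}

v1 = root["definitions"]["enumv1"]
v2 = root["definitions"]["enumv2"]

print(validate("val1", v2, root))
print(validate("val3", v2, root))
print(validate("val3", v1, root))
```

Because oneOf requires exactly one match, a value listed in both enums would be rejected. Using anyOf instead would tolerate such an overlap.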

@draggett

We need to be able to express constraints on the values of properties, the arguments that can be passed in action requests and responses, and the value passed with an event. The interaction model is formally defined as Linked Data with mappings to a variety of serialisations including JSON.

If JSON is one of the possible serializations, it would probably make sense to provide your examples in RDF first. For the one you gave (weather station), here is an alternative Turtle representation:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix td: <http://www.w3.org/ns/td#> .

<http://example.org/weather_station_report> a td:DataSchema ;
    td:property (
        [ td:name "wind" ;
          td:property (
              [ td:name "speed" ;
                td:type td:number ]
              [ td:name "direction" ;
                td:type td:integer ]
          ) ]
        [ td:name "temperature" ;
          td:type td:number ]
        [ td:name "humidity" ;
          td:type td:integer ]
    ) .

Several elements differ from your version:

the range of td:property is rdf:List, which prevents any new property from being added to this type (or schema). To me, this is relevant if this information is to be used for validation.
I don't know what you put under the class td:thing* but in the vocabulary we already have, the schema is supposed to be under td:inputData and td:outputData. It should therefore be of type td:DataSchema.
number and integer are fixed terms, they should therefore not be plain strings but RDF concepts.

(*) it is important to follow RDF conventions to avoid confusion: if td:thing is a class, then the name should start with a capital letter. Besides, this class is already defined in td.ttl as td:Thing.

@vcharpenay wrote:

If JSON is one of the possible serialisations, it would probably make sense to provide your examples in RDF first.

Well, I gave both the JSON and Turtle serialisations.

The object interface for exposing interaction models to applications isn't dependent upon the choice of serialisation given that you can map them to Linked Data and from there to the object interface.

To avoid biasing the readers, I think we should use Turtle for examples of each vocabulary item in the Working Group thing description vocabulary specification, and then provide JSON-LD examples in a later section on serialising to JSON-LD. An important question is whether to make the default JSON-LD context part of the specification text or to make a normative link to a file on the W3C server. This latter approach would make it easier for us to update the default context after the Linked Data vocabulary becomes a W3C Recommendation.

@handrews

Right- declaring an enum means anything outside of the enum is invalid, which is a problem when evolving the schema. I've run into this problem (assuming this is what you mean) and haven't come up with a solution

It is common when writing specs to lock down what is permitted so that you can later extend the spec and be assured that if something conformed to an earlier version it will also conform to a later version. So enums are helpful in that regard.

Scalability for the Web of Things necessitates addressing versioning of one form or other as new models of devices add richer capabilities. You will want to know if a given application is compatible with a given version of a thing exposed by a device. This also motivates exposing a given device with multiple different versions of things, so that the device can be safely used by older software as well as newer software. Linked Data will allow us to reference different versions of a thing description and make statements about their relationships as a basis for compatibility.

Well, I gave both the JSON and Turtle serialisations.

I meant the eleven examples you gave in your initial comment. Could you try to translate them into Turtle? And maybe put them in a separate file (in this repo or in the IG repo)?

@vcharpenay I can't believe I never thought of the "oneOf" + "enum" solution. That is awesome, thanks!

@draggett when I mentioned "evolving the schema" what I meant was handling versioning by versioning the schema rather than the endpoint. This, of course, does not address versioning the entire thing description (or versioning any complete API that may have multiple implementations).

Which is not to say that the thing description should do that, I just want to clarify that I get the locking down requirement and just forgot to be clear about what "evolution" meant and that it's not something I think is necessarily relevant to the thing description format.

@vcharpenay Can this be closed? Maybe once you can point to the DataSchema documentation placeholder at https://w3c.github.io/wot-thing-description/#dataschema and more discussed in the Binding Templates.

the latest discussion about data types in TD should be continued here #107
