Options for giving observations context - cast your votes!

Ok, so let's say you have the following observation:

{
  "madeBySensor": "sensor-123",
  "resultTime": "2020-02-18T17:24:16.094Z",
  "hasResult": {
    "value": 21.2
  },
  "location": {
    "type": "Point",
    "geometry": [-1.9, 52.5]
  }
}

Completely useless! Right? We have no idea what variable is being measured, what the units are, what "thing" the observation relates to, e.g. does it relate to a person, a vehicle, the atmosphere, a building, etc, etc. Is it useful to meteorologists, highways engineers, air pollution experts or facilities managers???

It's clear we need to add some more properties to this observation to give it some context. The properties we add will become a vital part of how we query the data. E.g. allowing us to ask questions such as:

Give me all the observations of air temperature, in degrees celsius, that are useful for meteorologists.
Give me any observations related to air quality from the ground floor of the main library in Birmingham.

N.b. queries should also allow filtering by time and space, but that is not the focus of this issue.

I'll now present a series of possible solutions to this problem. I.e. different combinations of extra properties that we can add to the observations served by our APIs. N.B. this isn't an exhaustive list, so feel free to post your own suggestions, or merge features from one option with some from another.

Option 1

Key features:

Uses a fairly specific observedProperty.
Makes use of disciplines. Aim to stick to these QUDT disciplines wherever possible.
Uses featureOfInterest to detail the feature involved. N.B. the SSN Ontology only allows one featureOfInterest per observation.

Example 1:

_(N.B. I've omitted properties such as resultTime and location for brevity)_

{
  "madeBySensor": "sensor-123",
  "hasResult": {
    "value": 21.2,
    "unit": "deg_c"
  },
  "observedProperty": "air-temperature",
  "featureOfInterest": "earth-atmosphere",
  "discipline": "Meteorology"
}

Example 2:

{
  "madeBySensor": "sensor-123",
  "hasResult": {
    "value": 21.2,
    "unit": "deg_c"
  },
  "observedProperty": "air-temperature",
  "featureOfInterest": "room-112",
  "discipline": "environment-control"
}

These examples show how much easier it becomes to tell that one observation is an outdoor temperature and the other is an indoor temperature.
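As a rough sketch of how a client might separate the two once these properties exist, the Python below filters a list of observations by exact matches on top-level properties. The function name `filter_observations`, the second sensor ID and its reading are made up for illustration; none of this is an agreed API.

```python
# Hypothetical sample data, shaped like the two Option 1 examples.
OBSERVATIONS = [
    {
        "madeBySensor": "sensor-123",
        "hasResult": {"value": 21.2, "unit": "deg_c"},
        "observedProperty": "air-temperature",
        "featureOfInterest": "earth-atmosphere",
        "discipline": "Meteorology",
    },
    {
        "madeBySensor": "sensor-456",  # invented ID for the indoor sensor
        "hasResult": {"value": 19.8, "unit": "deg_c"},
        "observedProperty": "air-temperature",
        "featureOfInterest": "room-112",
        "discipline": "environment-control",
    },
]


def filter_observations(observations, **criteria):
    """Return the observations whose top-level properties equal every criterion."""
    return [
        obs
        for obs in observations
        if all(obs.get(key) == value for key, value in criteria.items())
    ]


# Only the outdoor reading matches both criteria.
outdoor = filter_observations(
    OBSERVATIONS,
    observedProperty="air-temperature",
    featureOfInterest="earth-atmosphere",
)
print([obs["madeBySensor"] for obs in outdoor])
```

Filtering on observedProperty alone would return both readings, which is exactly the ambiguity the extra properties are meant to remove.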

In the JSON-LD document detailing our vocabulary we should reference any other vocabularies for which the observedProperty is equivalent, e.g. our air-temperature is equivalent to the CF Standard's definition for air_temperature. It may be we can't find any equivalents and therefore we need to provide our own description.

For this solution we'd only maintain a common vocabulary for observedProperty and discipline. The featureOfInterest is more for the individual observatories to add bespoke tags relevant to their particular deployment. I can see us being a bit limited by the SSN ontology only allowing one featureOfInterest per observation. It would be nice to have ["urban-sciences-building", "room-112"].

What case we use for the property values is also up for debate, e.g. camelCase, kebab-case, etc. Whatever we choose we should make sure it's URL-friendly.

Here the unit should reference the qudt vocabulary (assuming we can find a match).

Should we allow just one discipline per observation?

We could swap the term discipline for theme instead. Either way it would serve the same function.

Option 2

Key features:

Uses a fairly specific observedProperty.
Uses a very broad featureOfInterest, as defined by us, for example.
Makes use of qualifiers to provide bespoke context. qualifiers is a term/class invented by/for the Urban Observatories.

{
  "madeBySensor": "sensor-123",
  "hasResult": {
    "value": 21.2,
    "unit": "deg_c"
  },
  "observedProperty": "air-temperature",
  "featureOfInterest": "atmosphere",
  "qualifier": ["street-level"]
}

So the qualifier acts much like featureOfInterest did in option 1, except we'll allow it to be an array.
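Because qualifier is an array, a query against it becomes a membership test rather than an equality test. A minimal sketch in Python, assuming the observation shape from the example above (the helper name `has_qualifiers` is hypothetical):

```python
def has_qualifiers(observation, required):
    """Return True if the observation carries every qualifier in `required`."""
    return set(required) <= set(observation.get("qualifier", []))


observation = {
    "madeBySensor": "sensor-123",
    "observedProperty": "air-temperature",
    "featureOfInterest": "atmosphere",
    "qualifier": ["street-level"],
}

print(has_qualifiers(observation, ["street-level"]))
print(has_qualifiers(observation, ["street-level", "indoor"]))
```

An observation with no qualifier key is treated here as having an empty list, so it only matches queries that ask for no qualifiers at all.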

Option 3

Key features:

Uses a fairly specific observedProperty.
Uses a very broad featureOfInterest, as defined by us, for example.
Makes use of a hierarchy of Platforms to provide finer detail of which features are involved.

{
  "madeBySensor": "honeywell-device-6a4-thermistor",
  "hasResult": {
    "value": 21.2,
    "unit": "deg_c"
  },
  "observedProperty": "air-temperature",
  "featureOfInterest": "infrastructure",
  "platform": ["urban-sciences-building", "second-floor", "room-112", "honeywell-device-6a4"]
}

Many of us may choose to use the concept of a Platform anyway. E.g. according to the SSN ontology a weather station would be a Platform which has multiple Sensors hosted on it. This solution extends that. This hierarchical approach can be incredibly powerful, but does add a bit of complexity to the database (see ltrees and NoSQL tree structures).
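To illustrate why the ordered platform array is useful, here is a small Python sketch of an "everything under this platform" test, which is the kind of descendant query a tree-aware database would answer. It works on plain lists in memory; the helper name `is_within` is an assumption, not part of any existing API.

```python
def is_within(platform_path, ancestor_path):
    """Return True if `platform_path` begins with every element of `ancestor_path`."""
    return platform_path[: len(ancestor_path)] == ancestor_path


observation = {
    "madeBySensor": "honeywell-device-6a4-thermistor",
    "observedProperty": "air-temperature",
    "platform": [
        "urban-sciences-building",
        "second-floor",
        "room-112",
        "honeywell-device-6a4",
    ],
}

# Matches: the observation sits somewhere on the second floor.
print(is_within(observation["platform"], ["urban-sciences-building", "second-floor"]))
# Does not match: same building, different floor.
print(is_within(observation["platform"], ["urban-sciences-building", "first-floor"]))
```

The same prefix comparison is what makes the order of the array matter: swapping two elements would change which queries the observation matches.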

Option 4

Key features:

Uses a VERY specific observedProperty.

This could be used in combination with any of the previous options, with the goal of being able to do without one of the other properties.

Example 1 (modifies the Option 1 example):

{
  "madeBySensor": "sensor-123",
  "hasResult": {
    "value": 21.2,
    "unit": "deg_c"
  },
  "observedProperty": "outdoor-air-temperature",
  "discipline": "Meteorology"
}

Because the observedProperty is far more specific i.e. "outdoor-air-temperature" not "air-temperature" we could perhaps get away without the featureOfInterest or the discipline.

At Birmingham I've implemented Option 3 and it seems to work pretty well, so this would be my preference. However, when the featureOfInterest values are this broad I feel like discipline is a better term.

I completely forgot to mention Deployments, which can also help add context to an observation.

inDeployment is a property of a Platform. Using Simon J's water works example, you might have a platform called aeration-tank-sensor-rig which is part of a deployment called cranfield-water-works-research.

When querying our observations we could have a querystring parameter set to inDeployment=cranfield-water-works-research; this would filter the returned observations to just those from the water works.
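A minimal sketch of building such a request URL with Python's standard library. The base URL is a placeholder, and observedProperty as a second query parameter is an assumption for illustration; only inDeployment comes from the proposal above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint, not a real service.
BASE_URL = "https://api.example.org/observations"


def build_query_url(base_url, **filters):
    """Append the given filters to the base URL as a querystring."""
    return f"{base_url}?{urlencode(filters)}"


url = build_query_url(
    BASE_URL,
    inDeployment="cranfield-water-works-research",
    observedProperty="air-temperature",
)
print(url)

# A server would parse the same querystring back into filter values.
print(parse_qs(urlsplit(url).query))
```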

For full disclosure, I've picked a nice example here. Your deployment could be birmingham-weather-stations and therefore you'd probably want some extra properties for better context, e.g. {featureOfInterest: 'main-library-rooftop'} or {platform: ['main-library', 'rooftop', 'climavue-6']} or {qualifier: ['roof-level']}.

My preference would be for something like this:

{
  "madeBySensor": "sensor-123",
  "resultTime": "2020-02-18T17:24:16.094Z",
  "hasResult": {
    "value": 21.2
  },
  "location": {
    "type": "Point",
    "geometry": [-1.9, 52.5]
  },
  "observedProperty": "air-temperature",
  "featureOfInterest": "earth-atmosphere",
  "discipline": ["environment-control", "Meteorology", ...], // This is an array
  "qualifier": ["street-level", ...] // This is an array
}

In particular I would:

avoid hyper-specific observedProperty values such as "outdoor-street-level-air-quality". It would be daunting to define them and difficult to keep them consistent across different observatories.
keep featureOfInterest as a generic, tangible "thing" as originally intended. In particular, keep it NOT an array.
introduce qualifier as an array of tags that can be used to be as specific as needed about the observedProperty.
avoid the use of Platform, as it could create some confusion with the key InPlatform of a sensor (which only contains a reference to the ID of the platform where the sensor is mounted).
use discipline as an array of string values. The same observable may be relevant to different disciplines.

I agree we should avoid overly specific observedProperties.
And that the featureOfInterest should be a generic, tangible thing.
Regarding the Platform, the term I should have used is isHostedBy. Although it describes the relationship between a sensor and a platform, it could still be of value in an observation, as most observations will have been made from one or more platforms. We wouldn't need to make this compulsory.
Allowing the discipline to be an array makes sense to me, in which case should it always be served as an array, and therefore should we use the key disciplines rather than discipline?
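If we did decide that discipline is always served as an array, a server could normalise whatever is stored before returning it. A hedged Python sketch (the helper names are invented, and the key is left as discipline since the naming question is still open):

```python
def as_list(value):
    """Wrap a single value in a list; pass lists through; map None to an empty list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def normalise_discipline(observation):
    """Return a copy of the observation in which `discipline` is always a list."""
    normalised = dict(observation)
    normalised["discipline"] = as_list(observation.get("discipline"))
    return normalised


single = {"observedProperty": "air-temperature", "discipline": "Meteorology"}
multiple = {
    "observedProperty": "air-temperature",
    "discipline": ["environment-control", "Meteorology"],
}

print(normalise_discipline(single)["discipline"])
print(normalise_discipline(multiple)["discipline"])
```

Serving one consistent shape means clients never have to check whether they received a string or a list.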

With this approach I assume we'd keep a common dictionary of observedPropertys and disciplines, but what about featureOfInterest and qualifier? I suspect it would be a nightmare to maintain and therefore we shouldn't, but we'd need to accept that one observation might use atmosphere and another earth-atmosphere for the same featureOfInterest. Likewise indoor and inside for the same qualifier.

Firstly, many thanks Simon for putting together some options and Ettore for your thoughts.

These are my initial thoughts...

Option 1

This looks like a pretty good option to me.

maintains the intended purpose of featureOfInterest
allows categorisation through disciplines, in a linked data-esque manner
provides just enough specificity in the observedProperty

On your comment about it being nice to specify the featureOfInterest as both the building and the room, I see the graph as being the solution to this. The featureOfInterest is the room, and the room then describes its own relationship to the building. The only downside is clients have to traverse the graph to find out all of the detail, but in the era of HTTP/2 requests are cheap.
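To make the traversal idea concrete, here is a small Python sketch that walks from a room up to its building. The graph is an in-memory dictionary standing in for what would be one HTTP request per node, and the property name isPartOf, the paths and the types are all assumptions for illustration.

```python
# Hypothetical nodes keyed by @id; in a real API each lookup would be a GET request.
GRAPH = {
    "/rooms/room-112": {
        "@id": "/rooms/room-112",
        "@type": "Room",
        "isPartOf": "/floors/second-floor",
    },
    "/floors/second-floor": {
        "@id": "/floors/second-floor",
        "@type": "Floor",
        "isPartOf": "/buildings/urban-sciences-building",
    },
    "/buildings/urban-sciences-building": {
        "@id": "/buildings/urban-sciences-building",
        "@type": "Building",
    },
}


def ancestors(node_id, graph, link="isPartOf"):
    """Follow `link` from node to node and return the IDs reached, nearest first."""
    chain = []
    seen = {node_id}  # guards against cycles in the graph
    current = graph.get(node_id, {}).get(link)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = graph.get(current, {}).get(link)
    return chain


print(ancestors("/rooms/room-112", GRAPH))
```

The observation itself only needs the single featureOfInterest of room-112; the building is recovered from the room's own description.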

Option 2

This breaks the design pattern of JSON-LD in my mind, and loses the advantages of using vocabulary-referenced keys and values.

It uses qualifier as an array of string expressions that are really unstructured metadata and descriptions. I think we can avoid this, because the JSON-LD objects aren't sealed, you can add whatever additional properties you want. For example:

  "relativeElevation": "street"

or better yet

  "heightAboveSurface": 2.0

Option 3

I see the rationale for this, but I worry that it

doesn't use featureOfInterest in the way it was probably intended
uses an array for the platform that is ordered, but it doesn't really capture the featureOfInterest

By this I mean a temperature sensor in a room may be mounted on a specific wall or may be part of an instrument panel or whatever, but the reason it's there is to represent the room as a whole. The fact it represents the room as a whole (or a zone, whatever the rationale was) is important when looking at the data.

All that said, no problem with platforms that are an entire weather station etc. There is also the option of describing hierarchy as a nested graph rather than arrays, which I think would be more JSONy.

Option 4

Yeah, let's not do this if we can avoid it, because I don't know how we could nail down exactly how granular they should be, and we risk losing the ability to compare across sensors if there are so many observedProperty values.

Summary

My preference is option 1. This has the advantage for me of being clear that air temperature is just air temperature, but if you wanted to make sure you weren't plotting indoor and outdoor temperatures together, you would compose a query that only looks at air temperatures with a featureOfInterest of earth-atmosphere (or whatever we end up using).

With regard to Ettore's comments, obviously I'm no fan of qualifier as above, but I don't object to discipline being an array if that proves useful. Array or non-array would both be fine to me. They should dereference to a full IRI for the discipline, so they will be strings against a base IRI presumably, if we all use a common set of disciplines.

I thought it might be useful to give a proper example of where having a single FeatureOfInterest can be really useful.

If I have an API that's structured around a smart building, then from the entry point there would be a few logical ways to reach the relevant sensor data:

through a collection of all the sensors in the building
through a collection of all the platforms in the building
through a collection of all the rooms and zones in the building

For the latter, the relationship between the rooms and the observations would be done through hasFeatureOfInterest.

I've thrown up an actual demo of how this could look here, which is really just the start of me trying to give some better examples for the JSON Schema/Hyper-Schema stuff. I haven't added any schemas yet, it's all pure JSON-LD at this point. Code is here.

So useful seeing an example API and some code. Thanks for that @lukessmith.

Some thoughts below. Some of which I'm sure you're aware of, but just haven't had the time to implement.

You've used /room, would using /rooms make more sense? Does it even matter if one observatory uses a singular, and another the plural? I'd personally say we should agree on either singular or plural and stick to it throughout. This gets plenty of debate on StackOverflow.
Seems slightly strange to me that the collection members are an object rather than an array. Is this the JSON-LD way? Likewise isFeatureOfInterestOf is an object.
Should the observation IDs also include the result time? I.e. so it's a unique ID for that particular observation.
For each room, are we able to include a link to all the observations collected in that room? What's the observation that is shown? The latest? Do we need to make this clear?
Regarding the CollectionMeta, can it show a link to the next page? Assuming there are more rooms.

Thank you, Luke, for the example.

Also, you totally convinced me that the use of "qualifiers" is a bad idea, as it disrupts the JSON-LD.

Just a couple of points:

Sometimes we need to use weird descriptions for the ObservedProperty. For example, in some of our traffic cameras, pedestrian traffic is qualified as "towards the city centre" or "from the city centre". I'm not too sure how one would make use of such descriptions. This is actually why I initially thought of introducing "qualifiers". Thinking about it, couldn't we add "qualifiers" to our muo vocabulary?
Shall we allow (or require) "discipline" to be an array (and maybe use the plural form "disciplines")?

Thanks Simon and Ettore. You're right in that it's a work in progress so there's more to be done and considered, but building something definitely helps to surface some of the issues.

> You've used /room, would using /rooms make more sense? Does it even matter if one observatory uses a singular, and another the plural? I'd personally say we should agree on either singular or plural and stick to it throughout. This gets plenty of debate on StackOverflow.

Good point. It does appear as though the internet is settling on plural as being the convention. I'm keen that we don't end up attributing any semantic value to the paths we use, as it shouldn't make any difference, but I'm happy to go with plurals for the sake of consistency.

There's probably a more fundamental question here about how we do collections: should we return a collection and a view on that collection as a single object? I obviously have in my example, but the argument against doing this would be by having an 'outer' collection, you could describe the collection, within which you have a sub-object that is the view (the ten items on the page you've requested etc.). I think useful descriptions for a collection might include the total number of items in it, or the total number of filtered items (because a filter would still be paginated), and potentially also a list of the types within the collection (which would be a more semantically useful way of saying, this collection only contains rooms, whereas I might have another that only contains features of interest, which in my case would be both rooms and zones).

> Seems slightly strange to me that the collection members are an object rather than an array. Is this the JSON-LD way? Likewise isFeatureOfInterestOf is an object.

This is known as node identifier indexing in JSON-LD. In short, these two approaches are identical if they're expanded:

  "member": {
      "https://playground.dev.urbanobservatory.ac.uk/api/room/1.002": {
          "@type": [
              "FeatureOfInterest",
              "Room"
          ],
          "identifier": "1.002",
          "title": "Room 1.002"
      }
  }

  "member": [
      {
          "@id": "https://playground.dev.urbanobservatory.ac.uk/api/room/1.002",
          "@type": [
              "FeatureOfInterest",
              "Room"
          ],
          "identifier": "1.002",
          "title": "Room 1.002"
      }
  ]

My personal preference is that we should use the @container form with objects rather than arrays, purely because it makes writing the JavaScript to process it a bit more logical if you're looking for a specific ID.
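The sketch below shows the equivalence in practice: converting between the array form and the ID-indexed object form, and looking up a room by its ID. It is written in Python rather than JavaScript purely for illustration, the helper names are invented, and it only mimics what a JSON-LD processor does for this one container type.

```python
ROOM_ID = "https://playground.dev.urbanobservatory.ac.uk/api/room/1.002"

# Array form, as in the second example above.
members = [
    {
        "@id": ROOM_ID,
        "@type": ["FeatureOfInterest", "Room"],
        "identifier": "1.002",
        "title": "Room 1.002",
    }
]


def index_by_id(member_list):
    """Array form -> object form keyed by @id (the @id moves from value to key)."""
    return {
        member["@id"]: {k: v for k, v in member.items() if k != "@id"}
        for member in member_list
    }


def to_array(indexed):
    """Object form -> array form, restoring @id on each member."""
    return [{"@id": node_id, **body} for node_id, body in indexed.items()]


indexed = index_by_id(members)

# With the object form, finding a specific room is a direct key lookup.
print(indexed[ROOM_ID]["title"])
```

With the array form the same lookup needs a scan over every member, which is the convenience argument for the object form.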

> Should the observation IDs also include the result time? I.e. so it's a unique ID for that particular observation.

I'm not sure, to be honest. The problem we have is that SSN/SOSA doesn't say anything about having timeseries or historic observations, it's simply not in scope. The ssn-ext ontology does have ObservationCollection types, but doesn't give any examples.

I think you're right, and we should probably include either the timestamp in the IRI, or some clear indication that it's the latest observation. The reason I think the latter is an important option, is because we might have some APIs that don't provide access to historic data at all, as in my current example for a USB API. We're likely at Newcastle to separate out the archival of observations from the access to observations, as part of a move towards being more SOA.

> For each room, are we able to include a link to all the observations collected in that room? What's the observation that is shown? The latest? Do we need to make this clear?

If you're content with that proposal above, then should we introduce a new type in our vocabulary for ObservationLatest? There's probably some other ways we could express it, but I'd rather avoid tagging them in a "latest": true style.

> Regarding the CollectionMeta, can it show a link to the next page? Assuming there are more rooms.

I'm planning on extending it to use JSON Hyper-Schema for the pagination in collections. That said, there is always the option of using both a JSON-LD prev/next link and a JSON Hyper-Schema. They wouldn't conflict with each other, and obviously not all clients are going to be able to interpret a JSON Hyper-Schema document (very few, I suspect). The schema approach does have advantages for the filter options though, it provides a machine-readable way of saying "how do I filter this collection to only give me the rooms with a temperature above 21 degrees" in a way that would be quite difficult to do in JSON-LD (unless someone finishes off this bit of the Hydra standard...).

> Sometimes we need to use weird descriptions for the ObservedProperty. For example, in some of our traffic cameras, pedestrian traffic is qualified as "towards the city centre" or "from the city centre". I'm not too sure how one would make use of such descriptions. This is actually why I initially thought of introducing "qualifiers". Thinking about it, couldn't we add "qualifiers" to our muo vocabulary?

I think there's a few options for how to go about this:

a "local" ObservedProperty defined in your own API (rather than the pan-UO vocabulary) that is more expressive, along the lines of VehicleCountTravellingTowardsCityCentre or whatever it may be
an additional property in the observation, say direction, so you have something like
```
{
  "observedProperty": "NumberOfPedestrians",
  "direction": "NE"
}
```

a procedure that describes the counting mechanism

{
  "observedProperty": "NumberOfPedestrians",
  "usedProcedure": {
    "@id": "/procedure/computer-vision-line-crossing-counts-towards-city-centre",
    "direction": "NE",
    "model": "https://github.com/super-mega-computer-vision-model"
  }
}

> Shall we allow (or require) "discipline" to be an array (and maybe use the plural form "disciplines")?

I would support the use of discipline as a property we associate with observations/sensors/platforms/features of interest. I would keep the key name singular, consistent with the other SSN keys, but as above we can make use of plurals in the addresses/IRIs, for example:

{
  "@context": {
    "uo": "https://urbanobservatory.github.io/standards/vocabulary#",
    "discipline": "uo:discipline",
    "uo-discipline": "https://urbanobservatory.github.io/standards/vocabulary/disciplines#"
  },
  "discipline": [
    "uo-discipline:Transport"
  ]
}
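For anyone unfamiliar with the prefix notation, the Python sketch below shows how a value such as uo-discipline:Transport resolves to a full IRI against the context above. It is a deliberately simplified, hand-rolled expansion (the function name is invented), not a JSON-LD processor; a real processor applies further rules, including whether the discipline term is declared as holding IRIs rather than plain strings.

```python
CONTEXT = {
    "uo": "https://urbanobservatory.github.io/standards/vocabulary#",
    "discipline": "uo:discipline",
    "uo-discipline": "https://urbanobservatory.github.io/standards/vocabulary/disciplines#",
}


def expand_compact_iri(value, context):
    """Expand `prefix:suffix` when the prefix is in the context; otherwise return the value unchanged."""
    prefix, separator, suffix = value.partition(":")
    if separator and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


document = {"discipline": ["uo-discipline:Transport"]}
expanded = [expand_compact_iri(value, CONTEXT) for value in document["discipline"]]
print(expanded)
```

Values that are already absolute IRIs, or that have no known prefix, pass through untouched.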

I'm not going to have time to extend the playground code to use JSON Schema in time for tomorrow's call, but I'm sure there's plenty we can discuss based on the above. Pull requests very much welcome if you want to tweak the code based on the above.

Thanks again for your comments.
