Introduction of schemas and a TDD approach

Following up on the discussions in #13, I obviously needed to do some more work to convince people my grand plan was a sensible one. This PR and the changes in the schema branch are a work in progress intended to do that.

I have made a start on some code, and some tools, and some examples. There is still much more to do, but I wanted to put this down so people can start asking questions.

What are the problems?

How to enforce restrictions imposed by ontologies? We're adopting ontologies such as W3C's SSN, which stipulate the number and type of relations that can be created using predicates. We use JSON-LD to map our properties back to the vocabularies and ontologies, but JSON-LD doesn't provide any way of enforcing these restrictions. As an example, W3C's SSN says an ssn:Sensor can only sosa:observes one sosa:ObservableProperty.

How do we enforce these restrictions? Using JSON Schema. It lets us express these constraints: restrictions on type (for the example above, a single value rather than an array), required properties, regular expressions, etc. can all be described.
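To make the "one observable property per sensor" restriction concrete, here is a minimal sketch of what such a schema could look like, written as a TypeScript constant. The property names (id, type, observes) and the id pattern are assumptions for illustration and are not taken from the schema branch.

```typescript
// Hypothetical JSON Schema (draft-07) for a single sensor.
// Property names and the id pattern are illustrative assumptions.
const sensorSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  required: ["id", "type", "observes"],
  properties: {
    // A regular expression constraint, which a TypeScript interface cannot express.
    id: { type: "string", pattern: "^[a-z0-9-]+$" },
    type: { const: "Sensor" },
    // A single string rather than an array: one observable property per sensor.
    observes: { type: "string" },
  },
};
```

Declaring observes as a string means an instance that supplies an array of properties fails validation, which is the restriction JSON-LD alone cannot enforce.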

See this example of a schema for a SensorCollection that requires all of the members to be Sensor.

How to validate an instance of a sensor, observation, platform etc.? There are two types of validation to consider: compile-time and run-time. We've talked about using TypeScript interfaces for compile-time validation, but there are some things these can't enforce, such as regular expressions. It's also conceivable, however, that we would want to do run-time validation. If someone submitted a new sensor to our system using an API (functionality we want to offer), how could we check the sensor was valid? If we're adopting test-driven development, or any form of integration tests, how do we make sure the output our APIs are producing is valid against our standards?

JSON Schema allows us to validate an instance of a sensor, platform, observation etc. at run-time, using packages such as ajv. Better yet though, we can generate TypeScript interfaces from the JSON Schema documents.
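The difference between the two kinds of validation can be sketched as follows. The Sensor interface is a guess at what a generated interface might look like, and isSensor is a hand-written guard standing in for a real schema validator such as ajv; neither is taken from the schema branch.

```typescript
// Hypothetical shape of an interface generated from a sensor schema.
interface Sensor {
  id: string;
  type: "Sensor";
  observes: string;
}

// Hand-written run-time guard. In practice a schema validator would do this job;
// this stand-in only checks the three assumed properties.
function isSensor(value: unknown): value is Sensor {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.id === "string" &&
    candidate.type === "Sensor" &&
    typeof candidate.observes === "string"
  );
}

// Data arriving over an API is untyped, so the compiler cannot help here.
const submitted: unknown = JSON.parse(
  '{"id": "s1", "type": "Sensor", "observes": ["temperature", "humidity"]}'
);

// The guard rejects this instance because observes is an array.
const accepted = isSensor(submitted);
```

The interface catches mistakes in our own code at compile time, while the run-time check is what protects us from invalid documents submitted by others.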

See this example of a TypeScript interface generated from this schema, using this code. See this example of some automated tests on the SensorCollection (not complete, more tests to be added).

How to describe the filtering, searching, spatial queries, and other functionality available? How will people know they can go to one of our sensor collections and use an inDeployment__isDefined=false query parameter to fetch all the sensors without a deployment?

JSON Hyper-Schema will describe these. It allows constraints to be imposed on these interactions just as it does on documents; for example, you can only use a __gt filter on a value that is numeric.
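As a rough illustration of the kind of rule a hyper-schema would let us state, the sketch below checks a filter parameter name of the form field__operator against the type of the field. The field names, their types, and the operator table are all assumptions made up for this example; only the __isDefined and __gt suffixes come from the text above.

```typescript
type FieldType = "number" | "string" | "boolean";

// Hypothetical field types for a sensor collection.
const fieldTypes = new Map<string, FieldType>([
  ["inDeployment", "string"],
  ["height", "number"],
]);

// Hypothetical table of which field types each operator accepts.
const operatorAllowedOn = new Map<string, FieldType[]>([
  ["isDefined", ["number", "string", "boolean"]],
  ["gt", ["number"]],
]);

interface FilterCheck {
  ok: boolean;
  reason?: string;
}

// Splits "field__operator" and checks the operator is allowed for that field.
function checkFilter(paramName: string): FilterCheck {
  const separator = paramName.lastIndexOf("__");
  if (separator <= 0) {
    return { ok: false, reason: "expected field__operator" };
  }
  const field = paramName.slice(0, separator);
  const operator = paramName.slice(separator + 2);

  const fieldType = fieldTypes.get(field);
  if (fieldType === undefined) {
    return { ok: false, reason: `unknown field: ${field}` };
  }
  const allowed = operatorAllowedOn.get(operator);
  if (allowed === undefined) {
    return { ok: false, reason: `unknown operator: ${operator}` };
  }
  if (allowed.indexOf(fieldType) === -1) {
    return { ok: false, reason: `${operator} is not allowed on a ${fieldType} field` };
  }
  return { ok: true };
}
```

Under these assumptions, checkFilter("inDeployment__isDefined") and checkFilter("height__gt") are accepted, while checkFilter("inDeployment__gt") is rejected because the field is not numeric.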

How to ensure our documentation always matches our code? This is a perennial problem for API developers, but we should try very hard to avoid writing the documentation manually.

We should write the JSON Schema and JSON Hyper-Schema documents, then generate the documentation automatically using the contents of these files. It should be possible to generate an OpenAPI document from an API that references these documents, and if we have an OpenAPI document, we can use tools like spectacle to generate HTML documentation like this.

The key concepts

The technologies we use are:

JSON-LD describes our data as linked data. When we use a property, it provides semantic meaning to that property, using either our own vocabularies or those of others.
JSON Schema describes the constraints on our instances, so we can validate them.
JSON Hyper-Schema describes how to interact with our APIs, using query parameters, POST bodies, etc. This information sits alongside the JSON Schema; it's all in one schema document.
OpenAPI describes the whole API, but instead of associating interactions with each type, it has all of the endpoints in a single document. The bodies and responses of requests are described using JSON Schema, so they are easily reusable :-)
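To show how the first of these layers fits with the rest, here is a minimal sketch of a JSON-LD document, written as a TypeScript constant, that maps a plain observes property back to the SOSA vocabulary. The example.org identifiers and the choice of term names are assumptions for illustration.

```typescript
// Hypothetical JSON-LD document for one sensor.
// The example.org IRIs are placeholders, not real resources.
const sensorDocument = {
  "@context": {
    sosa: "http://www.w3.org/ns/sosa/",
    Sensor: "sosa:Sensor",
    // "@type": "@id" marks the value as an IRI rather than a plain string.
    observes: { "@id": "sosa:observes", "@type": "@id" },
  },
  "@id": "https://example.org/sensors/s1",
  "@type": "Sensor",
  observes: "https://example.org/properties/air-temperature",
};
```

The context gives observes its semantic meaning, while a JSON Schema over the same document would constrain it to a single value.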

Next steps

I am going to trawl through the SSN/SOSA ontology and pull out all of the restrictions it imposes (that are relevant to us), and write stub tests for them. I will then write an example instance of:

Platform collection
Sensor collection (this one is done!)
Observation collection
Platform
Sensor
Observable property
Observation
Result (of various types)

What else do we need?
