snowplow-incubator / sauna

A decisioning and response platform
https://github.com/snowplow/sauna/wiki

Command schema: design #54

alexanderdean closed 7 years ago

alexanderdean commented 8 years ago

In Snowplow we have a raw event model (Snowplow Tracker Protocol) and an enriched event model.

For Sauna, when we have a queue/stream-based responder, such as Kinesis or Kafka, we need some kind of schema for the records ("commands") which are going to be written and read.

One suggestion is something like this:

{
  "schema": "iglu:com.snowplowanalytics.sauna.commands/envelope/xxx/1-0-0",
  "data": {
    "commandId": "9dadfc92-9311-43c7-9cee-61ab590a6e81",
    "whenCreated": 1473812791000,
    "executeRule": "AT_LEAST_ONCE" / "AT_MOST_ONCE",
    "command": {
      "schema": "iglu:com.snowplowanalytics.sauna.commands/send_email/xxx/1-0-0",
      "data": {
        ..
      }
    }
  }
}

Or maybe they are peers:

{
  "envelope": {
    "schema": "iglu:com.snowplowanalytics.sauna.commands/envelope/xxx/1-0-0",
    "data": {
      "commandId": "9dadfc92-9311-43c7-9cee-61ab590a6e81",
      "whenCreated": 1473812791000,
      "executeRule": "AT_LEAST_ONCE" / "AT_MOST_ONCE"
    }
  },
  "command": {
    "schema": "iglu:com.snowplowanalytics.sauna.commands/send_email/xxx/1-0-0",
    "data": {
      ..
    }
  }
}
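To compare the two layouts side by side, here is a rough Python sketch that builds each one from the same inputs. The function names are made up for illustration, the "xxx" format segment is kept as-is because it is still undecided, and the send_email payload is left empty because its fields are not defined yet.

```python
import json
import time
import uuid

# Schema URIs copied from the proposal; "xxx" is the still-undecided format.
ENVELOPE_SCHEMA = "iglu:com.snowplowanalytics.sauna.commands/envelope/xxx/1-0-0"
SEND_EMAIL_SCHEMA = "iglu:com.snowplowanalytics.sauna.commands/send_email/xxx/1-0-0"


def envelope_fields(execute_rule="AT_LEAST_ONCE"):
    """Return the envelope metadata shared by both layouts (hypothetical helper)."""
    if execute_rule not in ("AT_LEAST_ONCE", "AT_MOST_ONCE"):
        raise ValueError(f"unknown executeRule: {execute_rule}")
    return {
        "commandId": str(uuid.uuid4()),
        "whenCreated": int(time.time() * 1000),  # epoch milliseconds
        "executeRule": execute_rule,
    }


def nested_layout(command_schema, command_data, execute_rule="AT_LEAST_ONCE"):
    """First layout: the command is a child of the envelope's data."""
    data = envelope_fields(execute_rule)
    data["command"] = {"schema": command_schema, "data": command_data}
    return {"schema": ENVELOPE_SCHEMA, "data": data}


def peer_layout(command_schema, command_data, execute_rule="AT_LEAST_ONCE"):
    """Second layout: envelope and command are sibling self-describing entities."""
    return {
        "envelope": {"schema": ENVELOPE_SCHEMA, "data": envelope_fields(execute_rule)},
        "command": {"schema": command_schema, "data": command_data},
    }


if __name__ == "__main__":
    # Empty payload: the proposal does not define the send_email fields.
    print(json.dumps(nested_layout(SEND_EMAIL_SCHEMA, {}), indent=2))
    print(json.dumps(peer_layout(SEND_EMAIL_SCHEMA, {}), indent=2))
```

In the peer layout the envelope can be read without touching the command at all, which is the property the rest of this discussion leans on.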

Still to be decided if xxx should be jsonschema or avro.

Avro

Avro makes the layouts above difficult. It has no formal way to type the child command entity properly within the command_envelope unless the command_envelope declares every possible command as a child union type.

Defining that union has first-class support in Avro, but it is inflexible: we would need to know all the possible command types up front.

Other solutions with Avro:

  1. Stringly typing the child command (yuck)
  2. Embedding the command_envelope inside the command, rather than the other way round (so it's just convention that every command type has an envelope property)
  3. Some sort of wrapper around a first-class envelope and a first-class command (so one is not a child of the other; both are peers in Avro)
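To make the trade-off concrete, here is a sketch of the union approach next to option 1, written as Avro schemas held in plain Python dicts (standard library only, no Avro runtime). The record names CommandEnvelope, SendEmail and ExecuteRule are assumptions for illustration, and the SendEmail record has no fields because the proposal does not define any.

```python
import json

NAMESPACE = "com.snowplowanalytics.sauna.commands"

# Placeholder command record: the proposal elides its fields, so none are listed.
SEND_EMAIL = {"type": "record", "name": "SendEmail", "namespace": NAMESPACE, "fields": []}

# Envelope fields taken from the proposal; the Avro types are assumptions.
ENVELOPE_FIELDS = [
    {"name": "commandId", "type": "string"},
    {"name": "whenCreated", "type": "long"},
    {"name": "executeRule",
     "type": {"type": "enum", "name": "ExecuteRule",
              "symbols": ["AT_LEAST_ONCE", "AT_MOST_ONCE"]}},
]


def union_envelope(command_schemas):
    """Envelope whose command field is a union of every known command record."""
    return {
        "type": "record", "name": "CommandEnvelope", "namespace": NAMESPACE,
        "fields": ENVELOPE_FIELDS + [{"name": "command", "type": list(command_schemas)}],
    }


def stringly_envelope():
    """Option 1: the command travels as an opaque string the reader must re-parse."""
    return {
        "type": "record", "name": "CommandEnvelope", "namespace": NAMESPACE,
        "fields": ENVELOPE_FIELDS + [{"name": "command", "type": "string"}],
    }


if __name__ == "__main__":
    print(json.dumps(union_envelope([SEND_EMAIL]), indent=2))
    print(json.dumps(stringly_envelope(), indent=2))
```

Adding a new command type to the union version means changing and republishing the envelope schema itself, which is the inflexibility described above. The string version avoids that, but the command is no longer typed at the Avro level.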

JSON Schema

With JSON Schema things are easier: as in Snowplow, we can support heterogeneous entities, and we would just unmarshal the envelope and command pieces separately.
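As a rough sketch of unmarshalling the two pieces separately, the Python below decodes a peer-layout record, reads the envelope on its own, and picks a handler for the command from the name segment of its Iglu schema URI. The handlers mapping and function names are hypothetical, and actual validation against the schemas is left out, since that would need a schema resolver and validator.

```python
import json
import re

# Iglu schema URI shape: iglu:{vendor}/{name}/{format}/{model}-{revision}-{addition}
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[^/]+)/(?P<name>[^/]+)/(?P<format>[^/]+)/(?P<version>\d+-\d+-\d+)$"
)


def parse_iglu_uri(uri):
    """Split an Iglu schema URI into vendor, name, format and version."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not an Iglu schema URI: {uri}")
    return match.groupdict()


def unmarshal(record_json, handlers):
    """Decode a peer-layout record and dispatch the command by schema name.

    handlers is a hypothetical mapping from command name (e.g. "send_email")
    to a callable taking (envelope_data, command_data).
    """
    record = json.loads(record_json)
    envelope = record["envelope"]["data"]  # read independently of the command
    command = record["command"]
    name = parse_iglu_uri(command["schema"])["name"]
    if name not in handlers:
        raise KeyError(f"no handler registered for command: {name}")
    return handlers[name](envelope, command["data"])
```

A responder would register one handler per command type it understands, for example `unmarshal(raw, {"send_email": handle_send_email})`, where handle_send_email is whatever the responder provides. Unknown command types fail at dispatch, without needing an up-front list baked into the envelope schema.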

The other plus for JSON Schema is that it would be easier to add command support into the Snowplow tracking SDKs, because those SDKs already work heavily with JSON Schema, and because JSON Schema doesn't encourage upfront code generation like Avro does...

Execute rules

Another idea for an execute rule is a TTL (time to live), so that you can put an expiry date on time-sensitive commands (e.g. abandoned shopping carts)...
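A minimal sketch of how a responder might apply such a rule, assuming the envelope carried a timeToLive field measured in seconds from whenCreated. Both the field name and the unit are assumptions for illustration; nothing here is decided.

```python
import time


def is_expired(when_created_ms, ttl_seconds, now_ms=None):
    """Return True once more than ttl_seconds have passed since creation."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms > when_created_ms + ttl_seconds * 1000


def should_execute(envelope, now_ms=None):
    """Skip commands whose TTL has lapsed; commands without a TTL always run."""
    ttl = envelope.get("timeToLive")  # hypothetical field name, in seconds
    if ttl is None:
        return True
    return not is_expired(envelope["whenCreated"], ttl, now_ms)
```

For an abandoned-cart email with a one-hour TTL, a responder that only picks the record up two hours later would drop it rather than send a stale message.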

alexanderdean commented 7 years ago

This has been turned into this wiki page: https://github.com/snowplow/sauna/wiki/Commands-for-analysts

alexanderdean commented 7 years ago

Closing