littlehorse-enterprises / littlehorse

This repository contains the code for the LittleHorse Server, Dashboard, CLI, and Java/Go/Python SDKs. Brought to you by LittleHorse Enterprises LLC.
https://littlehorse.dev/

Schema Support on `VariableDef` #880

Open coltmcnealy-lh opened 1 month ago

Background

The VariableDef proto right now is as follows:

message VariableDef {
  // The Type of the variable.
  VariableType type = 1;

  // The name of the variable.
  string name = 2;

  // Optional default value if the variable isn't set; for example, in a ThreadRun
  // if you start a ThreadRun or WfRun without passing a variable in, then this is
  // used.
  optional VariableValue default_value = 3;
}

The only typing information we have is the VariableType, which is simply an enum and contains no schema information. We are lacking the following functionality:

Proposal

I propose to introduce a new Tenant-scoped GlobalGetable object called VariableSchema. It would look like this:

message VariableSchemaId {
  string name = 1;
  int32 version = 2;
}

message VariableSchema {
  // Id of the schema
  VariableSchemaId id = 1;

  // human-readable description
  string description = 2;

  oneof schema {
    // An Open-API v3 Schema
    OpenApiV3Schema open_api = 3;

    // Protobuf schema
    ProtoBufSchema proto_schema = 4;

    // String Regex
    StringRegex string_regex_schema = 5;
  }
}

Protobuf already has a well-defined "Proto Descriptor" API for sharing protobuf schemas. For OpenAPI v3, we can use the Apicurio Data Models library. Lastly, string regexes should be easy enough to match using a Java Pattern.
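To make the string-regex case concrete, here is a minimal sketch of how a value could be checked against a regex schema with java.util.regex.Pattern. The class name StringRegexSchemaCheck and its methods are hypothetical and are not part of the LittleHorse codebase or of this proposal; the sketch also assumes that the whole value must match the pattern, which the proposal does not specify.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of LittleHorse: checks a string value
// against the regex carried by a string-regex schema.
final class StringRegexSchemaCheck {
    private final Pattern pattern;

    StringRegexSchemaCheck(String regex) {
        // Compile once, e.g. when the schema is registered.
        // Pattern.compile throws PatternSyntaxException for an invalid regex.
        this.pattern = Pattern.compile(regex);
    }

    boolean accepts(String value) {
        // Assumption: the entire value must match, so matches() is used
        // rather than find(), which would accept a matching substring.
        return value != null && pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        StringRegexSchemaCheck emailLike = new StringRegexSchemaCheck("[^@\\s]+@[^@\\s]+");
        System.out.println(emailLike.accepts("obi-wan@example.com"));
        System.out.println(emailLike.accepts("not an email"));
    }
}
```

Compiling the pattern once per schema rather than once per value keeps the per-variable check cheap, and it surfaces an invalid regex at registration time instead of at workflow run time.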

Discussion

This discussion takes it for granted that LittleHorse should adopt some form of schema management solution; it focuses instead on the specific implementation proposed above.

Benefits

The two most crucial benefits of what is proposed above are:

  1. No additional external dependencies are introduced to the server.
  2. Because the schemas can be managed inside the global KTable, there is no performance penalty (compared with an external Schema Registry, which would require external network calls).

Further benefits are that:

Drawbacks

The first concern isn't a huge problem given the quality of the engineers who work on LittleHorse. For the second concern, vendors or community members could write adapters that keep the LittleHorse VariableSchema objects in sync with an external Schema Registry.
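As a rough illustration of what such an adapter might look like, the sketch below performs a one-way sync from an external registry into a schema store. Everything in it is an assumption made for illustration: the ExternalRegistry and VariableSchemaStore interfaces, the InMemoryStore class, and the syncOnce method do not exist in LittleHorse or in any Schema Registry client, and versioning (the version field of VariableSchemaId) is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in LittleHorse
// or in any Schema Registry client library.
final class SchemaSyncSketch {

    // Stand-in for a client of an external Schema Registry.
    interface ExternalRegistry {
        Map<String, String> latestSchemas(); // schema name -> schema text
    }

    // Stand-in for whatever API would create VariableSchema objects.
    interface VariableSchemaStore {
        String get(String name); // latest schema text, or null if absent
        void put(String name, String schemaText);
    }

    // Simple in-memory store so the sketch can run on its own.
    static final class InMemoryStore implements VariableSchemaStore {
        private final Map<String, String> schemas = new HashMap<>();

        @Override
        public String get(String name) {
            return schemas.get(name);
        }

        @Override
        public void put(String name, String schemaText) {
            schemas.put(name, schemaText);
        }
    }

    // One-way sync: copies schemas that are new or changed in the source
    // and returns how many were written to the target.
    static int syncOnce(ExternalRegistry source, VariableSchemaStore target) {
        int written = 0;
        for (Map.Entry<String, String> entry : source.latestSchemas().entrySet()) {
            if (!entry.getValue().equals(target.get(entry.getKey()))) {
                target.put(entry.getKey(), entry.getValue());
                written++;
            }
        }
        return written;
    }
}
```

Because syncOnce only writes schemas that differ from what the target already holds, running it repeatedly (for example on a timer) is idempotent once the two sides agree.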

Alternatives

An alternative is a hard integration with an external Schema Registry such as Confluent Schema Registry or Apicurio.

This would be nice because:

However, there are some drawbacks: