jcustenborder / kafka-connect-json-schema

Apache License 2.0

Schema per event with schema specified in a field #14

Open gr8routdoors opened 3 years ago

gr8routdoors commented 3 years ago

## Description

This PR adds a hook that lets the schema differ per event. The existing Inline and Url location types are wired into that hook, so they still build their schema only once, at initialization, and cache it.
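
One way to picture the build-once behaviour for Inline and Url is a resolver that ignores the record and returns the schema it built at construction time. This is a minimal sketch, not the code in this PR: `SchemaResolver` and `FixedSchemaResolver` are hypothetical names, and `String` stands in for whatever parsed schema type the transform actually uses.

```java
import java.util.function.Supplier;

/** Hypothetical hook: returns the schema to apply to a single record. */
interface SchemaResolver {
    String resolve(Object record);
}

/** Stand-in for the Inline/Url case: the schema is loaded once and reused for every record. */
final class FixedSchemaResolver implements SchemaResolver {
    private final String schema;

    FixedSchemaResolver(Supplier<String> loader) {
        // The loader runs exactly once, at initialization.
        this.schema = loader.get();
    }

    @Override
    public String resolve(Object record) {
        // The record is ignored; every event gets the same cached schema.
        return schema;
    }
}
```

A per-event implementation would sit behind the same interface and inspect the record instead of ignoring it.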

In addition, it adds a new Field location type that lets the user name a field within the JSON payload that holds the ID of the schema to apply to that payload. The schema ID is then used to resolve the schema for each event. The default implementation I've provided resolves the schema from a templated URL: the variable `{schema_id}` in the URL is replaced with the ID before the schema is downloaded. Each schema that is looked up is also cached in a Commons `ReferenceMap`, which improves performance without risking an `OutOfMemoryError`.
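
The lookup described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `TemplatedSchemaCache` and its `fetcher` function are hypothetical, `String` stands in for the parsed schema, and a JDK map of `SoftReference` values is swapped in for the Commons `ReferenceMap` so the example needs no extra dependency.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative per-event lookup: schema ID -> templated URL -> cached schema text. */
final class TemplatedSchemaCache {
    private static final String PLACEHOLDER = "{schema_id}";

    private final String urlTemplate;
    // Hypothetical: downloads and returns the schema found at the given URL.
    private final Function<String, String> fetcher;
    // Soft references let the garbage collector reclaim cached schemas under memory pressure.
    private final Map<String, SoftReference<String>> cache = new ConcurrentHashMap<>();

    TemplatedSchemaCache(String urlTemplate, Function<String, String> fetcher) {
        this.urlTemplate = urlTemplate;
        this.fetcher = fetcher;
    }

    /** Replaces the {schema_id} placeholder with the ID read from the payload. */
    String urlFor(String schemaId) {
        return urlTemplate.replace(PLACEHOLDER, schemaId);
    }

    /** Returns the cached schema, fetching it if it was never loaded or has been collected. */
    String schemaFor(String schemaId) {
        SoftReference<String> ref = cache.get(schemaId);
        String schema = (ref == null) ? null : ref.get();
        if (schema == null) {
            schema = fetcher.apply(urlFor(schemaId));
            cache.put(schemaId, new SoftReference<>(schema));
        }
        return schema;
    }
}
```

Because the cached values are only softly reachable, a connector that sees many distinct schema IDs can drop old entries instead of growing without bound. The cost is an occasional re-download.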

## Example

Config

```properties
transforms=jsonSchema
transforms.jsonSchema.json.schema.location=Field
transforms.jsonSchema.json.schema.field.name=metadata.schema
transforms.jsonSchema.json.schema.field.url=http://someplace.com/schemas/{schema_id}.json
```

JSON


```json
{
  "id": "some-guid",
  "metadata": {
    "schema": "my-schema-name_v2",
    "timestamp": "some-time"
  },
  "payload": "more-stuff-here"
}
```
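
To make the example concrete, here is a small sketch of how a dotted field name such as `metadata.schema` could be resolved against the payload. It assumes the payload has already been parsed into nested `Map`s and that the dot separates nesting levels. `FieldPath` is a hypothetical helper, and the PR may read the field differently.

```java
import java.util.Map;

/** Hypothetical helper for reading a nested string field by dotted path. */
final class FieldPath {
    private FieldPath() {}

    /** Walks a path such as "metadata.schema" through nested maps; returns null if any step is missing. */
    static String lookup(Map<String, Object> payload, String dottedPath) {
        Object current = payload;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // tried to descend into a non-object value
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return (current instanceof String) ? (String) current : null;
    }
}
```

For the payload above, `FieldPath.lookup(payload, "metadata.schema")` yields `my-schema-name_v2`. Substituting that for `{schema_id}` in the configured URL gives `http://someplace.com/schemas/my-schema-name_v2.json`.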

## Additional Notes
I've also updated the POM so that it has enough configuration for `mvn kafka-connect:kafka-connect` to work. I hope the information I put in the `owner*` plugin parameters is correct.