snowplow / enrich

Snowplow Enrichment jobs and library
https://snowplowanalytics.com

Make schemas configurable in adapters (close #791) #792

Closed matus-tomlein closed 1 year ago

matus-tomlein commented 1 year ago

Issue #791

This PR adds configuration for adapters to change the schema URIs used for tracking events and entities. The goal is to let users change a schema without having to redeploy the adapter, which would, for example, let them change the validation rules for properties in the schemas.

The suggested approach is quite general and enables changing the schemas in any adapter (except the CloudfrontAccessLogAdapter, which tracks multiple versions of the same event schema). Schemas are configured by adding a dictionary under adapters.schemas in the config file. This dictionary maps each schema vendor to a sub-dictionary, which in turn maps schema names to new schema URIs. For example:

{
  ...

  "adapters": {
    # This makes any adapter that tracks an event with a schema such as `iglu:com.acme/event/jsonschema/1-0-0`
    # use `iglu:com.custom/other_event/jsonschema/1-2-3` instead.
    "schemas": {
      "com.acme": {
        "event": "iglu:com.custom/other_event/jsonschema/1-2-3"
      }
    }
  }
}

To implement this, I changed the adapters from objects to classes that accept a config: AdapterConfig containing the schemas mapping. Instead of calling SchemaKey(vendor, name, format, version) to create a schema URI, the adapters now call config.schemaKey(vendor, name, format, version).
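
To illustrate the lookup described above, here is a minimal sketch of how such a config.schemaKey could resolve an override. It is not the code from this PR: SchemaVer and SchemaKey below are simplified, hypothetical stand-ins for the Iglu core types, and fromUri, toSchemaUri, and the silent fallback to the default key when an override cannot be parsed are assumptions made for the example.

```scala
// Hypothetical stand-in for the Iglu schema version (model-revision-addition).
final case class SchemaVer(model: Int, revision: Int, addition: Int) {
  def asString: String = s"$model-$revision-$addition"
}

// Hypothetical stand-in for the Iglu schema key.
final case class SchemaKey(vendor: String, name: String, format: String, version: SchemaVer) {
  def toSchemaUri: String = s"iglu:$vendor/$name/$format/${version.asString}"
}

object SchemaKey {
  // Digit groups are bounded to 9 digits so that toInt cannot overflow.
  private val UriPattern =
    """iglu:([^/]+)/([^/]+)/([^/]+)/(\d{1,9})-(\d{1,9})-(\d{1,9})""".r

  // Parses a URI of the form iglu:vendor/name/format/model-revision-addition.
  def fromUri(uri: String): Option[SchemaKey] = uri match {
    case UriPattern(vendor, name, format, m, r, a) =>
      Some(SchemaKey(vendor, name, format, SchemaVer(m.toInt, r.toInt, a.toInt)))
    case _ => None
  }
}

// schemas mirrors the config layout: vendor -> (schema name -> replacement URI).
final case class AdapterConfig(schemas: Map[String, Map[String, String]]) {
  // Returns the configured override for (vendor, name) if one exists and parses;
  // otherwise falls back to the adapter's default key (an assumption of this sketch).
  def schemaKey(vendor: String, name: String, format: String, version: SchemaVer): SchemaKey =
    schemas
      .get(vendor)
      .flatMap(_.get(name))
      .flatMap(SchemaKey.fromUri)
      .getOrElse(SchemaKey(vendor, name, format, version))
}
```

With the example mapping from the config above, AdapterConfig(Map("com.acme" -> Map("event" -> "iglu:com.custom/other_event/jsonschema/1-2-3"))).schemaKey("com.acme", "event", "jsonschema", SchemaVer(1, 0, 0)) would yield the key for iglu:com.custom/other_event/jsonschema/1-2-3, while any vendor or name without an entry keeps the adapter's default key.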

matus-tomlein commented 1 year ago

Thanks @benjben! I was testing it by building a jar file (sbt 'project kinesis' assembly) and then running the jar with custom configuration where I had some modified schemas under adapterSchemas. Is there a different way how I should test it?

Also what is the process for merging this PR? Should I merge it to a release branch?

benjben commented 1 year ago

Hey @matus-tomlein,

> I was testing it by building a jar file (sbt 'project kinesis' assembly) and then running the jar with custom configuration where I had some modified schemas under adapterSchemas.

Great!

> Is there a different way how I should test it?

I usually use the Docker image (sbt 'project kinesis' docker:publishLocal) to be as close as possible to production, but as long as you have tested the runtime, that's fine!

> Also what is the process for merging this PR? Should I merge it to a release branch?

We kind of follow Gitflow for most projects, including Enrich. Once the PR has been approved, it can be merged to develop (after rebasing/squashing so that it has only one commit). It will then be included in the next release.