snowplow / snowplow-scala-analytics-sdk

Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.
https://snowplow.github.io/snowplow-scala-analytics-sdk/

Disable validation of field lengths when parsing event #127

Closed: istreeter closed this issue 1 year ago

istreeter commented 1 year ago

This issue is about reversing the change we made in #115.

In SDK version 3.0.0 we changed the Event parsers so that parsing would fail if an event's fields exceeded the maximum lengths allowed by the atomic event schema. The change was important at the time, because we could not rely on Enrich to validate the atomic field lengths -- that functionality was only added in Enrich 3.0.0. Meanwhile, it was important for Snowplow's loaders to have a guarantee that the events they received would not exceed the maximum lengths on the warehouse tables.
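
For illustration, a minimal sketch of the behaviour this issue wants to reverse, built on the SDK's `Event.parse` entry point. The helper name and the error rendering are assumptions for the example, not SDK code:

```scala
import cats.data.Validated
import com.snowplowanalytics.snowplow.analytics.scalasdk.Event

// Parse one enriched TSV line and report whether the SDK accepted it.
// Under the 3.0.0 behaviour, a field exceeding its atomic schema length
// (for example a v_collector value longer than its column allows) comes
// back as Invalid rather than as a parsed Event.
def describeParse(line: String): String =
  Event.parse(line) match {
    case Validated.Valid(event)    => s"parsed event ${event.event_id}"
    case Validated.Invalid(errors) => s"rejected: ${errors.toList.mkString("; ")}"
  }
```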

But now, one year later, this no longer seems like the right design for the analytics SDK. It is better for validation to happen in one place only, and that place should be Enrich. In almost all scenarios we can trust that events emitted by Enrich conform to the atomic field lengths. The only exceptions are if Enrich is running with the featureFlags.acceptInvalid config option enabled, or if the analytics SDK is processing historic data produced by a pre-3.0.0 version of Enrich.

We should keep the validation code in the analytics SDK, because some loaders still need it under some circumstances. But validation should be off by default.
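
As a sketch of what opt-in validation could look like from a loader's point of view: the `validateLengths` flag, the `exceedsAtomicLengths` helper, and the 100-character limit below are hypothetical, illustrating the shape of the design rather than the SDK's actual API:

```scala
import cats.data.{Validated, ValidatedNel}
import com.snowplowanalytics.snowplow.analytics.scalasdk.Event

// Hypothetical shape for opt-in validation: parse first (no length checks
// by default), then let a loader that still needs the guarantee run the
// length check explicitly.
def parseForLoader(line: String, validateLengths: Boolean): ValidatedNel[String, Event] =
  Event.parse(line).leftMap(_.map(_.toString)).andThen { event =>
    if (validateLengths && exceedsAtomicLengths(event))
      Validated.invalidNel(s"event ${event.event_id} exceeds atomic field lengths")
    else
      Validated.validNel(event)
  }

// Stand-in for the validation logic the SDK would retain; a real
// implementation would check every bounded atomic field.
def exceedsAtomicLengths(event: Event): Boolean =
  event.v_collector.length > 100 // illustrative limit, not a normative value
```

With this shape, loaders that feed warehouses with fixed column widths can pass `validateLengths = true`, while everyone else gets the cheaper, Enrich-trusting default.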