snowplow / snowplow-badrows

Apache License 2.0

Deserializer for self-describing bad row #84

Closed: istreeter closed this issue 10 months ago

istreeter commented 10 months ago

Currently, our bad row deserializer works by attempting to deserialize the JSON into each flavour of bad row, one at a time, and returning the first attempt that succeeds.

This is a problem because the circe decoder has no way of knowing which flavour of bad row was the intended target. I have seen cases where JSON for a TrackerProtocolViolations bad row was mistakenly decoded into the AdapterFailures case class. Presumably this is possible because the two flavours have similar JSON structures, so in some edge cases the same JSON is technically valid for either type.
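To make the failure mode concrete, here is a minimal, dependency-free Scala sketch of the "try each flavour, keep the first success" strategy. Json, Decoder and the two toy decoders are simplified stand-ins invented for illustration (not circe's types or the real bad row case classes), and the field names are hypothetical. Because both decoders accept the same shape, the result depends only on the order in which they are tried.

```scala
// Simplified stand-ins for circe's Json and Decoder (hypothetical, for illustration only).
sealed trait Json
final case class JStr(value: String) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

// Toy versions of two bad row flavours; the real case classes have more fields.
sealed trait BadRow
final case class AdapterFailures(vendor: String, messages: String) extends BadRow
final case class TrackerProtocolViolations(vendor: String, messages: String) extends BadRow

object TryEach {
  type Decoder[A] = Json => Option[A]

  // Read a string field from a JSON object, if present.
  def field(json: Json, key: String): Option[String] = json match {
    case JObj(fields) => fields.get(key).collect { case JStr(s) => s }
    case _            => None
  }

  // Both decoders accept exactly the same shape, so neither can tell the flavours apart.
  val adapterFailures: Decoder[BadRow] = json =>
    for {
      v <- field(json, "vendor")
      m <- field(json, "messages")
    } yield AdapterFailures(v, m)

  val trackerProtocolViolations: Decoder[BadRow] = json =>
    for {
      v <- field(json, "vendor")
      m <- field(json, "messages")
    } yield TrackerProtocolViolations(v, m)

  // The strategy described above: try each flavour in order, keep the first success.
  val decoders: List[Decoder[BadRow]] = List(adapterFailures, trackerProtocolViolations)

  def decode(json: Json): Option[BadRow] =
    decoders.iterator.map(d => d(json)).collectFirst { case Some(row) => row }
}
```

Feeding TryEach.decode a payload that was meant to be a TrackerProtocolViolations returns an AdapterFailures, simply because that decoder is earlier in the list and happens to succeed.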

In Snowplow pipelines we always pass bad rows around wrapped as self-describing JSON, so the schema URI already tells us which flavour each bad row is. Therefore we should have a deserializer for self-describing bad rows that uses the schema to pick the appropriate flavour-specific deserializer.
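A rough sketch of the proposed approach, again dependency-free and under stated assumptions: it reads the "schema" field of the self-describing envelope, extracts the schema name from an Iglu URI of the form iglu:vendor/name/format/version, looks up exactly one decoder in a table, and applies it to "data". The vendor com.snowplowanalytics.snowplow.badrows, the schema names in bySchemaName, and the toy Json/Decoder types are assumptions for illustration, not the library's actual API.

```scala
// Simplified stand-ins for circe's Json and Decoder (hypothetical, for illustration only).
sealed trait Json
final case class JStr(value: String) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

sealed trait BadRow
final case class AdapterFailures(vendor: String, messages: String) extends BadRow
final case class TrackerProtocolViolations(vendor: String, messages: String) extends BadRow

object SelfDescribing {
  type Decoder[A] = Json => Option[A]

  def field(json: Json, key: String): Option[Json] = json match {
    case JObj(fields) => fields.get(key)
    case _            => None
  }

  def str(json: Json, key: String): Option[String] =
    field(json, key).collect { case JStr(s) => s }

  val adapterFailures: Decoder[BadRow] = json =>
    for {
      v <- str(json, "vendor")
      m <- str(json, "messages")
    } yield AdapterFailures(v, m)

  val trackerProtocolViolations: Decoder[BadRow] = json =>
    for {
      v <- str(json, "vendor")
      m <- str(json, "messages")
    } yield TrackerProtocolViolations(v, m)

  // Hypothetical lookup table keyed by the schema name segment of the Iglu URI.
  val bySchemaName: Map[String, Decoder[BadRow]] = Map(
    "adapter_failures"            -> adapterFailures,
    "tracker_protocol_violations" -> trackerProtocolViolations
  )

  // Assumed URI shape: iglu:<vendor>/<name>/jsonschema/<MODEL-REVISION-ADDITION>.
  private val IgluUri =
    """iglu:com\.snowplowanalytics\.snowplow\.badrows/([a-z_]+)/jsonschema/\d+-\d+-\d+""".r

  // Regex extractors match the whole string, so partial matches are rejected.
  def schemaName(uri: String): Option[String] = uri match {
    case IgluUri(name) => Some(name)
    case _             => None
  }

  // The schema chooses exactly one decoder; no fallback to other flavours.
  def decode(json: Json): Option[BadRow] =
    for {
      uri     <- str(json, "schema")
      name    <- schemaName(uri)
      decoder <- bySchemaName.get(name)
      data    <- field(json, "data")
      row     <- decoder(data)
    } yield row
}
```

The design point is that the same payload that TryEach would misclassify now decodes as TrackerProtocolViolations, because the envelope's schema, not decoder order, decides the flavour. An unknown schema yields no result instead of silently matching the wrong case class.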