Currently the return type of most functions in `EventTransformer` is a `String`, representing the enriched event turned into a JSON object. While this is very unopinionated, minimalistic, and exactly what most Spark users need, it usually involves a lot of post-processing, e.g.:
- We always know that the result is a JSON object, but in order to pull something out we need to do `parse(result)`, and even `parse(result).asInstanceOf[JObject]`.
- Multiple fields in the enriched event are required and thus must be there by definition, but we still need to access them defensively: `parsedJson.map("event_id").getOrElse(throw new RuntimeException("event_id is not present in enriched event"))`.
- To get the list of shredded types we have to use a separate function, `jsonifyWithInventory`.
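To illustrate the burden, here is a sketch of what consumer code has to do today. A plain `Map` stands in for the parsed json4s `JObject` (a hypothetical simplification; real code would call `parse(result)` first):

```scala
import java.util.UUID

// Hypothetical stand-in for a parsed enriched event; the real code
// would hold a json4s JObject obtained via parse(result)
val parsedJson: Map[String, Any] = Map(
  "event_id" -> "c6ef3124-b53a-4b13-a233-0088f79dcbcb",
  "platform" -> "web"
)

// event_id is a required field, yet it still has to be unwrapped defensively:
val eventId: UUID = parsedJson.get("event_id") match {
  case Some(id: String) => UUID.fromString(id)
  case _                => throw new RuntimeException("event_id is not present in enriched event")
}
```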
I think all of this information should live in one typesafe container with an `asString` helper function, something like:
```scala
case class EnrichedEvent(json: JObject, shreddedTypes: Set[IgluUri], collectorTstamp: Instant, eventId: UUID) {
  lazy val asString: String = compact(json)
}
```
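As a rough illustration of the consumer-side ergonomics, here is a runnable sketch of the proposed container, with a plain `String` standing in for the `JObject` payload and for `IgluUri` (both hypothetical simplifications, purely so the sketch has no library dependencies):

```scala
import java.time.Instant
import java.util.UUID

// Simplified sketch: `json` is a raw String here; in the proposal it
// would be a json4s JObject and asString would be compact(json)
case class EnrichedEvent(
    json: String,
    shreddedTypes: Set[String], // stand-in for Set[IgluUri]
    collectorTstamp: Instant,
    eventId: UUID
) {
  lazy val asString: String = json
}

val event = EnrichedEvent(
  json = """{"event_id":"c6ef3124-b53a-4b13-a233-0088f79dcbcb"}""",
  shreddedTypes = Set("iglu:com.acme/link_click/jsonschema/1-0-0"),
  collectorTstamp = Instant.parse("2021-01-01T00:00:00Z"),
  eventId = UUID.fromString("c6ef3124-b53a-4b13-a233-0088f79dcbcb")
)

// Required fields need no Option unwrapping, and shredded types need
// no separate jsonifyWithInventory call:
val id: UUID = event.eventId
val types: Set[String] = event.shreddedTypes
```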
helper function, something like:This will also give us https://github.com/snowplow/snowplow-scala-analytics-sdk/issues/43 for free as this is a matter of one more
asStringWithoutNulls
function (but with better name).
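The null-stripping itself then reduces to a filter applied before serialization. A minimal sketch over a `Map` stand-in (with json4s the real implementation would be something along the lines of `compact(json.remove(_ == JNull))`):

```scala
// Hypothetical sketch of the asStringWithoutNulls idea, with a Map
// standing in for the JObject
val withNulls: Map[String, Any] =
  Map("event_id" -> "c6ef3124-b53a-4b13-a233-0088f79dcbcb", "se_label" -> null)

// Drop null-valued fields before rendering the event
val withoutNulls: Map[String, Any] =
  withNulls.filter { case (_, value) => value != null }
```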