snowplow / snowplow-scala-analytics-sdk

Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.
https://snowplow.github.io/snowplow-scala-analytics-sdk/

Add type-safe Event API #53

Closed chuwy closed 5 years ago

chuwy commented 6 years ago

Currently the return type of most functions in EventTransformer is a String, which represents the enriched event turned into a JSON object. While this is very unopinionated, minimalistic and exactly what most Spark users need, it usually involves a lot of post-processing, e.g.:

  1. We always know that the result is a JSON object, but to pull anything out of it we need to call parse(result), or even parse(result).asInstanceOf[JObject]
  2. Multiple fields in the enriched event are required and thus must be present by definition, but we still need to access them defensively: parsedJson.map("event_id").getOrElse(throw new RuntimeException("event_id is not present in enriched event"))
  3. To get the list of shredded types we have a separate function, jsonifyWithInventory.

I think all this information should live in one type-safe container with an asString helper function, something like:

case class EnrichedEvent(json: JObject, shreddedTypes: Set[IgluUri], collectorTstamp: Instant, eventId: UUID) {
  lazy val asString: String = compact(json)
}

This will also give us https://github.com/snowplow/snowplow-scala-analytics-sdk/issues/43 for free, as it becomes a matter of adding one more function such as asStringWithoutNulls (but with a better name).
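
To make the proposal concrete, here is a minimal, dependency-free sketch of such a container together with a null-dropping variant of asString. It is an illustration under assumptions, not the SDK's API: JsonObject is a hypothetical stand-in for json4s' JObject, IgluUri is simplified to a plain String, and the sample schema URI, event ID and timestamp are made up.

```scala
import java.time.Instant
import java.util.UUID

// Hypothetical stand-in for json4s' JObject so the sketch needs no dependencies:
// an ordered list of fields whose values are already-rendered JSON; None means null.
// Keys are assumed to need no escaping.
final case class JsonObject(fields: List[(String, Option[String])]) {
  private def render(fs: List[(String, Option[String])]): String =
    fs.map { case (k, v) => "\"" + k + "\":" + v.getOrElse("null") }
      .mkString("{", ",", "}")

  def compact: String = render(fields)

  // Same rendering, but fields holding null are dropped first.
  def compactWithoutNulls: String = render(fields.filter(_._2.isDefined))
}

// Typed container along the lines proposed in this issue.
// IgluUri is simplified to String here (assumption).
final case class EnrichedEvent(
  json: JsonObject,
  shreddedTypes: Set[String],
  collectorTstamp: Instant,
  eventId: UUID
) {
  lazy val asString: String = json.compact
  // Placeholder name taken from the issue text; a better name is still wanted.
  lazy val asStringWithoutNulls: String = json.compactWithoutNulls
}

object EnrichedEventExample {
  // All values below are made-up sample data.
  val event: EnrichedEvent = EnrichedEvent(
    json = JsonObject(List(
      "event_id" -> Some("\"c6ef3124-b53a-4b13-a233-0088f79dcbcb\""),
      "br_name"  -> None
    )),
    shreddedTypes = Set("iglu:com.acme/context/jsonschema/1-0-0"),
    collectorTstamp = Instant.parse("2018-01-01T00:00:00Z"),
    eventId = UUID.fromString("c6ef3124-b53a-4b13-a233-0088f79dcbcb")
  )
}
```

With this shape, required fields such as eventId and collectorTstamp are read directly as typed members, and the shredded types travel with the event instead of coming from a separate call.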

chuwy commented 6 years ago

Will be very useful for Snowflake Transformer and almost vital for BigQuery Loader.

chuwy commented 6 years ago

Actually, this is vital for any Loader (including RDB Shredder) and for atomic events refactoring.

chuwy commented 6 years ago

Related: https://github.com/snowplow/iglu-central/issues/778, https://github.com/snowplow/snowplow-rdb-loader/issues/103